Multivariate stochastic bias corrections with optimal transport

. Bias correction methods are used to calibrate climate model outputs with respect to observational records. The goal is to ensure that statistical features (such as means and variances) of climate simulations are coherent with observations. In this article, a multivariate stochastic bias correction method is developed based on optimal transport. Bias correction methods are usually deﬁned as transfer functions between random variables. We show that such transfer functions induce a joint probability distribution between the biased random variable and its correction. The optimal transport theory allows us to construct a joint distribution that minimizes an energy spent in bias correction. This ex-tends the classical univariate quantile mapping techniques in the multivariate case. We also propose a deﬁnition of non-stationary bias correction as a transfer of the model to the observational world, and we extend our method in this context. Those methodologies are ﬁrst tested on an idealized chaotic system with three variables. In those controlled ex-periments, the correlations between variables appear almost perfectly corrected by our method, as opposed to a univariate correction. Our methodology is also tested on daily precipitation and temperatures over 12 locations in southern France. The correction of the inter-variable and inter-site structures


Introduction
Global climate models (GCMs) and regional climate models (RCMs) are used to study the climate system. However, their outputs often appear biased compared to observational references (e.g., Randall et al., 2007). For example, the tempera-ture means can be shifted. Thus, removing this bias is often necessary to drive impact studies such as those based on crop or hydrological models (Chen et al., 2013). The main goal of bias correction (BC) is to match the statistical features of climate model outputs with observations (see, e.g., Ehret et al., 2012;Gudmundsson et al., 2012). The most used method is the quantile mapping (Panofsky and Brier, 1958;Wood et al., 2004;Déqué, 2007), which adjusts the quantiles of the variables of interest in the stationary case (Shrestha et al., 2014). The importance of the stationarity hypothesis has been discussed by a few studies (Christensen et al., 2008;Maraun, 2012;Nahar et al., 2017). Some extensions, like CDF-t (Cumulative Distribution Function transfer, Michelangeli et al., 2009), can take into account some of the non-stationarity in GCMs or RCMs.
Most of those methods are univariate, and do not take into account the spatial and inter-variable correlations, which may alter the quality of the corrections (e.g., Wilcke et al., 2013;Maraun, 2016). Maraun et al. (2017) have pointed out that correcting model output could induce biases of physical processes and that such procedures require an understanding of the nature of the biases. In particular it is crucial to investigate the way key climate variables co-vary.
This shortcoming has led to the recent development of multivariate techniques. As mentioned by Vrac (2018), two kinds of methods are currently available. The first type corrects separately each marginal and applies afterwards a correction of the dependence structure (e.g., Vrac and Friederichs, 2015;Vrac, 2018;Nahar et al., 2018;Cannon, 2018). The second kind performs recursive corrections: each variable is corrected conditionally on the previously already corrected variables (Bárdossy and Pegram, 2012;Dekens et al., 2017). These last methods have two main limitations. First, the correction depends on the ordering of the Published by Copernicus Publications on behalf of the European Geosciences Union. marginals. Second, each marginal is adjusted conditionally on previously corrected marginals, which reduces the number of data at each step. Furthermore, the variability of observations is generally greater than that of the climate models. To increase the variability, von Storch (1999), Wong et al. (2014) and Mao et al. (2015) suggested introducing a stochastic component into the bias correction procedure. In this paper, we develop a multivariate and stochastic bias correction method, different from the two categories presented, based on elements from optimal transport theory.
Optimal transport theory is a natural way to measure the dissimilarity between multivariate probability distributions (Villani, 2008;Muskulus and Verduyn-Lunel, 2011;Robin et al., 2017), especially in a multivariate case. For example, this has already been successfully applied in image processing to transfer colors between images (Rubner et al., 2000;Ferradans et al., 2013). Here, our goal is to apply optimal transport techniques to perform bias correction in estimating a particular joint law (called a transport plan) that links the probability distributions of a biased random variable and its correction. This joint law minimizes a cost function, representing the energy needed to transform a multivariate probability distribution to another. In this optimal transport context, any realization of the biased random variable induces a conditional law of the transport plan, associating the realization and its correction. As the corrections are randomly drawn from these conditional laws, the suggested method is stochastic by construction.
Moreover, Maraun et al. (2017) also stressed that BC methods do not correct the physical processes of the model, and errors can propagate into the corrections. However, one key aspect of the present work is to highlight that, in a climate change context (or more generally, in a framework where corrections are performed in conditions different from the calibration dataset), a proper BC method should provide changes -from calibration to projection periods -in agreement with the modeled data to be corrected. Knowing the quality of the raw modeled data (and of the underlying processes) is therefore an important a priori step. Nevertheless, this is beyond the scope of bias correction per se. This paper is organized as follows. In Sect. 2, the developed theoretical framework to perform bias correction is presented. In particular, the classical definition of bias correction as transfer function is generalized with optimal transport theory. Two methods are presented: optimal transport correction (OTC, stationary case) and dynamical optimal transport correction (dOTC, non-stationary case). In Sect. 3, the proposed methodology is tested on an idealized non-stationary case based on chaotic attractors. In Sect. 4, a multivariate bias correction is performed on a regional climate model (RCM) simulation of temperatures and precipitation, in a cross-validation case. Section 5 provides conclusions and perspectives.

Theoretical framework
The general goal of this paper is the correction of a random variable, denoted X (e.g., a biased climate model output) with respect to a reference random variable, denoted Y. The random variables X and Y live in dimension d. If d = 1, we denote them X and Y . The probability law of X (or Y) is denoted P X (or P Y ).
Following Piani et al. (2010), a bias correction method of X with respect to Y is a map T : R d → R d , called a transfer function, such that the random variable T (X) (the correction) follows the same law as Y, i.e., P T (X) = P Y . This definition covers most of the practical cases, but we can construct random variables where no deterministic transfer function exists, e.g., if X is constant and Y is not. Thus, beyond a multivariate transfer function, it is necessary to extend the definition of bias correction.
In the first part, we highlight our method of bias correction with a univariate example starting from quantile mapping. In the second part, the mathematical theory is explained. Finally, an extension of our method in a non-stationary context is presented.

From quantile mapping to optimal transport
We start with the construction of a quantile mapping method in the univariate case, i.e., with d = 1. In this context, the biased and reference random variables are denoted X and Y , respectively. A transfer function T between X and Y is constructed on the cumulative distribution functions (CDFs) of X and Y , defined by A realization y of Y is the correction of a realization x of X if and only if F X (x) = F Y (y). Under the assumption that F Y is invertible, the correction y of x is given by Thus the transfer function is written T = F −1 Y • F X . This method is called quantile mapping. Indeed, the quantiles of X and Y are matched through the relation F X (x) = F Y (y).
We illustrate the quantile mapping method with an example in Fig. 1a. In this example, the random variables X and Y are two Gaussian laws centered, respectively, on 0 and 10, with a standard deviation of 1. We cut R into cells of length 1 and estimate the histograms. Fig. 1a shows the two histograms of X and Y in red and blue, respectively. The x axis gives the empirical quantiles of the edges of each cell. The black arrows indicate how the quantile mapping connects a cell of X to a cell of Y . For example, the realizations of X in the first blue cell are corrected and transferred to realizations in the first three red cells of Y .
The main point here is the following: in the univariate context, we can perform a bias correction with only the black arrows. A realization in a cell of X is corrected to a realization into a cell of Y connected by a black arrow. Because in a multivariate context the quantile mapping can not be used to estimate these arrows (CDFs are not invertible), our problem is the following: how to construct these black arrows in a multivariate context. For this, let x i (y j ) be the centers of each cell of the histogram of X (Y ). Let p xi be the number of realizations of X in the interval x i , and let p yj be the number of realizations of Y in the interval y j . We represent all possible black arrows by a collection of coefficients γ ij . A γ ij value corresponds to the number of realizations in the cell x i that are transferred to realizations in the cell y j . We obtain the following two equalities: representing how the cell x j is split into each cell y j ; and representing how cell y j received the realizations from each cell x i . We depict the γ 1j coefficients in Fig. 1b. The black arrows represent the number of realizations γ 1j that are transferred to each y j .
The problem is to calculate the coefficients γ ij . For each displacement γ ij , we can associate a cost, which is the square of the length of the displacement, |x i − y j | 2 . This choice comes from the optimal transport theory, and will be highlighted in the next section. To correct γ ij realizations, we have a cost of γ ij |x i − y j | 2 . We thus obtain a global cost associated with the γ ij coefficients: Our bias correction method is defined by the γ ij coefficients minimizing the functional C. The γ ij obtained by minimizing C for our example are shown in Fig. 1c. Comparing with the quantile mapping in Fig. 1a, we can see that the obtained coefficients (the black arrows) are similar. Indeed, the coefficients induced by the quantile mapping are precisely those minimizing the functional C. Proofs of this statement can be found in Farchi et al. (2016, Appendix A) and Santambrogio (2015, chap. 2). In other words, even in the absence of CDF, a bias correction can be carried out by calculating the minimum of the function C.
The advantage of this approach is that the functional C can be written in the multivariate case by replacing |x i − y j | by x i −y j , where x i and y j are the center of multivariate cells, and · the Euclidean norm. We illustrate in Fig. 1d how the displacements are carried out in the case of two bivariate Gaussian distributions. The black arrows again represent the non-zero coefficients estimated by the OTC method (we only represent 30 arrows).
In the next section we present the mathematical theory behind this example with probability measures of X and Y . If we normalize the number of realizations of X and Y in each bin by the total number of realizations of X and Y , we obtain p xi and p yj . Therefore, the transport can be written as a transport of a fraction of mass, instead of a transport of the number of realizations.

Bias correction as a joint distribution
In the multivariate context we assume the existence of a transfer function T between X and Y . By construction, the random variables X and T (X) are dependent, and their associated joint law can be summarized by the function κ : The map κ connects the random variable X with its correc-

and given for all measurable sets
The critical property here concerns the margins of P T : the first (second) margin of P T is P X (P Y ). To understand why it is critical, let (P X , P Y ) be the set of probability measures on R d × R d for which P X is the first margin and P Y the second one. By definition, P T ∈ (P X , P Y ). Thus, any bias correction method defined by a transfer function is an element of (P X , P Y ). We argue that any probability distribution in (P X , P Y ) induces a bias correction method. For γ ∈ (P X , P Y ), γ (x, y) can be interpreted as the probability that y is the correction of x. Formally, the Jirina theorem (see, e.g., Strook, 1995, chap. 5) states that there exists a collection of probability laws γ x , x ∈ R d , such that γ x are the conditional laws of Y given X. In other words, for B ⊂ R d , γ x (B) is the probability that the correction y ∈ B, given X = x. The correction of x is then sampled from the law γ x . Thus, any γ ∈ (P X , P Y ) defines a bias correction method, through the conditional laws γ x . This highlights the stochastic part of this approach: all corrections are sampled from the laws γ x , and the corrected values follow the law P Y (by definition of a conditional law).
We note that the problem where X is constant is easily solved with this approach. The set (P X , P Y ) is reduced to one element: the independent law δ x × P Y , where δ x is the Dirac mass in x. Thus, γ x = P Y , and the correction of X is given by sampling each correction with the law P Y .
We have defined a bias correction method as an element of (P X , P Y ). However, this set can be very large. The goal of the next section is to present a criterion to select an element of (P X , P Y ).

Selection of a joint law with optimal transport theory
To select a probability law γ ∈ (P X , P Y ), we propose using a cost function on this set. The minimum of this cost function corresponds to an optimal bias correction method. We propose minimizing the energy needed to transform a realization x of X to its correction y, i.e., minimizing x − y 2 , weighted by γ (x, y). Thus, the cost function C is given by This cost function minimizes the square of the distance between x and its correction y. Our bias correction method is associated with the law γ that minimizes C. This cost function stems from optimal transport theory (Villani, 2008 We assume that (X 1 , . . ., X n ) and (Y 1 , . . ., Y n ) are two independent and identically distributed (i.i.d.) samples of the random variables X and Y. A first step is to estimate the empirical distributions,P X andP Y . We denote c i a collection of regularly spaced cells that partition R d and cover (X 1 , . . ., X n ) and (Y 1 , . . ., Y n ). The center of each cell is also denoted c i . With this notation,P X andP Y can be written as a sum of I and J Dirac masses: The scalar p X,i (or p Y,j ) is the empirical weight around c i (or c j ) and induced from the sampling of X (or Y). A natural estimator of γ ∈ (P X , P Y ) can be written aŝ The coefficients γ ij are the probabilities to transform c i (i.e., a x ∈ c i ) to c j (i.e., a y ∈ c j ). They are unknown, and they have to obey the marginal properties: Finally, the cost function defined in Eq. (1) can be approximated bŷ Finding γ ij , i.e., solving the problem defined by constraints of Eqs.
(2)-(3) and minimization of Eq. (4), is called a linear programming problem. It can be solved (for example) by the network simplex algorithm (see, e.g., Bazaraa et al., 2009). We use the python implementation of Flamary and Courty (2017). To correct X, we have to follow the plan of γ ij . For a realization X l of X, we take the cell c i that contains X l . Followingγ , c i is moved to c j with probability γ ij /p X,i (applying Eq. (2), the sum over j is 1). To determine c j , we randomly draw it according to the conditional lawγ X l = (γ i1 , . . ., γ iJ )/p X,i . Finally, we draw uniformly y in c j . This methodology is summarized in Algorithm A1, and we refer to it as optimal transport correction (OTC).
Note that the traditional one-dimensional quantile mapping preserves the ordering of quantiles. In the multivariate case, this type of property can be viewed as the Monge-Mather (1991) shortening principle (see, e.g., Villani, 2008, chap. 8). The idea is that the extremes of a multivariate distribution are moved to extremes, the boundary to the boundary, the level lines to level lines, etc.

Non-stationary bias correction
Climate models offer a valuable tool to study future realistic climate trajectories. Climate model outputs of the present period need to be bias corrected with respect to current observations. Future climate simulations also need to be adjusted. However, no observation is available for the future and clear assumptions have to be made to correct simulations for future periods. Table 1 displays the basic framework of bias correction. Future unobserved data, say Y 1 , should be inferred from the current reference vector, Y 0 , and two numerical runs, one in the present, say X 0 , and one in the future, say X 1 . Period 0 is called the calibration period, and period 1 the projection period. In the univariate case, denoting F i (G i ) the CDF of X i (Y i ), the CDF-t (CDF transform) method of Michelangeli Figure 2. Estimation of the unobserved random variable Y 1 . The random variables X 0 , X 1 and Y 0 are known. Plans γ and ϕ are the optimal joint laws in the sense of Eqs.
(2)-(4). ϕ is the evolution of Y 0 estimated from γ and ϕ. OTC is used to correct X 1 with respect to the estimation of Y 1 . et al. (2009) assumes that Recombining Eq. (5), the CDF of Y 1 is given by G 1 = G 0 •(F 0 ) −1 •F 1 , and can be used to perform a quantile mapping correction. Here, the fundamental hypothesis T Y 0 ,Y 1 = T X 0 ,X 1 means that the transfer functions to capture the temporal changes are identical in the model and observational worlds.
CDF-t learns the change between X 0 and X 1 , and transfers it to Y 0 to estimate Y 1 . In the multivariate case, following CDF-t, we want to learn the evolution (i.e., the change or the temporal evolution) between X 0 and X 1 , and apply it to Y 0 . This generates Y 1 , and OTC can then be applied between X 1 and Y 1 . Note also that the reverse hypothesis T Y 1 ,X 1 = T Y 0 ,X 0 could be considered, meaning that the bias is learned, and transferred along the dynamic. In this case, the correction of example given in Sect. 3 does not correspond to the reference (not shown), so we rejected this assumption. Thus, our definition of non-stationary bias correction assumes a transfer of the evolution of the model to the observational world. Indeed, climate change is one of the main signals that we want to account for in the projected corrections. However, the change in the observations can be different, and therefore the resulting corrections can also be different from observations. Nevertheless, this methodology is justified because different simulations can have different evolutions; e.g., the four RCP scenarios provide four different simulations, giving four different corrections. This is also true for different climate models, which can show different changes. This information is therefore kept in the corrections.
Using OTC, we define two optimal plans: the optimal plan γ , between X 0 and Y 0 , and the optimal plan ϕ, between X 0 and X 1 . The law γ is the bias between X 0 and Y 0 , whereas ϕ is the evolution between X 0 and X 1 . Our goal is to move ϕ along γ , defining a plan ϕ, to estimate Y 1 as the evolution of Y 0 , i.e., Y 1 = ϕ(Y 0 ). Then, we correct X 1 with respect to Y 1 = ϕ(Y 0 ), with the OTC method. This is summarized in Fig. 2.
The estimation of ϕ is performed in three steps: 1. transformation of ϕ into a collection of vectors, 2. transferral of these vectors along γ and 3. adaptation of these vectors to Y 0 .
To illustrate our methodology, Fig. 3 shows an example where the random variables X 0 , X 1 and Y 0 follow a bivariate Gaussian law. They are, respectively, centered at (0, 0), (10, 0) and (0, 10), with covariance matrices 4 × Id 2 , Id 2 /4 and Id 2 /4 (the matrix Id d is the d-dimensional identity matrix). Without loss of generality, we write the empirical distribution of X 0 , X 1 and Y 0 as a sum of Dirac masses, -Step 1. Transformation of ϕ. Using the OTC method, ϕ moves the bin c i ofP X 0 to the bin c k ofP X 1 . The vector v ik := c k − c i represents the evolution from c i to c k (i.e., the local evolution between X 0 and X 1 ). The collection of vectors v ik is an estimation of the process between X 0 and X 1 . In Fig. 3, the red arrow is an example of vector v ik . - Step 2. Transfer along γ . Using the OTC method, γ moves the bin c i ofP X 0 to the bin c j ofP Y 0 . Thus, the estimation of ϕ could be defined by the vector v ik applied to c j ; i.e., a realization of Y 1 is given by c j + v ik . The grey arrow in Fig. 3a depicts this operation. But the v ik can cross, and the correction is not coherent. This is due to normalizing issues and because the collection of vectors v ik applied to Y 0 does not define an optimal transport plan. The standard deviation decreases between X 0 and X 1 , whereas it increases between Y 0 and Y 1 in our example. Furthermore, the quantiles are inverted in this example (low values are moved to high values). Consequently, we have to adapt the vectors v ik toP Y 0 .
-Step 3. Adaptation of v ik . To solve this problem, we introduce a matrix factor D, which rescales the collection of vectors v ik . In the univariate case, Bürger et al.
(2011) proposed a factor σ Y 0 σ −1 X 0 , where σ is the standard deviation. The idea is to remove the scale of X 0 and to replace it by the scale of Y 0 . Bárdossy and Pegram (2012) and Cannon (2016)   In each panel we have a Gaussian law centered on (0, 0) with covariance 4Id 2 (P X 0 ), a Gaussian law centered on (10, 0) with covariance 1/4Id 2 (P X 1 ) and a Gaussian law centered on (0, 10) with covariance 1/4Id 2 (P X 1 ). The red arrow is the local evolution betweenP X 0 andP X 1 . (a) The probability distributionP Y 1 is the correction with OTC-t and D = Id 2 . The grey arrow is the estimation of the evolution ofP Y 0 . (b) The probability distributionP Y 1 is the correction with dOTC and D given by Eq. (6). The grey arrow is the estimation of the evolution ofP Y 0 . Biased random variable X 0 (red) and correction Z 0 with OTC (green). (c) Biased random variable X 0 (red) and correction Q 0 with quantile mapping (green). (d) Biased random variable X 1 (red) and references Y 1 (blue) for time period 1. (e) Biased random variable X 1 (red) and correction Z 1 with dOTC (green). (f) Biased random variable X 1 (red) and correction Q 1 with CDF-t (green). and Cho( ) its Cholesky decomposition, we multiply (in a matrix sense) v ik by the following matrix: The Cholesky decomposition only exists if is symmetric and positive-definite. Some covariance matrices do not have this property, e.g., highly correlated random variables. In such a case, must be slightly perturbed to be positive-definite (see, e.g., Higham, 1988;Knol and ten Berge, 1989). Furthermore, the Cholesky decomposition can be poorly estimated if the number of available data is too small compared to the dimension. Indeed, the inverse of a covariance matrix is highly biased. In this case, a pragmatic solution is to replace the matrix D by the diagonal matrix of the standard deviation, i.e., D = diag(σ Y 0 σ −1 X 0 ).
Finally, a realization of Y 1 is given by c j + D · v ik . Figure 3b shows an estimation of Y 1 . Visually, the shape of Y 1 appears coherent with the evolution between X 0 and X 1 . The mean of Y 1 is (2.53, 10). The standard deviation between X 0 and X 1 is divided by 4. The mean shift between X 0 and X 1 is (10, 0). This shift of 10 units is correctly taken into account in the rescaling of Y 0 by the standard deviation (equal to 4) between X 0 and X 1 : (2.53, 10) The value of the covariance matrix of Y 1 is Y 1 0.018 × Id 2 . It is close to the expected value (1/4)/16 × Id 2 0.015 × Id 2 . The shift of 10 units of the model is not followed. It is interpreted as a correction of the bias into the evolution of the model. However, depending on the hypotheses desired by the user, the dOTC method can easily provide corrections whose mean evolutions and trends are in agreement with those given by the simulations to be corrected, like in the EDQM bias correction method (Li et al., 2010). The complete method of correction is summarized in Algorithm A2. We refer to it as dOTC (dynamical optimal transport correction).
We first propose evaluating OTC and dOTC on an idealized case.

Model and methodology
To evaluate our bias correction method, we construct an idealized biased case, based on the Lorenz (1984) model. This three-dimensional system is generated by the differential equations The function ψ(t) is a linear forcing proposed by Drótos et al. (2015). Classically, ψ also contains a seasonal cycle (Lorenz, 1990), where the length of a "year" is fixed at t = 73 time units. Here we integrate this equation for the following forcing between 0 and 7 × 73 (i.e., 7 "years" of integration): The integration is performed with a Runge-Kutta (order 4) scheme with a time step of size 0.005. All trajectories of the Lorenz (1984) model converge on a unique subset of R 3 (called an attractor), and remain trapped on it. According to Drótos et al. (2015), the first 5 "years" correspond to the time required to trap the trajectories. One realization of random variable Y 0 (Y 1 ) is year 6 (year 7). Each year contains 14 600 (= 73/0.005) elements. According to Eq. (8), the linear forcing is applied during year 7. The non-stationarity is induced by the change between the two time periods.
We introduce a bias by multiplying each point of the trajectories by a triangular matrix S, and add a vector m, i.e., X = SY + m. The addition changes the mean, whereas the multiplication alters the covariances. The matrix S is chosen empirically such that the covariance matrices of X 0 , X 1 , Y 0 and Y 1 differ. We fix The random variables X and Y are plotted in Fig. 4a, d. The blue (red) curve of Fig. 4a is the trajectory of Y 0 (X 0 ). The mean is largely altered. We estimate the covariance matrices aŝ We estimate the empirical distributions P Y 0 , P Y 1 , P X 0 and P X 1 with a three-dimensional histogram. We cut a large cube around the trajectories into cells of size 0.2 × 0.2 × 0.2. Then we count the number of points in each cell.
Finally, we evaluate the quality of the correction by comparing the covariance matrices of Y 0 and X 0 , and the covariance matrices of Y 1 and X 1 .

Correction of the biased Lorenz (1984) model
We apply our method to correct X 0 and X 1 . The random variable X 0 is corrected with respect to Y 0 and using the OTC method. The random variable X 1 is corrected with respect to the estimation of Y 1 , coming from the dOTC method. The resulting random variables Z 0 and Z 1 are given in Fig. 4b, e. We show in Fig. 4c, f a univariate correction with quantile mapping for the period 0, generating the random variable Q 0 .
The same is shown for CDFt, period 1 and the random variable Q 1 .
The correction Z 0 is visually very similar to the reference in blue in Fig. 4a. The covariance matrix is almost perfectly reproduced: The correction Z 1 is depicted in green in Fig. 4d. It is visually hard to compare to Fig. 4b, but we recognize Y 1 . The covariance matrix is correctly rectified: Finally, the cost of transformation, given by Eq.
(1), of Z 1 into Y 1 is 93 % smaller than the cost between Y 1 and X 1 ; i.e., P Z 1 is more similar to P Y 1 than P X 1 . Furthermore, if we replace the Cholesky matrix of dOTC by the matrix of standard deviation, the maximum difference between covariance matrices increases to 0.22, but the cost is 85 % smaller. Thus, using the standard deviation slightly degrades the correction.
However, visually, it is very hard to distinguish the corrections with the Cholesky matrix or the standard deviation matrix. The figure corresponding to Fig. 4 with the standard deviation matrix is given in the Supplement. By contrast, Q 0 and Q 1 , depicted, respectively, in Fig. 4c, f, do not reproduce Y 0 and Y 1 . Thus, the multivariate correction is largely better than the univariate correction. This is confirmed by the covariance matrices, which reproduce exactly the covariances of X 0 and X 1 : We have performed a tri-variate correction on a nonlinear system exhibiting non-standard probability measures (i.e., non-Gaussian, non-exponential). In the stationary case, the OTC method works almost perfectly. In the nonstationary case, the dOTC method produces a probability distribution closed to the expected result. We propose now to apply OTC and dOTC to climate model simulations.

Data
The dataset used as a reference for the bias correction (BC) is the Systeme d'Analyse Fournissant des Renseignements Atmospheriques a la Neige (SAFRAN, Vidal et al., 2010) reanalysis. SAFRAN is a hourly reanalysis over France between 1958 and present, with a horizontal resolution of 8 km × 8 km. Quintana-Seguí et al. (2008) claimed that the daily mean of the surface atmospheric temperature (tas) and precipitation (pr) presents no bias compared to observations from the climatological database of Météo-France. This justifies the use of SAFRAN as a reference. We test our multivariate BC method on a simulation of the Weather Research and Forecast (WRF) atmospheric model (Skamarock et al., 2008) performed within the EURO-CORDEX initiative (Vautard et al., 2013;Jacob et al., 2014) with a 0.11 • × 0.11 • horizontal resolution. The boundaries of the simulation were forced by a historical simulation of the Institut Pierre-Simon Laplace (IPSL) coupled model (Marti et al., 2010;Dufresne et al., 2013). This EURO-CORDEX historical simulation will be called "WRF" in the following.
SAFRAN and WRF data are re-mapped onto the same grid, with a spatial resolution of 0.11 • × 0.11 • (i.e., ∼ 12 km × 12 km). The nearest neighbor interpolation is used. We only keep the land region comprised in 1.8-7.85 • E × 41.8-45.2 • N, i.e., covering the southeast of France. This region is characterized by a complex topography, which creates a strong spatial heterogeneity, especially for precipitation. For the present application, we extract 12 grid points regularly spaced (see Fig. 5a), with a one-to-one spatial correspondence between SAFRAN and WRF.
In both datasets, we will consider daily surface air temperatures and precipitation. The goal of this section is to correct the bias in tas and pr in the WRF data with respect to SAFRAN.

Cross-validation protocol
We focus on the daily timescale over the 1970-2000 period. We correct the warm season (May-September). The analysis and conclusions are available for the cold season, and the corresponding figure (i.e., Fig. 5) is given in the Supplement. We split that period into two sub-periods, 1970-1985 (2295 days) and 1985-2000 (2295 days), to perform a cross-validation. The SAFRAN (WRF) values over the first time period correspond to the random variable Y 0 (X 0 ), and are called the calibration period. The SAFRAN (WRF) values over the second time period correspond to Y 1 (X 1 ), and are called the projection period. SAFRAN during 1985-2000 (i.e., Y 1 ) is assumed to be unknown, and is used for crossvalidation.
1. For univariate correction, quantile mapping is used for the calibration period, and CDF-t for the projection period.
2. For 24-variate correction, OTC is used for the calibration period, and dOTC for the projection period. The spatial structure and the dependence between the two variables are used. Due to the dimension, the Cholesky matrix is poorly estimated. We replace it by the matrix of standard deviation in the rescaling step.
We estimate the empirical distributions by computing histograms with bins of size 0.1 in each dimension. Furthermore, CDF-t and dOTC can shift close to 0 values to negative values for precipitation. Thus, negative precipitation values are replaced by 0 after correction. We test the quality of the correction by plotting the evolution of the mean, the standard deviation, and the spatial and inter-variable covariance, i.e., the difference between the projection and calibration periods. These indicators are summarized in Fig. 5. During the calibration period, the goal is that the probability distribution of correction of the WRF simulation will be the probability distribution of SAFRAN. By construction of OTC, the correction is almost perfect, and we focus on the projection period. In the projection period, the goal is that the evolution of corrections will be close to the evolution of the WRF simulation.

Evolution analysis
As we have seen in the previous section, the corrections of X 1 and Y 1 are identical only if the evolution of SAFRAN is identical to the evolution of WRF. To analyze the evolution of WRF, SAFRAN and the corrections, we compute the difference of statistical indicators between the projection and the calibration period at each grid point. The indicators are the mean (Fig. 5b, f), the variance (Fig. 5c, g), the covariance between pr and tas (Fig. 5e) and the spatial covariance for each variable (Fig. 5d, h). The x axis of Fig. 5a-h is the evolution of the correction (i.e., E(Z 1 ) − E(Z 0 ),. . . ). The y axis of Fig. 5a-h is the evolution of WRF in red (i.e., E(X 1 ) − E(X 0 ),. . . ), and the evolution of SAFRAN in blue (i.e., E(Y 1 )−E(Y 0 ),. . . ). Furthermore, the red line is the linear regression between the evolution of the 24-variate correction and the evolution of WRF. The correlation (r-value), p-value and standard error of each linear regression are summarized in Table 2.
The linear regression between evolution of 24-variate correction and evolution of WRF (red line) shows a strong statistical link for all statistical indicators. The evolution of the mean is almost perfectly reproduced for the two variables (rvalues is at least equal to 0.98, with a maximal p-value at 10 −9 ). The evolution of variance of WRF is also reproduced, the linear regression being significant (maximal p-value is 5 × 10 −2 ).
The evolution of dependence structure is given by the evolution of spatial and inter-variables covariance. The minimal r-value for linear regression is equal to 0.59 with a maximal p-value equal to 2×10 −3 . This means that dOTC reproduces the evolution of WRF between calibration and projection period. Because the calibration period is perfectly corrected, the correction during projection period appears as the evolution of WRF, applied to SAFRAN.
A linear regression, the Spearman rank correlation between the evolution of SAFRAN, and the evolution of the correction with WRF do not show a significant statistical link (not shown). We conclude that the evolution of WRF is dif-ferent of the evolution of SAFRAN. This indicates it is not possible to reproduce SAFRAN during projection period using dOTC and WRF. For example, WRF predicts an increase between 0.2 and 0.4 K of the mean temperature, whereas SAFRAN gives an increase between 0.2 and 1 K.
The correction with CDF-t appears to be satisfactory for the temperatures, and very similar to the correction with dOTC. But for the precipitation, the structure is not coherent with WRF or SAFRAN. This dissimilarity is due to the difference between the probability distribution of temperatures (quasi-Gaussian) and precipitations (exponential/Gamma laws).
We conclude that the evolution of the 24-variate correction with dOTC between calibration and projection periods is close to the evolution of WRF. Furthermore, the evolution of SAFRAN is very different from the evolution of WRF. In particular, this example illustrates how the classical crossvalidation methodology does not differentiate the variations of SAFRAN and WRF, and that the correction can not be compared to the reference during the projection period.

Conclusions
We have developed a new method for multivariate bias correction, generalizing the quantile mapping in the multivariate case. To do so, we have developed a new theoretical framework to understand any bias correction (BC) method: any BC method is here characterized by a joint law between the biased dataset and the correction. This joint probability distribution is estimated based on optimal transport techniques, and the BC method is then referred to as optimal transport correction (OTC). A definition of non-stationary bias correction is also proposed: the evolution of the model is learned and transferred to the reference world. An extension of OTC called dynamical OTC (dOTC) has been developed to account for temporal non-stationarities.
OTC and dOTC methods have been tested on an idealized three-dimensional case based on Lorenz (1984) timedependent attractors, which induced changes in the correlation between variables. The bias correction appeared to perform very well in those idealized experiments.
Then, 12 grid points of a WRF simulation have been corrected with respect to SAFRAN reanalyses for precipitation and temperature in Southern France. A 24-variate correction was performed. The correction in stationary context was almost perfect. In the non-stationary case, the evolutions of WRF and SAFRAN were different, and, as expected, the correction with dOTC differed from SAFRAN. However, the correction presented a multidimensional evolution similar to that of WRF. We can therefore conclude that the correction is consistent with the definition proposed for the non-stationary case. This is consistent with the results of Maraun et al. (2017): the fundamental errors of a model are not corrected, but transferred to the world of observations. The dOTC method preserves the signal of climate change inferred from the model simulations. As suggested by Maraun and Widmann (2018), our cross-validation method does not compare the correction to the observations on the validation period, which can produce false positive or true negative due to internal variability of model or observations, but assesses whether the statistical evolution of the model is kept.
Furthermore, although the number of available data is very small compared to the dimensions (2295 days and 24 dimensions), the OTC and dOTC performed a correction without numerical problems, and, moreover, only in a few minutes on a personal computer.
The methods OTC and dOTC are able to correct the dependence structure (i.e., the joint law), and not only the intervariable and spatial correlations. In particular, the copula function (which contains the information about dependence) is corrected. In addition, dOTC proposes a definition of nonstationarity, and explicitly gives what the correction corresponds to (the evolution of the model applied to observations). In the particular case of the temperatures/precipitation correction, compared to, e.g., Piani and Haerter (2012) and Räty et al. (2018), the correction is at least as good during the calibration period, although the comparison is not done over the projection period, because the indicators are different.
As a perspective of improvement of the method, we note that the optimal plan can only be used to correct data points that are already known. If a new data point is obtained, and alters the estimate of the probability density function, then the plan needs to be recomputed. However, such a situation is relatively rare in bias correction. Indeed, the corrections usually have to be performed on climate model simulations that cover many years and decades. This means that the whole time series are available at once and are not continuously updated. One possibility would be to "smooth" the optimal plan that, thus, could be applied to new points without recalculating the plan. Finally, a promising application of this method is the post-processing of operational forecasts. In such a case, the question of internal variability (Maraun et al., 2017) would not affect the bias correction procedure as climate dynamics is consistently represented between the model and observations. Code and data availability. OTC and dOTC are implemented in two packages: ARyga (R) and Apyga (python3). These packages are available at https://github.com/yrobink/Ayga.git (Robin, 2019). The example of Sect. 3 is given in Apyga. SAFRAN and EURO-CORDEX data are, respectively, available at: http://www. drias-climat.fr (last access: 29 January 2019) and https://www. euro-cordex.net (last access: 29 January 2019).
Author contributions. YR performed the analyses. The experiments were co-designed by YR and MV. All the authors contributed to writing the manuscript.
Competing interests. The authors declare that they have no conflict of interest.