Stochastic bias correction of dynamically downscaled precipitation fields for Germany through Copula-based integration of gridded observation data

Dynamically downscaled precipitation fields from regional climate models (RCMs) often cannot be used directly for regional climate studies. Due to their inherent biases, i.e., systematic overor underestimations compared to observations, several correction approaches have been developed. Most of the bias correction procedures such as the quantile mapping approach employ a transfer function that is based on the statistical differences between RCM output and observations. Apart from such transfer functionbased statistical correction algorithms, a stochastic bias correction technique, based on the concept of Copula theory, is developed here and applied to correct precipitation fields from the Weather Research and Forecasting (WRF) model. For dynamically downscaled precipitation fields we used high-resolution (7 km, daily) WRF simulations for Germany driven by ERA40 reanalysis data for 1971–2000. The REGNIE (REGionalisierung der NIEderschlagshöhen) data set from the German Weather Service (DWD) is used as gridded observation data (1 km, daily) and aggregated to 7 km for this application. The 30-year time series are split into a calibration (1971–1985) and validation (1986–2000) period of equal length. Based on the estimated dependence structure (described by the Copula function) between WRF and REGNIE data and the identified respective marginal distributions in the calibration period, separately analyzed for the different seasons, conditional distribution functions are derived for each time step in the validation period. This finally allows to get additional information about the range of the statistically possible bias-corrected values. The results show that the Copula-based approach efficiently corrects most of the errors in WRF derived precipitation for all seasons. It is also found that the Copula-based correction performs better for wet bias correction than for dry bias correction. In autumn and winter, the correction introduced a small dry bias in the northwest of Germany. The average relative bias of daily mean precipitation from WRF for the validation period is reduced from 10 % (wet bias) to −1 % (slight dry bias) after the application of the Copula-based correction. The bias in different seasons is corrected from 32 % March–April–May (MAM), −15 % June–July–August (JJA), 4 % September–October–November (SON) and 28 % December–January–February (DJF) to 16 % (MAM),−11 % (JJA), −1 % (SON) and −3 % (DJF), respectively. Finally, the Copula-based approach is compared to the quantile mapping correction method. The root mean square error (RMSE) and the percentage of the corrected time steps that are closer to the observations are analyzed. The Copula-based correction derived from the mean of the sampled distribution reduces the RMSE significantly, while, e.g., the quantile mapping method results in an increased RMSE for some regions. Published by Copernicus Publications on behalf of the European Geosciences Union. 1788 G. Mao et al.: Copula-based bias correction for RCM precipitation

of the statistically possible bias-corrected values.The results show that the Copula-based approach efficiently corrects most of the errors in WRF derived precipitation for all seasons.It is also found that the Copula-based correction performs better for wet bias correction than for dry bias correction.In autumn and winter, the correction introduced a small dry bias in the northwest of Germany.The average relative bias of daily mean precipitation from WRF for the validation period is reduced from 10 % (wet bias) to −1 % (slight dry bias) after the application of the Copula-based correction.The bias in different seasons is corrected from 32 % March-April-May (MAM), −15 % June-July-August (JJA), 4 % September-October-November (SON) and 28 % December-January-February (DJF) to 16 % (MAM), −11 % (JJA), −1 % (SON) and −3 % (DJF), respectively.Finally, the Copula-based approach is compared to the quantile mapping correction method.The root mean square error (RMSE) and the percentage of the corrected time steps that are closer to the observations are analyzed.The Copula-based correction derived from the mean of the sampled distribution reduces the RMSE significantly, while, e.g., the quantile mapping method results in an increased RMSE for some regions.

Introduction
Most climate studies operate on a regional and local scale.Global climate models (GCMs), however, provide climatological information only on coarse scales, usually in a horizontal resolution of 100-300 km.Since they are not able to mimic the regional-and local-scale climate variability, further refinement is necessary.For dynamical downscaling, regional climate models (RCMs) are capable of bridging the gap between large-scale GCM data and local-scale information to conduct climate studies.Nevertheless, the RCM simulations usually do not agree well with observations even if downscaled to high spatial resolutions (Smiatek et al., 2009;Teutschbein and Seibert, 2010).Thus, they might not be useful for deriving hydrological impacts on local scales directly (Graham et al., 2007a, b;Christensen et al., 2008;Bergström et al., 2001).Therefore, further bias correction is often required.The impacts of biases on hydrological and agriculture modeling has been studied extensively (e.g., Kunstmann et al., 2004;Baigorria et al., 2007;Ghosh and Mujumdar, 2009;Ott et al., 2013).Precipitation is an important parameter in climate studies (Schmidli et al., 2006).RCMs tend to generate too many wet days with small precipitation amounts (Schmidli et al., 2006;Ines and Hansen, 2006).In addition, RCMs often contain under-and overestimations of rainfall as well as incorrect representations of the seasonality (Terink et al., 2010).Therefore, several bias correction methods have been developed.These methods range from simple scaling approaches such as the linear scaling approach (e.g., Lenderink et al., 2007) and local intensity scaling (e.g., Schmidli et al., 2006) to methods like quantile mapping (e.g., Ines and Hansen, 2006).Bias correction techniques usually employ the use of a transfer function that is based on the statistical differences between observed and modeled climate variables to adjust the modeled data under the assumption that these functions are stationary.A recent overview of bias correction methods for hydrological application is provided, e.g., by Teutschbein and Seibert (2012) and Lafon et al. (2013).
In this study, a Copula-based stochastic bias correction method is applied to correct each individual time step of a RCM simulation.This is different to the traditional transfer function-based statistical correction approaches.The strategy of this method is the identification and description of the underlying dependence structures between observed and modeled climate variables (precipitation) and its application for bias correction.It is known that the traditional measures of dependence (e.g., Pearson's correlation coefficient) can only capture the strength of the linear dependence as a single global parameter.Alternatively, Copulas are able to describe the complex nonlinear dependence structure between variables (Bárdossy and Pegram, 2009).Based on the identified dependence structure between observed and modeled precipitation and the identified respective marginal distributions, a set of realizations is finally obtained through Monte Carlo simulations.
Recently, Copulas are used for various applications in hydrometeorology (e.g., Gao et al., 2007;Serinaldi, 2008;van den Berg et al., 2011;Bárdossy and Pegram, 2012).Copula-based bias correction techniques have been originally introduced by Laux et al. (2011) and Vogl et al. (2012), and are extended in this study by investigating gridded precipitation fields instead of individual and unevenly distributed stations.The Copula models are estimated for each single grid cell instead of choosing the most dominant model for the whole domain.Bayesian Information Criterion (BIC) is implemented in addition to Kolmogorov-Smirnov test (K-S test) for marginal distribution goodness-of-fit test, as the large sample size makes the K-S test highly sensitive.The performance of the correction method is analyzed for different seasons to investigate seasonal variability.This study is based on data for a 30-year time period  of highresolution (7 km) dynamical downscaled precipitation fields using the Advanced Research Weather Research and Forecasting (WRF-ARW) model (Berg et al., 2013).REGNIE data from the German Weather Service (DWD) were used as the gridded observation data source.To achieve the same resolution, the REGNIE data are aggregated to 7 km.In the calibration period, only positive pairs (both REGNIE and WRF data indicate precipitation) are used to calibrate the model.Therefore, in the validation period only the days that belong to positive pairs are corrected and the other days are kept the same as the original WRF data.It is important to state that this study focuses on the bias correction of the RCM simulations of the past exclusively and does not deal with downscaling or bias correction of precipitation for future periods.Application for future climate scenarios requires further extensions.
The article is structured as follows: in Sect. 2 the data sets for this application are introduced.Section 3 briefly describes the basic theory of Copulas and the procedure of Copulabased conditional simulations to correct RCM precipitation.Results of application of the Copula-based approach for Germany are shown in Sect.4, followed by the summary and conclusions (Sect.5).

RCM data
Dynamically downscaled precipitation fields over Germany from a RCM simulation (Berg et al., 2013) with the nonhydrostatic WRF-ARW model (Skamarock et al., 2008) are used.For this data set, the WRF-ARW simulations are forced by ERA40 reanalysis data (Uppala et al., 2005) from 1971 to 2000 at the boundaries which implies large-scale circulation close to observations.Due to the coarse resolution of the GCM, a double-nesting approach is applied in Lambert conformal map projection.The coarse nest extends over all of Europe (42 km) and the fine nest covers Germany and the near surroundings (7 km).The model uses 40 vertical levels for both nests.For further details (e.g., parameterization schemes) on the applied WRF-ARW setup, we refer to Berg et al. (2013) and the references listed therein.

Observational data
As observations, we used the 1 km gridded daily data set REGNIE (DWD, 2011) from the DWD.The REGNIE product is available for the whole of Germany from 1951 to the present and the number of underlying stations is approximately 2000 stations.The statistical gridding approach of station data is based on the spatial interpolation of anomalies compared to long-term mean values.For the background climatological field a multi-linear regression approach is applied where the geographical position, elevation and wind exposure of the stations are taken into account.For the calculation of the daily precipitation fields, station values are first assigned to a grid point and divided by the background data to calculate anomalies.The anomalies are spatially interpolated using inverse distance weighted interpolations, and the results are finally multiplied by the background field.For the grid-cell-based bias correction, the 1 km REGNIE data set is up-scaled and remapped to the 7 km WRF grid such that precipitation amounts are conserved.Also, the time period is kept the same as WRF output .

Methodology
In this section the fundamentals of Copula theory are briefly summarized.Details about Copula theory are given, e.g., in Nelsen (1999).The basis of the Copula-based bias correction algorithm used in this study is a bivariate Copula model that allows for modeling the dependence structure between WRF and REGNIE data.The Copula model consists of two respective marginal distributions and a bivariate Copula function and is then used to generate bias corrected WRF data by conditional stochastic sampling.Details about the bias correction algorithm are described below.In the following section, X and Y refer to REGNIE and WRF data sets, respectively.

Copula theory
Let (X, Y ) be a pair of random variates with a realization (x, y) and the bivariate joint distribution F XY (x, y).Following Sklar's theorem (Sklar, 1959), a unique function C (Copula) exists such that where u = F X (x) and v = F Y (y).
The Copula functions provide a functional link between the two univariate marginal distributions F X (x), F Y (y).As the Copula function allows for modeling the pure dependence between the two variates X and Y , it is rather flexible to describe their relationship with full freedom to the choice of the univariate marginal distributions.This is especially advantageous in cases, where the dependence structure between the variates is too complex to be modeled by a multivariate Gaussian distribution, as it is often the case for hydrometeorological variables (Salvadori and Michele, 2007;Dupuis, 2007).

Copula models
As a consequence of Sklar's theorem, each complex and unknown joint distribution F XY (x, y) can be estimated by assuming specific parametric functions for F X , F Y and C in Eq. ( 1).The bivariate Copula model of the variates X and Y consists of two univariate parametric marginal distributions (F X (x) and F Y (y)) and a theoretical parametric Copula function C θ (u, v) that can be estimated separately based on the realizations x, y. Figure 2 visualizes the process of estimating a Copula model with a bivariate exemplary data set, i.e., realizations (x, y) of the two random variates X and Y .
A scatter plot of the two realizations (x, y) is shown in Fig. 2 (left panel).The Copula model for the data set consists of two marginals and the theoretical Copula.Therefore, the first step is to estimate the theoretical univariate distribution functions for the two variates X and Y (see Fig. 2, middle panel).
The next step is to estimate the theoretical Copula function C θ (see Fig. 2, right panel).Finally, the unknown joint distribution F XY (x, y) is fully determined by the marginal distributions and the Copula function, i.e., the dependence structure itself.Figure 2 visualizes the fact that different marginal distributions and theoretical Copula functions can be combined independently allowing to model highly complex interdependencies between the variables X and Y .This is especially beneficial if these interdependencies are nonlinear, asymmetric or the data show heavy-tail behavior.

Marginal distribution estimation
The Copula-based modeling of the dependence between X and Y requires the fitting of suitable marginal distributions for both data sets (REGNIE and WRF) for each grid cell.Generally, both non-parametric and parametric fitting approaches for the local precipitation distribution are found in the literature (Dupuis, 2007;Gao et al., 2007;Bárdossy and Pegram, 2009;van den Berg et al., 2011).In this study a parametric fitting of the precipitation distribution is applied also to allow for an illustration of the spatially distributed differences (provided as the fitted marginal distribution family maps) between WRF and REGNIE.This gives additional valuable information about differences in their statistical properties.
In this study, five different parametric distribution functions are tested (Weibull, gamma, normal, generalized Pareto and exponential).For all time series (REGNIE and WRF), the parameters of the respective distribution functions are estimated by a standard maximum likelihood estimation (MLE).The goodness-of-fit is evaluated in a two-stage process.Firstly, a K-S test is applied (Massey, 1951).As the K-S test is highly sensitive due to the large sample sizes (Serinaldi, 2008), the null hypothesis (the sample comes from the selected distribution) is rejected in some cases for all of the candidates.In other cases there might be more than one possible candidate for the best fit.For that reason, all candidates which are accepted by the K-S test are further inspected by using the BIC (Weakliem, 1999).If all of the candidates are rejected by the K-S test, only the BIC is relevant for the selection of the best fit.
The BIC selects the optimum within a finite set of models.It is based on the likelihood function and deals with the trade-off between the goodness-of-fit of the model and its complexity: where k denotes the number of the free parameters of the model, n is the sample size and L is the maximized value of the likelihood function of the estimated model.The smallest value of the BIC suggests the best fitting distribution.

Copula function estimation
The Copulas from different families describe different dependence structures.To increase the accuracy of the description of the dependence, different types of Copulas are considered, since one common Copula might be incapable of capture the dependence structure for all grid cells over the entire study area and for all seasons.Copulas In this study, four different one-parametric Copulas (see Table 1) are selected: the Gumbel Copula is able to describe an upper tail dependence structure, while the Clayton Copula allows one to express higher probability in the lower tail.The Frank Copula exhibits no tail dependence, and the Gaussian Copula describes a similar dependence as the Frank Copula, but with slightly higher densities in the lower and upper tails (Venter, 2002;Schmidt, 2007).For the Clayton Copula, the formulas of positive and negative dependence are different.The parameter θ can take values of −1 < θ < 0 to model negative dependence.In that case the formula is For the data used in this study, negative dependences could not be found.Therefore, it is θ ∈ [0, ∞] and the Clayton Copula is defined as described in Table 1.
For the Copula goodness-of-fit test we closely follow the approach as described in Laux et al. (2011) and Vogl et al. (2012).For brevity it is briefly summarized.
Since the dependence structure, i.e., the theoretical Copula function, between X and Y is in general not known in advance, the empirical Copula that can be calculated from the data is analyzed (Deheuvels, 1979).Let {r 1 (1), . .., r 1 (n)} and {r 2 (1), . .., r 2 (n)} denote the rank space values that are derived from the fitted theoretical marginal distributions.Then, the empirical Copula, a rank-based estimator of C θ , is defined as where u = F X (x), v = F Y (y) and 1(. . . ) is denoting the indicator function and n being the sample size.A visual inspection of C n allows one to choose promising candidates out of the set of available theoretical parametric Copula functions.
To estimate the unknown parameter θ ∈ R for each candidate, a MLE approach is used.To decide which Copula function is able to describe the dependence structure best, different goodness-of-fit tests (e.g., Genest and Rémillard, 2008;Genest et al., 2009) are available.In this study the goodness-of-fit test is based on the Cramér von Mises statistic (Genest and Favre, 2007), where the empirical Copula C n is compared to the parametric estimate C θ : The specific parametric bootstrap procedure to obtain the approximate P value is described by Genest et al. (2009).

Copula-based bias correction
The Copula-based bias correction applied for this study is based on the estimation of a Copula model for each pair of observed (X) and modeled (Y ) rainfall for each grid cell.
As soon as this Copula model (F X (x), F Y (y) and C θ (u, v)) is estimated, conditional random samples can be generated through Monte Carlo simulations (Gao et al., 2007;Salvadori et al., 2007).The procedure follows the algorithm detailed in Laux et al. (2011) and Vogl et al. (2012) to generate pseudoobservations conditioned on the modeled data.We purposely conditioned the RCM data since the method is the first step for correcting future climate projection (where no observations are available).This conditional simulation algorithm is based on a conditional distribution of the form: The complete Copula-based bias correction algorithm consists of the following steps: 1. estimate the theoretical marginal distributions F X (x) and F Y (y) for observation and RCM data, respectively; 6. generate the pseudo-observations in the rank space for each time step by using the conditional Copula distribution; 7. transform back the random samples to the data space by using the integral transformation.
The Copula-based conditional simulation is the critical step of this bias correction approach, as it forces a certain variable (observation) to take a value when another variable (RCM) is given.To assess the uncertainty associated with this prediction, the conditional prediction process (step 6 and 7) must be repeated for a large number of times This provides the possibility to obtain a large set of random realizations and additionally gives the information of a probability density function (PDF) for each corrected time step.From the PDF the spread of the distribution in form of the inter-quantile range can, e.g., be provided as an additional uncertainty criterion for the bias correction.

Correction strategy for continuous time series
The implementation of a bias correction for precipitation (a discrete variable) is more complex than a bias correction of a continuous variable, e.g., temperature.In general four cases have to be distinguished, namely, (0,0), (0,1), (1,0) and (1,1), where 0 denotes a dry day and 1 indicates a wet day (see Fig. 3).A threshold of rainfall amount of 0.1 mm day −1 was used to identify a wet day with respect to the usual precision of rain gauges (Dieterichs, 1956;Moon et al., 1994).Therefore, the four cases are defined as follows: 1. (1,1): REGNIE and WRF precipitation ≥ 0.1 mm; 2. (0,1): REGNIE < 0.1 mm, while WRF ≥ 0.1 mm; 3. (1,0): REGNIE ≥ 0.1 mm and WRF < 0.1 mm; 4. (0,0): both REGNIE and WRF < 0.1 mm.Different approaches exist in the literature to account for the intermittent nature of rainfall.For example the truncated Copula suggested in Bárdossy and Pegram (2009) and the Copula-based mixed model described in Serinaldi (2008).Both methods are able to produce time series that statistically hold the same proportion of the four different cases (0,0), (0,1), (1,0) and (1,1).These methods allow for the correction of the total number of dry days, but do not allow one to correct individual events in the (0,1) and (1,0) cases.
In this study, we aim for an event-based correction as described in the following: the Copula-based concept focuses on the correction of the (1,1) cases, i.e., the positive pairs.In order to generate a complete bias corrected time series of WRF output, the events that are not covered by the (1,1) case are left unchanged.For the (0,0) cases, there is no error.The errors that come from the (0,1) and (1,0) cases are not corrected by this method.To justify this strategy, we investigated the proportion of the four cases in the study area (see Fig. 4): the (1,1) cases take the highest proportion, followed by the (0,0) cases.The proportion of both the (0,1) and (1,0) cases are comparatively low.The average proportion of these cases are 40 % for the (1,1) cases, 29 % for the (0,0) cases, 19 % for the (0,1) cases and 12 % for the (1,0) cases.

Estimated marginal distributions
For both REGNIE and WRF data, five different distribution functions are employed for each grid cell separately: generalized Pareto distribution (gp); gamma distribution (gam); exponential distribution (exp); Weibull distribution (wbl) and normal distribution (norm).This guarantees the flexibility in selecting the most appropriate distribution for each grid cell.The goodness-of-fit tests (K-S test and the BIC; see Sect.3.3) reject the normal distribution in all cases, while the generalized Pareto distribution is accepted most frequently for both REGNIE and WRF (Fig. 5).The result shows a reasonable agreement of selected marginal distribution between REGNIE and WRF mainly in the eastern and southern parts of Germany.The patterns of the selected types follow the topography of Germany (see Fig. 1).In the northwest of Germany, the Weibull distribution function prevails as well as in the low mountain ranges.In general, this effect is stronger for WRF while the patterns are more patchy for REGNIE.
The coincidence between REGNIE and WRF marginals is shown in the confusion matrix.Each row of the matrix represents the distribution types of REGNIE, while each column represents that of WRF (in %).The major diagonal shows the fraction of concurring marginal types.The confusion matrix for the calibration period is shown in Table 2.It is found that for 42 % of grid cells, the generalized Pareto distribution is selected for both data sources concordantly.For the Weibull distribution this holds true for 16 % of the grid cells.Since the total number of grid cells where gamma and exponential distribution are fitted is very low, the percentage of hits in the diagonal of the confusion matrix is small.Summing up the major diagonal gives a measure for the overall agreement.For the complete calibration series about 59 % correspond.The failures of 21 % of grid cells, where REGNIE follows the generalized Pareto distribution and WRF follows the Weibull distribution, are predominately located in the northwest of Germany (Fig. 5).
In order to assess the annual variability in the precipitation time series, the marginal distributions are estimated for the different seasons (spring -MAM, summer -JJA, autumn -SON, winter -DJF).
For both REGNIE and WRF data, the seasonal representation of the different distribution types is shown in Fig. 6.It indicates that the choice of the optimal marginal distribution clearly depends on the season.In WRF, the winter (summer) season is dominated by exponential (generalized Pareto).The differences for REGNIE are not that obvious since the dominant distribution type is the generalized Pareto distribution for all seasons.For WRF data the effect of the underlying elevation on the identified distribution type is most prominent during winter and fall.In the low mountain regions the favorite marginal distribution change from fall (Weibull, generalized Pareto) to winter (exponential, Weibull).The seasonal confusion matrices are shown in Table 3.The results indicate the best agreement between WRF and REG-NIE (approximately 56 % of the grid cells) in summer, while in wintertime only approximately 30 % of the types agree.

G. Mao et al.: Copula-based bias correction for RCM precipitation
As mentioned above in Sect.3.3 the goodness-of-fit tests follow a two-step process due to the fact that the K-S test is highly sensitive to large sample sizes.For the annual marginal distribution identification for 99 % of the grid cells, the K-S test fails and only the BIC is used for REGNIE, while the number for WRF is 68 %.Since the sample size is reduced in seasonal analysis, the failures of K-S test are decreased dramatically.The results are shown in Table 4.

Identified Copula functions
For each grid cell the theoretical Copula function, which characterizes the dependence structure between REGNIE and WRF data, is identified separately.Four Copulas (Clayton, Frank, Gumbel and Gaussian) are investigated by applying the goodness-of-fit tests described in Sect.3.4.Figure 7 shows the results of the goodness-of-fit tests for the calibration period for the complete study area.It is found that for most of the grid cells in the study area, the Frank Copula can capture the dependence structure best, while for the northeast of Germany the Clayton Copula provides the best fit.In total the dependence structure of 72 % of the grid cells is modeled   by the Frank, 20 % by the Clayton, 7 % by the Gaussian and only 0.09 % by the Gumbel Copula.
In order to assess the annual variability of the dependence structures between REGNIE and WRF precipitation time series, the Copula functions are identified for the different seasons separately.The corresponding results are shown in Fig. 8.
While for spring, autumn and winter the Copulas that have no pronounced tail dependence (the Frank and Gaussian Copula) dominate (spring 49 % (Frank) + 22 % (Gaussian) = 71 %, autumn 53 % + 24 % = 77 % and winter 63 % + 28 % = 91 %), in summer the Clayton Copula provides the best fit for most of the grid cells (62 %).For all seasons the Gumbel Copula is only selected for few grid cells with a maximum number of hits in spring (5 % of the grid cells).In general the differences are most prominent for winter and summer (see Fig. 8).

Validation of the Copula-based bias correction
Based on the estimated Copula model (parametric marginal distributions and theoretical Copula functions), the conditional distribution of REGNIE conditioned on WRF is derived for each grid cell separately (see Sect. 3.5).To generate bias-corrected WRF precipitation, random samples of possible outcomes are drawn from this conditional distribution.We use a sample size of 100.The result can be interpreted as an empirical predictive distribution for corrected WRF (pseudo-observations) that is determined for all conditioning WRF precipitation values for each time step.While this stochastic bias correction method gives a full ensemble and the empirical predictive distribution of corrected WRF This can be regarded as a Copula-based regression by taking such a typical value as the estimator of the derived empirical predictive distribution of corrected WRF precipitation.It is noted that this typical value (e.g., the mean) can also be directly derived from the analytical Copula-based conditional distribution (mean regression curve, Nelsen, 1999).Figure 9 exemplarily shows WRF (red), REGNIE (green) and the bias-corrected WRF (blue) data for pixel 1 in Fig. 1 during wintertime 1986-1987 (positive pairs only).The box plot visualizes the spread of the generated random sample (100 members) indicating the uncertainty of the predicted bias-corrected precipitation, while the blue line shows the median of the respective empirical predictive distribution.
It can be seen from Fig. 9 that for most of the time steps the proposed Copula-based approach can successfully correct for biases in the modeled precipitation compared to observed values.
To investigate the spatial performance of the correction algorithm, the relative bias of RCM modeled mean daily precipitation (WRF) compared to gridded observations (REG-NIE) is compared to that of the bias-corrected model data (B.C. WRF) for Germany.
A comparison of corrected WRF data derived by the expectation, median and mode of the predictive distribution with observations indicates that the correction performs best for the expectation value (see Fig. 10).Both simulations based on the median and the mode tend to underestimate the precipitation values, thus causing a dry bias over the domain.Therefore, in the following, the results are shown and analyzed for the mean only.
Figure 11 (left panel) shows the bias between REGNIE and WRF, indicating wet biases in most of the study area.These wet biases are most prominent in high elevation areas following the topography of Germany.Wet biases are also detected in the northeast of Germany, where the elevation is low.Dry biases are found in the alpine and pre-alpine areas in the southeast of Germany as well as in the west of Germany.After the application of the Copula-based correction algorithm, the wet biases are corrected for most of the domain, except for a very small region in the northeast (see Fig. 11, right panel).It is also found that the dry bias can also be significantly reduced, but small dry biases are introduced in some areas in the west of the domain.The average of the bias for the whole study area is reduced from 10 to −1 %.
A performance analysis with respect to seasonal variations is shown in Fig. 12.It shows that the relative bias is even larger for different seasons.Figure 12 (left panel) shows the relative bias between uncorrected WRF mean daily precipitation and the REGNIE data set for the different seasons (MAM, JJA, SON, DJF, from top to bottom).The WRF model tends to generate too much precipitation in spring and winter for the majority of grid cells in the study area.For summer and autumn, there are also regions found where the model is too dry.These regions are mostly located in the north and in the south of Germany.This effect is found to be strongest in summer, while in autumn areas with an overestimation of precipitation are still found in the northeast and southwest of Germany.In all cases, the bias is influenced by the underlying terrain showing an overestimation especially in regions with higher altitude.The average of the bias from spring to winter are 32, −15, 4 and 28 %, respectively.tom panels).It can be seen that the Copula-based correction efficiently removes most of the biases indicating a comparable performance for all seasons.Figure 12 especially for spring and winter indicates that the correction is tending to be more suitable to correct for overestimation of the rainfall.The underestimation of precipitation, that is most prominent in summer, however, is still significantly reduced.In autumn and winter the Copula-based correction reduces the rainfall amounts too much for the west of Germany, introducing a small dry bias in that region.The average bias are reduced to 16, −11, −1 and −3, respectively, for different seasons from spring to winter.
In the following, it is further analyzed how well the model can reproduce the intra-annual variability of observed precipitation and how the performance for the different seasons is influenced by the Copula-based correction algorithm.
To investigate typical situations in detail, the results are shown for four specific grid cells in the study area (see Fig. 1): grid cells 1 and 3 are selected as they show the highest wet bias between WRF and the REGNIE.Grid cell 2 is located in the region where a dry bias was generated by the WRF in summer and autumn and a wet bias was generated in winter.Grid cell 4 represents a case where the agreement between uncorrected model data and REGNIE observations is already good (see Fig. 12).
The results for grid cell 1 in Fig. 13 confirm the fact that the RCM model results strongly overestimate the precipitation amount in that case.The annual variability of the observations is in general reproduced, except for a strong increase of the mean precipitation in August that is not found in the observations.This behavior is found also for grid cell 3 indicating a relatively too dry summer season.For grid cells 1 and 3, the Copula-based correction is found to be able to correct for the overestimation of precipitation amounts as well as for the effect of a too strong decrease of precipitation in August.However, the correction is introducing a slight under-  estimation mainly during summer and autumn instead.For grid cell 2, the correction shows a good performance by decreasing the rainfall amounts when the RCM is overestimating, and increasing the amounts when the RCM has underestimated it.The correction reduced the wet bias efficiently, while the dry bias is corrected less efficiently.The effectiveness of this correction is also highlighted by an analysis of the results for grid cell 4.Even if the performance of WRF was already satisfactory, the algorithm was still able to further improve the results.
Finally, in order to investigate the spatial coherence of the bias-corrected precipitation fields, the sequence of three selected days (from 9 to 11 January 1986) are exemplarily shown in Fig. 14.The left column from top to bottom are the observed precipitation fields for these 3 days.In the middle are the uncorrected (original) WRF simulated precipitation fields and the right column indicates the bias-corrected WRF precipitation fields: while correcting the absolute precipitation values, the spatial coherence of the precipitation patterns are retained after the application of the bias correction.

Comparison of Copula-based bias correction to the quantile mapping method
The quantile mapping method is often used in bias correction of RCM derived precipitation (Dosio and Paruolo, 2011;Gudmundsson et al., 2012).This method corrects the bias by rescaling the values of the RCM so that the distribution of the RCM matches that of the observations.It corrects all moments of the RCM precipitation distribution under the as-sumption of a perfect dependence among the ranks.This full dependence assumption is limited: in our study area, the rank correlation between the data sets varies between 0.3 and 0.6 (see Fig. 15).The quantile mapping correction has been performed for comparison to the Copula-based approach.The RMSE between the observed (REGNIE) and bias-corrected modeled data is calculated for both the Copula-based correction and the quantile mapping method.The original RMSE (between REGNIE and WRF) is also computed as a reference.For the Copula-based approach, we calculated the RMSE for all the simulations with respect to the mean, median and mode value.The changes of the RMSE by different corrections over the study area are shown in Fig. 16.The Copulabased correction derived from the mean regression reduces the RMSE significantly with an average of −12 % over the domain.The Copula-based correction derived from the median also reduces the RMSE, but to a lesser degree.The correction derived from Copula-based mode regression reduces the RMSE, but results in an increase of the RMSE in some regions.The same holds true for the quantile mapping approach.This is in agreement with our previous results (see Fig. 10) that the Copula-based correction derived from the mean regression performs best.Therefore, in the following analyses, the results focus on the Copula-based mean regression approach.
To further assess the performance of the Copula-based method, additional performance measures are analyzed.The RMSE for different magnitudes of observed precipitation (i.e., a quantile RMSE analysis) is done for the selected four   Furthermore, we also investigated the percentage of the corrected time steps that are closer to the observations compared to the quantile mapping method.The results are shown in Fig. 18.The values indicate the percentage of the successful corrections (i.e., closer to the observations) by the two bias correction methods.It can be seen that the results of the quantile mapping correction strongly depends on the rank correlation (see Fig. 15), while the Copula-based correction provides a stable correction efficiency over the entire domain.The average percentages of the successful correction are 55 % for the Copula-based correction and 46 % for the quantile mapping correction, respectively.

Summary and conclusions
In this study, a Copula-based stochastic bias correction technique for RCM output is introduced.The strategy of this method is the identification and description of underlying dependence structures between RCM and observed precipitation and its application for bias correction.Copulas are able to capture the nonlinear dependencies between variables (between RCM and gridded observed precipitation) including a reliable description of the dependence structure in the tails of the joint distribution.This is not possible, e.g., by using a Gaussian approach or methods based on the Pearson's correlation coefficient.Yet, another albeit more practical advantage of this approach is that the univariate marginal distributions can be modeled independently from the dependence function, i.e., the Copula.This provides more flexibility to construct a correction model by combining different marginal distributions and Copula functions, as many parametric univariate distribution and theoretical Copulas are available.
The conditional distribution derived from fitted Copula model forms the basis of the correction procedure.It provides the possibility to access all the possible outcomes of the corrected value and additionally gives the information of a PDF for each corrected time step.
This study is an extension of the two former studies of Laux et al. (2011) and Vogl et al. (2012) by applying the Copula-based bias correction technique to high-resolution RCM precipitation output and a gridded observation product.Compared to those two studies, this study is based on the following framework: -The grid cell base is worked on and the Copula model (marginal distributions and Copula function) is estimated for each grid cell separately rather than selecting, e.g., the most dominant model.Therefore, the statistical characteristics of observed (REGNIE) and modeled data (WRF) and their dependence structure are visualized spatially and analyzed for the first time.
-The BIC, as well as the K-S test, is implemented for the marginal goodness-of-fit test.From previous studies we found that very large sample sizes may bias the result of the K-S test, leading to the rejection of the null hypothesis (the sample comes from the selected distribution) most of the time.
-The Copula model is estimated for every season separately.Thus, different precipitation geneses types are not masked by the same models.This, in general, leads to stronger dependencies and more robust models.
For the dependence function it was detected that the fitted Copula families vary both in space and time (seasonally).The fact that different dependence structures exist for the different seasons indicates that the method corrects for different dominating precipitation types, i.e., convective and stratiform precipitation.
The assumption of this approach is that the dependence structure between observed and modeled precipitation is stationary over the period of interest.For the investigation of the spatial performance, the Copula correction based on the mean value is applied.The validation results show that the proposed approach successfully corrected the errors in RCM Copula (mean regression) Copula (median regression) Copula (mode regression) Quantile Mapping derived precipitation.It is also found that the correction method performs better for overestimation than for underestimation.By investigating the spatial coherence, the proposed method is found to be able to preserve the spatial structure of the WRF model output.This is due to the fact that the Copula-based approach is conditioned on the WRF simulation.The method adjusts the value of the WRF precipitation according to the fitted Copula model.Even though the Copula models are estimated for each grid cell, the spatial coherence is captured by the Copula model as both the Copula families as well as the marginal distributions are also spatially clustered.
When comparing to the quantile mapping correction, the Copula-based method has an improved performance in reducing the RMSE.It is also found that the Copula-based method allows for a better correction with respect to the percentage of the time steps that are closer to the observations after the correction.The Copula-based method is able to provide a stable correction efficiency over the entire domain, even if the rank correlations between the RCM and observed precipitation are low.
Apart from traditional approaches, such as the quantile mapping which is based on a bijection transfer function, the Copula-based stochastic bias correction technique provides the information of the full PDF for each individual time step.This additionally provides a quality criterion for the bias correction, e.g., expressed as the spread of the PDF in form of the inter-quantile range.Subsequent modelers using RCM derived precipitation data are potentially enabled to make use of the full PDF, especially if they are interested in other statistical moments or in estimating uncertainties arising from this approach.
In this study, the Copula-based bias correction is only applied for past precipitation time series.The method would need further modifications if applied to future climate scenarios.A suitable algorithm must be able to reflect changes in the marginal distributions as well as the joint distributions,  taking into account possible non-stationarity of precipitation time series.

Figure 1 .
Figure 1.Terrain elevation of Germany (digital elevation model).The numbers represent the position of the four specific grid cells for which the performance of the Copula-based algorithm is analyzed in Fig. 13.

Figure 2 .
Figure 2. Visualization of a bivariate Copula model consisting of two marginal distributions and a theoretical Copula function that describes the pure dependence.
2. transform the time series x 1 , • • •, x n and y 1 , • • •, y n to the rank space by taking u = F X (x) and v = F Y (y); 3. calculate the empirical Copula C n (u, v) as a rank-based estimator for the theoretical Copula function C θ (u, v); 4. estimate the Copula parameter θ and perform goodnessof-fit tests to identify the best theoretical Copula function C θ (u, v); 5. calculate the Copula distribution conditioned on the variate v representing the RCM time series in the rank space;

Figure 4 .
Figure 4.The proportion of the four cases over the study area for the validation time period (from 1986 to 2000).

Figure 5 .
Figure 5.Estimated marginal distributions of precipitation for Germany for both REGNIE (left panel) and WRF (right panel).The results are shown for the calibration period (1971-1985) and positive pairs only.

Figure 6 .
Figure 6.Estimated marginal distribution of precipitation for the different seasons for REGNIE (left column panels) and WRF (right column panels) in Germany.The results are shown for the calibration period (1971-1985) for positive pairs only.Spring (MAM), summer (JJA), autumn (SON) and winter (DJF) are illustrated from top to bottom.

Figure 7 .
Figure 7. Identified Copula functions between REGNIE and WRF precipitation in the calibration period (1971 to 1985) with positive pairs.

Figure 10 .
Figure 10.Relative bias map of mean daily precipitation for Copula-based correction by taking the expectation (left panel), median (middle panel) and the mode (right panel) as the estimator of the sampled distribution.The results are based on the validation period 1986-2000.

Figure 11 .
Figure 11.Relative bias of mean daily precipitation for uncorrected (left panel) and corrected WRF precipitation field (right panel).The results are based on the validation period 1986-2000.

Figure 12 .
Figure 12.Relative bias between uncorrected (left panels) and corrected (right panels) WRF mean daily precipitation and the REGNIE data set in Germany for the different seasons (spring -MAM, summer -JJA, autumn -SON, winter -DJF, from top to bottom).The results are derived for the validation time period (1986-2000).

Figure 13 .
Figure 13.Comparison of bias-corrected WRF mean monthly precipitation (blue) with REGNIE (green) and original WRF data (red) for the selected four pixel 1-4 in the validation period from 1986 to 2000.The number of the respective grid cell is noted in the upper left corner of each plot.

Figure 14 .
Figure 14.Daily precipitation fields over Germany for three consecutive days from 9 to 11 January 1986.
. The RMSE in different quantiles are represented by RMSE 0.1 , RMSE 0.2 , . . ., RMSE 1.0 , while the subscript indicates the magnitude level.RMSE 0.1 evaluates the errors in the dry part of the observation distribution, implying the (0,1) errors.From RMSE 0.2 to RMSE 1.0 the RMSE are calculated for equally spaced probability intervals of the observed empirical distribution of wet days.For example, RMSE 1.0 indicates the errors in the magnitude of the 10 % highest events.As it can be seen from Fig.17, the Copula-based correction performs equally or even better in terms of the RMSE in most of the quantiles.

Figure 15 .
Figure 15.The rank correlations between RCM and REGNIE precipitation over the domain in the validation period from 1986 to 2000.

Figure 16 .
Figure 16.The changes of the RMSE in the validation period (1986-2000) by different bias correction methods.The green color indicates a decrease of the RMSE, while the ocher color implies an increase of the RMSE.

Figure 17 .
Figure 17.The root mean square errors (RMSE) and the root mean square errors for specific probability intervals (RMSE 0.1 , RMSE 0.2 , . . ., RMSE 1.0 ) for different methods.The selected four pixels are the same as in Fig. 13.The black solid line indicates the errors without correction.The results are derived from the validation period from 1986 to 2000.

Figure 18 .
Figure 18.The percentage of the corrections that are closer to the observations.Left panel: Copula-based correction (mean regression); right panel: quantile mapping correction.The results are derived from the validation period from 1986 to 2000.

Table 1 .
Theoretical Copula functions used in this study.

Table 2 .
Confusion matrix between REGNIE and WRF for the different distribution types.

Table 3 .
Seasonal confusion matrix of fitted REGNIE and WRF precipitation distribution.

Table 4 .
The proportion of grid cells for both REGNIE and WRF that K-S test failed and only BIC is used in goodness-of-fit procedure.
precipitation, for practical reasons one can choose, e.g., the expectation, median or mode to get single corrected values.