Interactive comment on “Improving pan-European hydrological simulation of extreme events through statistical bias correction of RCM-driven climate simulations” by R.

The publication is well written, though lengthy at times and repetitive in some places. It is an interesting piece of work with extensive diagrams and analysis, though in some places long-winded. It is suggested that the paper might be split into two papers: one to look at the bias correction of the data sets and discuss the various aspects, the second on the use of this data set in conjunction with LISFLOOD and, more importantly, the effect of climate change on extreme events, as the paper suggests. Some of the diagrams are very small and difficult to read, the space on the pages is not used effectively, the


Introduction and scope
Europe has experienced heavy floods over the last decades, which have affected thousands of people and caused millions of Euros worth of damage. Even though it is not yet possible to link recent flooding events to global warming, climate projections indicate that in the future we can expect more extreme weather events triggering flooding. Managing risks from extreme flood events will be a crucial component of climate change adaptation. It is therefore of utmost importance to develop and implement techniques that enhance the confidence in projecting future trends in flood occurrence and intensity.
Climate predictions are the basis for defining the potential impacts of global warming. To date the most advanced tools to obtain those predictions are coupled Atmosphere-Ocean General Circulation Models (AOGCMs or, in short, GCMs) (Giorgi, 2005). Owing mainly to their coarse horizontal resolution (ca. 100-300 km), however, downscaling procedures are required to feed small-scale impact models and to guarantee a correct representation of the hydrologic processes at a much finer spatial resolution (Fowler and Kilsby, 2007). Possible strategies to achieve this include statistical or dynamical downscaling. The former develops statistical relationships between large-scale climate information and regional variables (Wilby et al., 1999), whereas the latter considers the application of Regional Climate Models (RCMs) driven by boundary conditions obtained from GCMs (Fowler et al., 2007). Advantages and drawbacks of both downscaling techniques have been widely discussed in the literature (see, e.g. Wilby and Wigley, 1997; Murphy, 1999; Hellström et al., 2001; Haylock et al., 2006; Schmidli et al., 2007) and, hence, they will not be repeated here. For excellent discussions of downscaling techniques with focus on hydrological applications the reader is referred to Wood et al. (2004) and Fowler et al. (2007). In this paper, the dynamical downscaling approach is employed.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Despite the fact that hydrological components (e.g. surface and subsurface runoff) can be directly obtained from RCMs at lateral resolutions that agree with meso-scale catchments (e.g. 25 × 25 km or 50 × 50 km), these components are hardly ever used to assess the local hydrology. This is mainly due to the poor representation of the land surface processes, which results in a poor agreement between the surface runoff simulated by the RCMs and observations (Giorgi et al., 1994; Evans, 2003). Instead, the detailed climate information obtained from the coupled GCM-RCM is typically employed to force off-line hydrological models and, thus, to obtain more accurate representations of meso- and small-scale hydrologic processes. In recent years, several studies have appeared that implement this technique to assess the impacts of climate change on hydrological extremes, either at the local (e.g. Middelkoop et al., 2001; Etchevers et al., 2002; Prudhomme et al., 2003; Kleinn et al., 2005; Wilby et al., 2006; Steele-Dunne et al., 2008; Thorne, 2011), regional (e.g. Graham et al., 2007) or continental (e.g. Lehner et al., 2006; Dankers and Feyen, 2008, 2009) scale.
Although RCMs have considerably advanced in reproducing regional and local climate, they are known to feature systematic errors (see, e.g. Jacob et al., 2007; Lenderink, 2010; Suklitsch et al., 2010). These biases are likely explained by model errors caused by imperfections in the climatic model conceptualization, discretization and spatial averaging within cells, and uncertainties conveyed from the GCM to the RCM (Teutschbein and Seibert, 2010). In particular, small-scale patterns of precipitation are highly dependent on climate model resolution and parametrization. At the same time, some RCMs show systematic biases with a clear tendency to enhance these biases in more extreme cold or warm conditions (van der Linden and Mitchell, 2009). In a pan-European context, it has been noticed that climate models tend to overestimate warm summers in south-eastern Europe whereas precipitation in winter is too abundant in northern Europe (Jacob et al., 2007; Christensen et al., 2008). It has also been found that areas with a warm bias during winter generally exhibit a wet bias, whereas areas with a cold winter bias show a dry bias (Jacob et al., 2007). At the same time, Kjellström et al. (2010) found a clear tendency toward a warm bias in northern Europe for daily minimum temperature. This potentially has a significant impact on the simulation of hydrological processes such as spring-flooding events driven by snow melting. A tendency of different RCMs to simulate too much (daily) precipitation, with this tendency being more pronounced for the upper-end percentiles (i.e. extreme precipitation events), has also been noticed (Kjellström et al., 2010; Lenderink, 2010). A persistent overestimation of the wet day frequency is also generally observed (see, e.g. Leander and Buishand, 2007; Piani et al., 2010b). It should be highlighted that these biases can in part be explained by errors in the observational data set employed for comparison (Lenderink, 2010).
The presence of biases in the forcing data seriously limits its use in hydrological impact assessments (see, e.g. Wood et al., 2004) and it can result in unwanted uncertainty regarding projected climate change (van der Linden and Mitchell, 2009). More specifically, RCM outputs not corrected for biases tend to produce inaccurate probabilities for extreme events, thus rendering the extreme value analysis less reliable (Durman et al., 2001). Hence, some form of prior bias correction of the forcing data is required if a realistic description of the hydrology is sought. This procedure should aim at correcting climate simulated by the RCM during a control period to properly reflect the spatio-temporal patterns of the observed climate and, subsequently, use the "transfer function" between climate observations and simulations obtained for the control period to correct future climate simulations (Piani et al., 2010a).
In recent literature, the presence of bias in dynamically downscaled outputs has been amply recognized (see, e.g. Shabalova et al., 2003; Lenderink et al., 2007; van Pelt et al., 2009; Hurkmans et al., 2010). As a result, several techniques to correct potential bias in precipitation and temperature have been developed. Some use (a) monthly correction factors based on the ratio between present-day simulated values and observed values (Durman et al., 2001), (b) linear or nonlinear transformation functions which consider changes in the mean and the variance of the observed and simulated time series (Horton et al., 2006; Leander and Buishand, 2007; Leander et al., 2008), (c) probability distribution transfer functions derived from observed and simulated cumulative distribution functions (cdfs), which is also referred to as "quantile mapping" or "histogram equalization" (Déqué, 2007; Block et al., 2009; Piani et al., 2010a,b), and (d) empirical factors to tailor the RCM outputs considering normalization, tuning of the standard deviation and calculation of residuals (Engen-Skaugen, 2007). Themeßl et al. (2011) evaluated an ensemble of seven empirical-statistical methods to correct bias in daily precipitation of a high-resolution regional climate hindcast for the Alpine region. They found that quantile mapping shows the best performance, particularly at high quantiles, which is favourable for applications related to extreme precipitation events such as flooding. For a recent review of bias correction techniques with focus on hydrological applications the reader is referred to Teutschbein and Seibert (2010).
Most of the above bias removal techniques have been applied at the local catchment scale, where dense networks of meteorological stations are available to reconstruct recent climate. With the exception of the regional study of Graham et al. (2007), all large-scale hydrological assessments to date have lacked a consistent bias correction step aimed at ensuring a correct spatio-temporal description of observed present climatic conditions. We note that there have been applications of the delta-approach at pan-European scale (see, e.g. Lehner et al., 2006), where for the control period the hydrological model has been forced by observed climate and for future climate projections the climate signal is added to current climate, thereby avoiding the need for bias correction. However, Lehner et al. (2006) used monthly time series of observed climate at 0.5° lat-long resolution (New et al., 2000) and only accounted for long-term trends and average changes in seasonal climate, neglecting a potential change in climate variability and extreme events.
The main reason advocated to skip bias correction in pan-European impact studies has been the lack of a high-quality and high-resolution meteorological data set with sufficient observation length with which to confront climate simulations of present climate. In this regard, the meteorological observations data set E-OBS (see Haylock et al., 2008), recently developed in the European project ENSEMBLES (van der Linden and Mitchell, 2009), provides the opportunity to create fully consistent bias corrected time series of precipitation and temperatures to force pan-European hydrological models. It consists of European land-only daily high-resolution gridded data for precipitation and minimum, maximum and mean surface temperature. This data set improves on previous products in its spatial resolution and extent, time period, number of contributing stations and attention to finding the most appropriate method for spatial interpolation of daily climate observations (Haylock et al., 2008). Also newly available is the global dataset of observed meteorological forcing data (Weedon et al., 2010) of the last 50 yr from the EU project WATCH (http://www.eu-watch.org). The data are derived from the ERA-40 reanalysis product via sequential interpolation to 0.5° lat-long resolution, elevation correction and monthly-scale adjustments based on CRU and GPCC monthly observations combined with new corrections for varying atmospheric aerosol-loading and separate precipitation gauge corrections for rainfall and snowfall.
Based on the recently established pan-European and global meteorological datasets, Piani et al. (2010a,b) have proposed a "statistical bias correction" method based on quantile mapping that corrects daily values of mean, maximum, and minimum temperatures and precipitation, with the latter respecting the original precipitation-to-snow ratio. This technique works by fitting pre-defined "transfer functions" between climate observations and simulations for a given control period. A clear advantage of this technique is the flexibility of the fitting procedure by reaching a tradeoff between robustness and goodness-of-fit of the alternative "transfer functions", and the preservation of seasonal statistics even if the bias correction is performed daily. To date, this technique has not been applied in any pan-European hydrological impact assessment.
This work presents an assessment of the benefits of correcting the bias in regional climate simulations for hydrological impact assessments at pan-European scale, with an emphasis on hydrological extreme events. To this end, we employ the bias correction method recently developed by Piani et al. (2010b). This method corrects for errors not only in the mean but also in the shape of the distribution. Dosio and Paruolo (2011) recently showed that this method is also capable of correcting the high-end percentiles of the distribution of precipitation for an ensemble of RCMs for Europe. It is therefore capable of correcting for errors in the variability as well, which is crucial for extreme event analysis. As target for the bias correction we make use of the E-OBS data set (Haylock et al., 2008). We correct daily RCM fields of mean, maximum and minimum temperatures, as well as daily precipitation. The uncorrected and bias-corrected climate data are evaluated against observed climate with respect to average and extreme statistics relevant for flood simulation. The original and bias-corrected climate data are then used to force the hydrological model LISFLOOD. This spatially-distributed model has been developed for operational flood forecasting at European scale (van der Knijff et al., 2010) and has recently been applied in pan-European climate change impact assessments (Dankers and Feyen, 2008, 2009; Feyen and Dankers, 2009). Employing extreme value analysis techniques, the probability of extreme discharges is estimated and compared to results derived from long-term observed discharge series at 554 stations in Europe.
The remainder of this paper is arranged as follows. Section 2 provides details on the data and methodology used in this work. This includes a description of the observed and climate data, the bias correction method, the hydrological modelling framework and the extreme value analysis for evaluating changes in hydrological extremes. We report our results in Sect. 3, provide a more in-depth discussion in Sect. 4 and offer concluding thoughts in Sect. 5.

Observed and simulated climate
Climate simulations are assessed on the basis of the high-resolution gridded E-OBS data set (v3.0) (Haylock et al., 2008) (publicly available from http://eca.knmi.nl/). The aim of the E-OBS data set is to represent daily areal values in alternative grid-boxes (i.e. 0.5° and 0.25°) (Haylock et al., 2008). The E-OBS data set has been specially designed to represent grid box estimates, instead of point values. This is essential to enable a direct comparison with results obtained from RCMs (see, e.g. Chen and Knutson, 2008). Daily climate simulations are obtained from the RCM HIRHAM5 (Christensen et al., 2007) of the Danish Meteorological Institute (DMI), driven by the GCM ECHAM5 (Roeckner et al., 2003) of the Max Planck Institute for Meteorology (MPI), downloaded from the FP6 project ENSEMBLES website (http://ensemblesrt3.dmi.dk/). In the framework of ENSEMBLES, the HIRHAM5-ECHAM5 model was run for the period 1961-2100 with a lateral resolution of ca. 25 km (0.22° rotated-pole grids) and forced according to the SRES-A1B scenario of the Intergovernmental Panel on Climate Change (IPCC) (Nakicenovic and Swart, 2000). We selected this climate model on the basis of a preliminary evaluation of a large number of regional climate simulations from the ENSEMBLES project. In that context, the HIRHAM5-ECHAM5 model proved to be one of the most deficient in reproducing present climate conditions in the period 1961-1990 (defined as control period in this study) when compared to the E-OBS data set.

Statistical bias correction
The bias correction method employed in this work falls within the category of "histogram equalization" and has been described in detail in Piani et al. (2010a,b). In this technique, the corrected variable ($x_{cor}$) is a function of its simulated counterpart ($x_{sim}$), given as $x_{cor} = f(x_{sim})$. The function $f$ is defined such that the intensity histograms of the corrected ($x_{cor}$) and observed ($x_{obs}$) variables match. As demonstrated by Piani et al. (2010a), the function $f$ (also referred to as the "transfer function") can be obtained by estimating the cumulative distribution functions (cdfs) of $x_{obs}$ and $x_{sim}$ and, subsequently, associating to each value of $x_{sim}$ the value of $x_{obs}$ such that $cdf_{sim}(x_{sim}) = cdf_{obs}(x_{obs})$.
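The cdf-matching idea can be illustrated numerically. The following is a minimal, non-parametric sketch in plain Python; it is not the parametric fit of Piani et al. (2010a,b), and the function name `quantile_map` and the sample values are hypothetical.

```python
from bisect import bisect_right


def quantile_map(x, sim_sample, obs_sample):
    """Map a simulated value x onto the observed distribution.

    Empirical illustration of cdf_sim(x_sim) = cdf_obs(x_obs): find the
    non-exceedance probability of x in the simulated sample, then return
    the observed value with the same probability.
    """
    sim_sorted = sorted(sim_sample)
    obs_sorted = sorted(obs_sample)
    # Empirical cdf of the simulated sample at x, clipped to [1/n, 1].
    rank = bisect_right(sim_sorted, x)
    p = min(max(rank, 1), len(sim_sorted)) / len(sim_sorted)
    # Inverse empirical cdf of the observed sample at probability p.
    idx = min(int(p * len(obs_sorted) + 0.5) - 1, len(obs_sorted) - 1)
    return obs_sorted[max(idx, 0)]


# Hypothetical control-period samples: the model is uniformly twice too wet.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [2.0, 4.0, 6.0, 8.0, 10.0]
corrected = [quantile_map(x, sim, obs) for x in sim]
```

In this toy case the corrected series reproduces the observed sample, because every simulated value is replaced by the observation of equal rank. Values outside the simulated range are clipped to the lowest or highest observation, which is one of several possible choices.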
Following Piani et al. (2010b), two functional forms are used to perform the bias correction of precipitation at the grid-cell level,

$$x_{cor} = a + b\,x_{sim} \qquad (1)$$

$$x_{cor} = (a + b\,x_{sim})\left(1 - e^{-(x_{sim} - x_0)/\tau}\right) \qquad (2)$$

where $a$, $b$, $x_0$, and $\tau$ are parameters of the function to be fitted. In Eq. (1) (linear case), $a$ corresponds to an additive correction factor whereas $b$ is a multiplicative factor. Piani et al. (2010b) suggest that in some regions the transfer function is well approximated by a linear function at high intensities, but a systematic change of slope occurs at the lowest intensities. Based on this, they suggest Eq. (2), which represents an exponential tendency to an asymptote. Here, the asymptote is given by the linear factor ($a + b\,x_{sim}$), whereas $\tau$ defines the rate at which the asymptote is approached and $x_0$ is the "dry day correction" factor (value of precipitation below which $x_{sim}$ is set to zero), defined here as $-a/b$. In addition to Eqs. (1) and (2), Piani et al. (2010b) also proposed a logarithmic fit, which however turned out to be less suitable due to fit errors.
From a global analysis, Piani et al. (2010b) concluded that either of these two functions may do a good job in correcting climate simulations, usually showing little improvement when moving from a two-parameter function (Eq. 1) to a four-parameter function (Eq. 2). They noted, however, that where fitting errors were high for Eq. (1), Eq. (2) performed better. As a consequence, the linear model is used in most cases, resorting to the exponential tendency to an asymptote model (Eq. 2) only when the performance of the linear model is unsatisfactory, i.e. a trade-off between robustness and goodness-of-fit.
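The two transfer functions can be written down as short Python functions. This is a minimal sketch: the exponential form is taken as (a + b·x)(1 − exp(−(x − x0)/τ)), which is our reading of the expression in Piani et al. (2010b) and should be checked against that reference; the function names and the parameter values in the example are hypothetical.

```python
import math


def linear_transfer(x_sim, a, b):
    """Eq. (1): additive offset a plus multiplicative factor b."""
    return a + b * x_sim


def exponential_transfer(x_sim, a, b, x0, tau):
    """Eq. (2), assumed form: (a + b*x) * (1 - exp(-(x - x0) / tau)).

    Values at or below the dry-day threshold x0 are set to zero; for large
    x the result tends to the linear asymptote a + b*x at a rate set by tau.
    """
    if x_sim <= x0:
        return 0.0
    return (a + b * x_sim) * (1.0 - math.exp(-(x_sim - x0) / tau))


# Hypothetical parameters with x0 = -a/b, as defined in the text.
a, b, tau = -1.0, 2.0, 1.0
x0 = -a / b
low = exponential_transfer(1.0, a, b, x0, tau)    # damped near the threshold
high = exponential_transfer(50.0, a, b, x0, tau)  # close to the asymptote
```

With these illustrative values the exponential form stays below the linear one at low intensities and converges to it at high intensities, which is the behaviour the text attributes to Eq. (2).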
To perform the bias correction of precipitation, series of simulated and observed daily values within a "construction period" ($Y$) of length $n$ years are selected for every month $m$ and for each grid cell. That is, $x^m_{sim}$ ($x^m_{obs}$) for month $m$ ($m = 1, \dots, 12$) is given by $x^m_{sim}\,(x^m_{obs}) : x \in Y_{i,m};\ i = 1, \dots, n$ (for the sake of clarity we ignore the spatial index). Subsequently, Eq. (1) or (2) is fitted using $x^m_{sim}$ and $x^m_{obs}$, and monthly transfer functions are obtained for each grid cell. Daily precipitation values are then obtained by interpolating the monthly transfer functions into daily transfer functions, using the middle day of each month as reference points. When the estimated transfer functions for two consecutive months are both linear or both exponential-type, the daily transfer functions are obtained by a linear interpolation of the parameters of both monthly transfer functions. On the contrary, when the monthly transfer functions are of a different type, an interpolation scheme is implemented that preserves the characteristics of both linear and exponential-type transfer functions. For a detailed description of this interpolation scheme the reader is referred to Piani et al. (2010b).
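The interpolation from monthly to daily parameters can be sketched as follows for the simple case in which neighbouring months use the same type of transfer function. This is an illustration only: it assumes a 365-day year, the names `mid_month_days` and `daily_parameter` are hypothetical, and the mixed linear/exponential scheme of Piani et al. (2010b) is not reproduced.

```python
MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year


def mid_month_days():
    """Day of year (1-based, possibly fractional) of the middle of each month."""
    mids, start = [], 0
    for n in MONTH_LENGTHS:
        mids.append(start + (n + 1) / 2.0)
        start += n
    return mids


def daily_parameter(day_of_year, monthly_values):
    """Linearly interpolate one transfer-function parameter to a given day."""
    mids = mid_month_days()
    year = float(sum(MONTH_LENGTHS))
    # Pad with December of the previous year and January of the next year so
    # that days before mid-January and after mid-December are bracketed.
    xs = [mids[-1] - year] + mids + [mids[0] + year]
    ys = [monthly_values[-1]] + list(monthly_values) + [monthly_values[0]]
    for k in range(len(xs) - 1):
        if xs[k] <= day_of_year <= xs[k + 1]:
            w = (day_of_year - xs[k]) / (xs[k + 1] - xs[k])
            return (1.0 - w) * ys[k] + w * ys[k + 1]
    raise ValueError("day_of_year must lie in [1, 365]")


# Hypothetical monthly slopes b for one grid cell (January to December).
monthly_b = [float(m) for m in range(1, 13)]
b_mid_january = daily_parameter(16.0, monthly_b)
b_between = daily_parameter(30.75, monthly_b)  # halfway to mid-February
```

On a mid-month reference day the daily parameter equals the monthly value, and between two reference days it is a distance-weighted average of the two neighbouring months.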
Hydrol. Earth Syst. Sci., 15, 2599-2620, 2011, www.hydrol-earth-syst-sci.net/15/2599/2011/
Following Piani et al. (2010b), only wet days (i.e. days with more than 1 mm of precipitation) are considered to perform the fitting of one of the two proposed transfer functions (Eqs. 1 or 2) through ordinary least squares (OLS) or non-linear least squares (NLS), respectively. If the number of wet days is less than 20 in the observed record, or if the mean observed precipitation value is less than 0.01 mm d$^{-1}$, then a simple additive correction factor equal to the difference in the means between simulated and observed series is applied. In turn, if the number of wet days is greater than 20, and the mean observed precipitation value is greater than 0.01 mm d$^{-1}$, the linear transfer function (Eq. 1) is fitted. The exponential-type transfer function (Eq. 2) is selected to perform the bias correction under two conditions. First, if for the linear fit $a > 0$, which is interpreted as the corrected precipitation being always greater than or equal to zero ($x_{cor} \geq 0$), i.e. ignoring dry days entirely; and, second, when the multiplicative factor (slope of Eq. 1) is too extreme, with arbitrary values defined in the range $b < 0.2$ and $b > 5$.
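The selection rules in the preceding paragraph amount to a small decision function. The sketch below is a direct transcription under stated assumptions: the function name and return labels are hypothetical, and the handling of the boundary cases (exactly 20 wet days, exactly 0.01 mm d$^{-1}$) is our choice, since the text does not specify it.

```python
def choose_transfer_function(n_wet_days, mean_obs, a, b):
    """Return the correction type for one grid cell and month.

    n_wet_days: number of observed wet days in the construction period
    mean_obs:   mean observed precipitation (mm/d)
    a, b:       intercept and slope of the fitted linear function (Eq. 1)
    """
    # Too little data or a very dry cell: additive correction of the means.
    if n_wet_days < 20 or mean_obs < 0.01:
        return "additive"
    # Linear fit that ignores dry days (a > 0) or has an extreme slope:
    # fall back to the exponential-type function (Eq. 2).
    if a > 0 or b < 0.2 or b > 5:
        return "exponential"
    return "linear"


# Hypothetical fitted parameters for three grid cells.
dry_cell = choose_transfer_function(10, 0.005, -0.5, 1.0)
regular_cell = choose_transfer_function(300, 2.0, -0.5, 1.0)
steep_cell = choose_transfer_function(300, 2.0, -0.5, 6.0)
```

Note that the linear function has to be fitted first in every sufficiently wet cell, because the switch to the exponential form depends on the fitted values of $a$ and $b$.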
For temperature, series of observed and simulated daily (mean, maximum and minimum) temperatures within a "construction period" for every month are employed to fit the monthly linear transfer functions. Piani et al. (2010b) suggest that independently correcting the mean, maximum and minimum temperature results in large relative errors in the daily temperature range ($T_{max} - T_{min}$) and in the skewness ($(T_{mean} - T_{min})/(T_{max} - T_{min})$) of the corrected series. For that reason, the fitting of the monthly linear transfer functions is performed on the mean ($T_{mean}$), temperature range ($T_{rg}$) and temperature skewness ($T_{sk}$). Subsequently, daily transfer functions are obtained by a weighted linear interpolation of the parameters of the contributing monthly linear transfer functions for each variable $T_{mean}$, $T_{rg}$ and $T_{sk}$. Using the daily transfer functions the corrected mean temperature ($T^c_{mean}$) is directly obtained, whereas minimum and maximum temperatures are obtained from daily corrected values of $T^c_{rg}$ and $T^c_{sk}$ as

$$T^c_{min} = T^c_{mean} - T^c_{sk}\,T^c_{rg}$$

$$T^c_{max} = T^c_{mean} + (1 - T^c_{sk})\,T^c_{rg}$$

respectively.

Given the availability of observed gridded data for daily precipitation, average, maximum, and minimum temperature in the E-OBS data set, the bias correction was performed for these four forcing variables. As "construction period" to build the transfer functions we defined the control period 1961-1990, i.e. 30 yr of daily data. We employed two series for each month (observed values from the E-OBS data set and simulated values from the HIRHAM5-ECHAM5 climate model) of ca. 900 values each to build the monthly transfer functions. The transfer functions obtained in this period were then applied to correct control (1961-1990) and future (2071-2100) climate simulations from the HIRHAM5-ECHAM5 model. In a parallel work, Dosio and Paruolo (2011) validated the transfer functions obtained in the control period for an independent (validation) period between 1991-2000 using an ensemble of 11 RCMs over Europe. We note here the important assumption of stationarity, which means that the corresponding form of the transfer function and its associated parameters are invariant over time. As a result, the transfer function estimated for present climate conditions is assumed to remain valid to correct biases in future climate simulations. As highlighted by Christensen et al. (2008), however, the stationarity assumption could be violated as biases can grow under climate change conditions and they depend on the values of the variables to be corrected.
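The reconstruction of minimum and maximum temperature from the corrected mean, range and skewness follows directly from the definitions $T_{rg} = T_{max} - T_{min}$ and $T_{sk} = (T_{mean} - T_{min})/(T_{max} - T_{min})$. The sketch below shows the forward and inverse transformation; the function names and the example temperatures are hypothetical.

```python
def to_range_skewness(t_mean, t_min, t_max):
    """Daily temperature range and skewness as defined in the text."""
    t_rg = t_max - t_min
    t_sk = (t_mean - t_min) / t_rg
    return t_rg, t_sk


def from_range_skewness(t_mean, t_rg, t_sk):
    """Recover (t_min, t_max) from mean, range and skewness."""
    t_min = t_mean - t_sk * t_rg
    t_max = t_min + t_rg
    return t_min, t_max


# Round trip with hypothetical values in degrees Celsius.
t_rg, t_sk = to_range_skewness(10.0, 4.0, 14.0)
t_min, t_max = from_range_skewness(10.0, t_rg, t_sk)
```

Correcting the range and the skewness rather than the extremes themselves keeps $T_{min} \leq T_{mean} \leq T_{max}$ after correction, provided the corrected range is non-negative and the corrected skewness stays between 0 and 1.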

Hydrological simulation
LISFLOOD is a GIS-based spatially-distributed hydrological rainfall-runoff model, which includes a one-dimensional hydrodynamic channel routing model (van der Knijff et al., 2010). Driven by meteorological forcing data (precipitation, temperature, potential evapotranspiration, and evaporation rates for open water and bare soil surfaces), LISFLOOD calculates a complete water balance at every (daily) time step and every grid cell defined in the modelled domain. Processes simulated for each grid cell include snowmelt, soil freezing, surface runoff, infiltration into the soil, preferential flow, redistribution of soil moisture within the soil profile, drainage of water to the groundwater system, groundwater storage, and groundwater base flow. Runoff produced for every grid cell is routed through the river network using a kinematic wave approach. Although this model has been developed aiming at operational flood forecasting at pan-European scale, recent applications demonstrate that it is well suited for assessing the effects of land-use change and climate change on hydrology (see, e.g. de Roo et al., 2001; Feyen et al., 2007; Dankers and Feyen, 2008, 2009).
The current pan-European setup of LISFLOOD uses a 5 km grid and spatially variable input parameters and variables obtained from European databases when available. Soil properties were obtained from the European Soil Geographical Database (King et al., 1994) whereas porosity, saturated hydraulic conductivity and moisture retention properties for different texture classes were obtained from the HYPRES database (Wösten et al., 1999). Vegetative properties and land use cover were obtained from the CORINE2000 data set (EEA, 2002) while elevation data and river properties were obtained from the Catchment Information System (Hiederer and de Roo, 2003). Parameters controlling snowmelt rates, overland and river flows, infiltration, and residence times in the soil and subsurface reservoirs have been calibrated against historical records of river discharge in 258 European catchments and sub-catchments. The calibration period varied for different catchments but all spanned at least 4 yr within the period 1995-2002. It may be argued that four years is too short for long-term applications; however, the selection of this period responded to a trade-off between computational time and the use of reliable and recent available information on discharges. Even if the calibration period is restricted to 4 yr, it must be stressed that LISFLOOD was calibrated with a particular interest in correctly reproducing the timing and magnitude of flooding events. The meteorological variables used to force LISFLOOD during the calibration were obtained from the Meteorological Archiving and Retrieving System (MARS) database (Rijks et al., 1998) and interpolated using an inverse distance weighting method over the 5 km grid. For catchments where discharge measurements were not available, simple regionalization techniques (regional averages) were applied to obtain the parameters. The algorithm implemented for calibration was the Shuffled Complex Evolution (SCE) algorithm (Duan et al., 1992). A more detailed description of the hydrological processes and parameters included in LISFLOOD is given by van der Knijff et al. (2010), whereas Feyen et al. (2007, 2008) discuss the calibration of LISFLOOD for different European catchments.
To drive the LISFLOOD model, the HIRHAM5-ECHAM5 daily simulations of temperature, precipitation, solar and thermal radiation, albedo, dewpoint temperature, humidity and wind speed were re-gridded to the 5 km grid used by LISFLOOD employing a nearest neighbour approach on the basis of the centre points of the 25 km grid cells of the HIRHAM5-ECHAM5 model. This resulted in forcing data on a grid fully consistent with the one employed to run LISFLOOD at pan-European scale. Two sets of forcing data were generated, one based on uncorrected regional climate simulations, and another based on the bias corrected fields of precipitation and temperature (mean, minimum and maximum). Daily information on solar and thermal radiation, albedo, dewpoint temperature, average, maximum and minimum temperatures, humidity and wind speed were used to calculate reference evapotranspiration employed by LISFLOOD using the Penman-Monteith model (Allen et al., 1998). Subsequently, fields of precipitation, average temperatures and Penman-Monteith-based evapotranspiration fields were used to force LISFLOOD in the control and future period. As a result, time series of daily discharge for each river pixel in the modelled domain (depicted in Fig. 1) were obtained for both bias corrected and uncorrected forcing data.
We assessed the benefits of correcting the bias in the climate simulations for hydrological impact assessments. For this purpose, the simulated discharges of LISFLOOD in the control period 1961-1990 were compared to observed discharges at 554 stations throughout Europe (see Fig. 1). For these stations at least 30 yr of daily discharge data were available, with a few exceptions of stations with 20 or 25 yr of daily discharge data in the control period. The distribution of these stations is somewhat uneven, with a high concentration of stations in central Europe, whereas eastern Europe shows the lowest station density. Despite this, these stations can be considered a good representation of different climatic conditions and hydrologic regimes, with upstream areas contributing to the discharge between ca. 1000 km$^2$ and ca. 810 000 km$^2$.

Extreme value analysis and uncertainty assessment
To estimate the probability of extreme discharge levels, a Gumbel distribution was fitted to the annual maximum discharges using the maximum likelihood estimation (MLE) method (see, e.g. Beirlant et al., 2004). The Gumbel distribution is a special case of the Generalized Extreme Value (GEV) distribution with a shape parameter ($\xi$) explicitly set to 0 (Coles, 2001). Whereas the safest option would be to accept some degree of uncertainty about the value of $\xi$ (Coles, 2001), by fitting a Gumbel distribution the uncertainty in the shape parameter is explicitly neglected. Dankers and Feyen (2008), however, showed on the basis of a likelihood-ratio test that the use of the three-parameter GEV is not justified in the majority of the river cells (ca. 85 %) defined in the same European domain analysed in this work. In addition, they found no evidence that either the GEV or the Gumbel distribution produced consistently higher or lower estimates of return levels.
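A maximum likelihood fit of the Gumbel distribution and the resulting return levels can be sketched in plain Python. The sketch solves the standard Gumbel score equation for the scale parameter by bisection and uses the usual return level formula $z_p = \mu - \sigma \log[-\log(1 - p)]$ with $p = 1/T$; the function names and the discharge values are hypothetical, and this is not the software used in the study.

```python
import math


def fit_gumbel_mle(sample):
    """ML estimates (mu, sigma) of a Gumbel distribution for maxima."""
    n, x_min = len(sample), min(sample)
    mean = sum(sample) / n

    def score(sigma):
        # Zero of this increasing function is the ML estimate of sigma.
        # Weights are shifted by x_min so the exponentials cannot overflow.
        w = [math.exp(-(x - x_min) / sigma) for x in sample]
        return sigma - mean + sum(x * wi for x, wi in zip(sample, w)) / sum(w)

    spread = max(sample) - x_min
    lo, hi = 1e-8 * spread, spread  # score(lo) < 0 < score(hi)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if score(mid) > 0 else (mid, hi)
    sigma = 0.5 * (lo + hi)
    w_mean = sum(math.exp(-(x - x_min) / sigma) for x in sample) / n
    mu = x_min - sigma * math.log(w_mean)
    return mu, sigma


def return_level(mu, sigma, return_period):
    """Discharge exceeded on average once every `return_period` years."""
    p = 1.0 / return_period
    return mu - sigma * math.log(-math.log(1.0 - p))


# Hypothetical annual maximum discharges (m3/s) at one station.
annual_maxima = [812.0, 655.0, 990.0, 720.0, 1340.0, 605.0, 880.0, 1010.0,
                 760.0, 930.0, 690.0, 1150.0, 845.0, 700.0, 1495.0]
mu_hat, sigma_hat = fit_gumbel_mle(annual_maxima)
q100 = return_level(mu_hat, sigma_hat, 100.0)
```

The bisection relies on the score function being increasing in $\sigma$, which holds whenever the sample values are not all identical.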
To obtain the 95 % confidence interval for the return levels we employed the profile-likelihood method (see, e.g. Coles, 2001; Beirlant et al., 2004). By capturing the non-symmetric behaviour of confidence intervals, especially for return levels associated with long return periods, the profile-likelihood method is far more robust in assessing uncertainty than traditional approaches such as the "Delta method" described in Coles (2001).
The profile-likelihood method works through reparametrization of the Gumbel model so that the return level $z_p$ is one of the model parameters,

$$\mu = z_p + \sigma \log\left[-\log(1 - p)\right]$$

where $\mu$ and $\sigma$ are the location and scale parameters of the Gumbel distribution, respectively, and $1/p$ is the return period. To obtain the profile-likelihood for a given return level, $z_p$ is fixed to a value and the corresponding log-likelihood is maximized with respect to $\sigma$. This is repeated for a range of values of $z_p$. The profile-likelihood, $l_p(z_p)$, is built from the corresponding maximized values of the log-likelihood for different $z_p$ values. Using the deviance function (see, e.g. Coles, 2001, p. 34), a $(1 - \alpha)$ confidence interval for return level $z_p$ can be obtained as follows,

$$C_\alpha = \left\{ z_p : D(z_p) \leq \chi^2_{1,\,1-\alpha} \right\}$$

where $\chi^2_{1,\,1-\alpha}$ is the $(1 - \alpha)$ quantile of the chi-square distribution with one degree of freedom and the deviance function is defined as $D(z_p) = 2\left\{ l(\hat{z}_p) - l_p(z_p) \right\}$, with $l(\hat{z}_p)$ being equal to the log-likelihood evaluated at the ML estimator of $z_p$.
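The profile-likelihood interval can be sketched numerically as follows. This is an illustration under stated assumptions, not the implementation used in the study: the reparametrization is taken as $\mu = z_p + \sigma \log[-\log(1 - p)]$, the interval bounds are the points where the deviance equals the 0.95 quantile of the chi-square distribution with one degree of freedom, the search range for $\sigma$ (one tenth to ten times its ML estimate) is an arbitrary choice, and all function names and the synthetic sample are hypothetical.

```python
import math
import random

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-square(1) distribution


def loglik(sample, mu, sigma):
    """Gumbel log-likelihood; the exponent is capped to avoid overflow."""
    total = -len(sample) * math.log(sigma)
    for x in sample:
        s = (x - mu) / sigma
        total -= s + math.exp(min(-s, 700.0))
    return total


def fit_gumbel(sample):
    """ML estimates (mu, sigma) from the Gumbel score equation, by bisection."""
    n, x_min = len(sample), min(sample)
    mean = sum(sample) / n

    def score(sigma):
        w = [math.exp(-(x - x_min) / sigma) for x in sample]
        return sigma - mean + sum(x * wi for x, wi in zip(sample, w)) / sum(w)

    lo, hi = 1e-8 * (max(sample) - x_min), max(sample) - x_min
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if score(mid) > 0 else (mid, hi)
    sigma = 0.5 * (lo + hi)
    w_mean = sum(math.exp(-(x - x_min) / sigma) for x in sample) / n
    return x_min - sigma * math.log(w_mean), sigma


def profile_loglik(sample, z_p, p, sigma_lo, sigma_hi):
    """Maximise the log-likelihood over sigma with the return level z_p fixed."""
    c = math.log(-math.log(1.0 - p))

    def f(log_sigma):
        sigma = math.exp(log_sigma)
        return loglik(sample, z_p + sigma * c, sigma)  # mu = z_p + sigma * c

    # Golden-section search on log(sigma); assumes one maximum in the bracket.
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(sigma_lo), math.log(sigma_hi)
    for _ in range(80):
        x1, x2 = b - g * (b - a), a + g * (b - a)
        if f(x1) < f(x2):
            a = x1
        else:
            b = x2
    return f(0.5 * (a + b))


def return_level_ci(sample, return_period, quantile=CHI2_1_95):
    """(lower, estimate, upper) of the return level by profile likelihood."""
    p = 1.0 / return_period
    mu, sigma = fit_gumbel(sample)
    z_hat = mu - sigma * math.log(-math.log(1.0 - p))
    l_max = loglik(sample, mu, sigma)

    def deviance(z):
        l_p = profile_loglik(sample, z, p, sigma / 10.0, sigma * 10.0)
        return 2.0 * (l_max - l_p)

    def bound(direction):
        inner, outer = z_hat, z_hat + direction * sigma
        while deviance(outer) < quantile:  # step outwards until outside the CI
            outer += direction * sigma
        for _ in range(60):  # bisect for deviance == quantile
            mid = 0.5 * (inner + outer)
            if deviance(mid) < quantile:
                inner = mid
            else:
                outer = mid
        return 0.5 * (inner + outer)

    return bound(-1.0), z_hat, bound(+1.0)


# Synthetic 30-yr record of annual maxima drawn from a Gumbel(100, 20).
random.seed(0)
maxima = [100.0 - 20.0 * math.log(-math.log(random.random())) for _ in range(30)]
lower, z100, upper = return_level_ci(maxima, 100.0)
```

Unlike an interval built from a symmetric normal approximation, the two bounds returned here are located independently on either side of the estimate, so the interval is free to be asymmetric.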

Precipitation
An assessment of precipitation simulated by the HIRHAM5-ECHAM5 climate model is shown in Fig. 2 (plates a, b, c, and d). These plates show the ratio between the uncorrected HIRHAM5-ECHAM5 simulations and the E-OBS observations for the average seasonal (DJF, MAM, JJA and SON) precipitation in the control period (1961-1990). Seasonal averages show a clear overestimation of precipitation in large parts of the Iberian Peninsula, Central Europe, Great Britain, northern Europe and Scandinavia, with values as high as 7-20 fold the observed precipitation. For the winter season (Fig. 2a) the HIRHAM5-ECHAM5 model tends to overestimate precipitation all across Europe, with some weak underestimation in the Scandinavian mountains, the west coast of Italy, the north-west coast of Great Britain and the Balkans. A similar pattern is observed for the transition seasons (MAM and SON). In the summer season (Fig. 2c), on the other hand, precipitation is overestimated in northern European regions, whereas too dry conditions are simulated in south-eastern parts. The summary statistics presented in Table 1 show that, averaged over the European domain, the simulated average annual precipitation is almost double the observed precipitation. These results are in agreement with the findings of van Meijgaard et al. (2008) and Kjellström et al. (2010), who analysed a series of RCMs driven by lateral boundary conditions obtained from the ERA40 reanalysis product.
Plates e, f, g, and h of Fig. 2 show the ratio between bias corrected HIRHAM5-ECHAM5 precipitation and observations. The range of over- and underestimation is reduced, with differences in seasonal average precipitation of approximately ±5 % of the observed precipitation. This is also reflected in the average over the European domain of the annual and seasonal corrected precipitation, which is nearly identical to that of the observation data set (see Table 1). For summer (JJA) an overestimation of the observed precipitation is still present in the southernmost areas of the Iberian Peninsula, whereas a weak underestimation of precipitation can be observed in the Balkans.
In spite of the excellent performance of the bias correction, a few grid cells associated with mountain areas (e.g. Alps, Apennines, Balkans, and Carpathians) still show a pronounced overestimation of up to 3-16 fold the observed precipitation after bias correction. However, the overestimation of the uncorrected precipitation in these areas amounts to 25-30 fold the observed annual precipitation. The seasonal analysis indicates that the residual overestimation after the bias correction mainly occurs in the winter season, and to a lesser extent in the transition seasons. Analysis of the type of fitting function employed in these grid cells suggests that switching between the linear and exponential type of fitting during the winter months (DJF) could significantly alter the interpolation of daily values, as they are influenced by the anterior and posterior (monthly) fitting function. This could potentially [...]
The wet day frequency relates to the distribution of precipitation in time. The comparison presented in Fig. 3 shows a pronounced overestimation of the average annual number of wet days (daily precipitation ≥ 1 mm d$^{-1}$) in northern and western parts of Europe, as well as in mountain regions, whereas in most of southeastern Europe and parts of the Scandinavian mountains too few wet days are simulated. This is consistent with what most studies report, namely, that climate models tend to simulate too many days with weak precipitation, especially in humid climate zones (see, e.g. Leander and Buishand, 2007; Piani et al., 2010b; Iizumi et al., 2011). The bias correction drastically reduces the error in the number of wet days simulated with respect to the E-OBS data set. This improvement in the distribution of wet day frequency after bias correction is explained by the advantage of fitting both the "a" and "b" parameters in Eqs. (1) and (2), which are related to the dry day correction factor, and by using only the portion of both time series (observed and simulated) that corresponds to wet days to estimate the transfer functions.
At the same time, Fig. 3 shows a slight underestimation of the wet day frequency after bias correction (plate b). From the analysis of an ensemble of RCMs for Europe, Dosio and Paruolo (2011) suggested that low-end percentiles of bias corrected precipitation are subject to large uncertainties due mainly to the choice of the transfer function (Eq. 1 vs. Eq. 2) used to perform the bias correction. In addition, they found a systematic underestimation of the small values of bias corrected precipitation when compared to the pdf obtained from the E-OBS data set. As a consequence, the underestimation of the wet day frequency after bias correction is likely explained by the systematic underestimation of the low-end percentiles for the bias corrected precipitation observed by Dosio and Paruolo (2011), which increases the probability for a given day to be considered as a "dry day" for a constant $x_0$. In addition, even if the fitting of "a" and "b" helped to drastically reduce the excessive number of wet days observed before bias correction, there still exists the possibility that the number of dry days is slightly overestimated, given that $x_0$ is obtained purely from a fitting process and thus potentially differs from the actual number of dry days in the observed precipitation.
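As a minimal sketch of the wet day indicator discussed above, the average annual number of wet days can be counted with the 1 mm per day threshold given in the text. The function name and the two short daily series are hypothetical and serve only to show how weak simulated precipitation on many days inflates the count.

```python
def wet_day_frequency(daily_precip, n_years, threshold=1.0):
    """Average annual number of wet days (daily precipitation >= threshold, mm/d)."""
    return sum(1 for p in daily_precip if p >= threshold) / n_years


# Hypothetical 10-day series (mm/d) standing in for one grid cell and one "year".
observed = [0.0, 0.0, 4.2, 0.0, 12.5, 0.0, 0.0, 1.0, 0.0, 0.3]
simulated = [0.4, 1.2, 5.0, 1.1, 15.0, 0.8, 1.6, 2.0, 0.2, 1.3]  # drizzle on most days

obs_wet = wet_day_frequency(observed, n_years=1)
sim_wet = wet_day_frequency(simulated, n_years=1)
print(obs_wet, sim_wet, sim_wet - obs_wet)
```

Because the indicator depends on a fixed threshold, a small systematic shift in the low-end percentiles of the corrected precipitation is enough to move days across the threshold, which is consistent with the slight underestimation described above.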
Rain-driven flooding can be caused by short-term heavy rain events (rapid-onset flooding such as pluvial and flash floods) or prolonged periods of intense rain (slow-onset flooding such as river or fluvial floods). To evaluate the possible effect of the bias correction on both types of phenomena we evaluate different precipitation indicators, namely, the 99th percentile of daily precipitation and the maximum precipitation amount in 3, 5 and 7 consecutive days. The comparison for the seasonal 99th percentile of daily precipitation (Fig. 4 plates a, b, c, d and Table 1) shows that daily extreme precipitation is overestimated in most parts of Europe, except for southeastern Europe, Italy and southern parts of the Iberian Peninsula in summer, and some areas in the Scandinavian mountains mainly in winter. The seasonal 3, 5 and 7-day maximum precipitation (see Fig. 5 plates a, b, c and d for the 5-day maximum precipitation) shows a similar pattern (which corresponds very well to the average precipitation shown in Fig. 2, plates a, b, c and d), although compared to the 99th percentile of daily precipitation the underestimation in the Balkans is more pronounced here in the transition seasons. The general overestimation of the extreme indicators is in line with observations that many RCMs contained in the ENSEMBLES project simulate too much (daily) precipitation, a tendency that is more pronounced for the upper-end percentiles (i.e. extreme precipitation events) (Kjellström et al., 2010; Lenderink, 2010). This does not preclude that some RCMs forced by different conditions may underestimate the high-end percentiles while overestimating the mean precipitation.
The bias correction considerably reduces the error in the seasonal 99th percentile to within the range ±25 % of the observed value. As for the average precipitation, a clear tendency to overestimation can still be observed in some grid cells associated with mountain areas (e.g. Alps, Apennines, Balkans, and Carpathians). Also for the 3, 5 and 7-day maximum precipitation, aside from the residual strong overestimation in the sparse mountainous grid cells, the error is considerably reduced. However, notwithstanding that the European-average statistics closely reproduce the observed statistics (see Table 1), the bias correction results in a slight underestimation of the maximum precipitation amount in 3, 5 and 7 consecutive days across most of Europe. Similarly to the case of wet day frequency, this is most likely related to the underestimation of the low-end percentiles of the bias corrected precipitation (see, e.g. Dosio and Paruolo, 2011), and a potential overestimation of the number of dry days obtained from the fitting of the transfer functions. The latter, in combination with the fact that observed extremes likely undervalue true extreme precipitation, may result in an underestimation by the hydrological model of observed extreme discharges.
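The two families of extreme indicators used above can be sketched as follows. This is an illustration only: the percentile convention (linear interpolation between order statistics, computed over all days rather than wet days only) is an assumption, since the text does not specify it, and the function names and the sample series are hypothetical.

```python
import math


def percentile(values, q):
    """q-th percentile by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def max_n_day_total(daily, n):
    """Largest precipitation total over any n consecutive days."""
    window = sum(daily[:n])
    best = window
    for i in range(n, len(daily)):
        window += daily[i] - daily[i - n]  # slide the window forward by one day
        best = max(best, window)
    return best


# Hypothetical daily precipitation series (mm/d).
daily = [0, 2, 10, 0, 5, 1, 0, 20, 3, 0]
print(percentile(daily, 99), max_n_day_total(daily, 3), max_n_day_total(daily, 5))
```

The multi-day totals are sensitive to the number of dry days inside the window, which is why an overestimated dry day count after bias correction can lower the 3, 5 and 7-day maxima even when the daily 99th percentile is well reproduced.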

Temperature
Figure 6 (plates a, b, c and d) shows on a seasonal basis the difference in 2 m temperature ($T_{2m}$) between the HIRHAM5-ECHAM5 simulations and the E-OBS data set. A common feature, evident in all seasons, is the strong warm bias (up to 4 °C and more) in south-eastern Europe, which extends further north and west depending on the season. In winter, $T_{2m}$ is overestimated nearly all over Europe, except for the British Isles, along the western Scandinavian coast, in the Alps and Pyrenees, with a somewhat mixed bias pattern in the rest of the Iberian Peninsula. In the other seasons, simulated $T_{2m}$ is too low for most of the western half of Europe, including the Iberian Peninsula, France, Great Britain, the Alps and western parts of Scandinavia. In summer, and to a lesser extent in spring, a cold bias of up to 3 °C can be observed in northern Europe. After performing the bias correction of the HIRHAM5-ECHAM5 temperature simulations, the anomalies in $T_{2m}$ are drastically reduced (see Fig. 6 plates e, f, g and h, and Table 2). For all seasons, discrepancies with the E-OBS data set have become marginal, with maximum differences not exceeding 0.5 °C.
A comparison on an annual basis of the daily maximum ($T_{max}$) and minimum ($T_{min}$) temperature is presented in Fig. 7. As maximum temperatures occur in the summer season, for $T_{max}$ a similar bias pattern can be observed as for $T_{2m}$ in summer (see Fig. 6c). Figure 7a shows for $T_{max}$ a warm bias in southeastern Europe with values higher than 1 °C that transitions into an underestimation in the western and northern half of the continent, where the cold anomaly ranges between 1 °C and 5 °C compared to the E-OBS data set. Minimum temperatures, on the other hand, are overestimated all over Europe, especially in the south-east, except for the Alps, western Pyrenees and western parts of Norway (see Fig. 7b). In southeastern Europe, the overestimation of $T_{min}$ is much stronger than that of $T_{max}$. Hence, the [...]
These observations on simulated temperature are in agreement with other studies that report a warm bias in the winter and summer seasons, especially in southeastern parts of Europe (Jacob et al., 2007; Christensen et al., 2008), and a consistent cold bias in the Alpine region (Suklitsch et al., 2010). A strong underestimation of $T_{max}$ over northern Europe has been observed by Nikulin et al. (2011) when examining an ensemble of RCA3 simulations driven by six different GCMs (including ECHAM5), as well as by Kjellström et al. (2007) for 10 different RCMs driven by HadAM3H. Possible causes for temperature bias include missing processes such as a realistic description of dust aerosols or regional feedback processes. Especially in the summer season, such regional feedback processes include soil moisture exhaustion or decreasing cloud cover, which can impact the temperature developments. Also, the capability of RCMs to simulate the regional climate depends on the realism of the large-scale circulation that is provided as lateral boundary conditions, in particular in regions where the large-scale dynamic forcing plays an important role compared to local forcing. This is indeed the case over large parts of Europe. Van Ulden and van Oldenborgh (2006), for example, have shown that a warm bias in Central Europe of the order of degrees was induced by a bias in the large-scale circulation boundary conditions. Also in central Europe, Plavcová and Kyselý (2011) report a substantial underestimation of the diurnal temperature range throughout the year in ENSEMBLES RCM experiments. Besides the deficiencies in the simulation of atmospheric circulation, particularly too strong advection and overestimation of westerly flow at the expense of easterly flow in most RCMs, they suggest that biases in simulating anticyclonic, cyclonic and straight flow also contribute to the underestimated diurnal temperature range. Also, although to a lesser extent than for precipitation, inaccuracies in the observational datasets and interpolation of station data may explain part of the discrepancies between simulated and observed temperature (see, e.g. Lorenz and Jacob, 2010).

Impact of bias correction on the simulation of hydrological components
Flood generation is a highly non-linear process that depends on factors such as the intensity, volume and timing of precipitation as well as on antecedent conditions of the river basin (e.g. soil wetness, snow or ice cover). Because of the small to meso-scale character of these factors, the correct representation of temperature and precipitation, both spatially and temporally, is key to enhancing the predictive capacity of the LISFLOOD model. Whereas the link between extreme (long or intense) precipitation and flooding is obvious, in this section we detail the effect of the bias correction on processes that indirectly affect flooding.
Evapotranspiration (ET) regulates the flow of moisture back into the atmosphere and as such affects the storage capacity of soils. An overestimation of ET results in lower stocks of moisture in the soil, hence a larger amount of water that can be stored during subsequent wet periods. The effect of the bias correction on evapotranspiration is shown in Fig. 8. The run with uncorrected climate data results in higher ET amounts in northern-central Europe due to higher temperatures and precipitation during most of the year. In southeastern Europe, notwithstanding the strong warm bias, lower ET amounts are simulated, as the underestimation of precipitation in summer limits the water available for ET (see, e.g. Fig. 2c). In Scandinavia, on the other hand, lower ET amounts are simulated due to the cold bias in the uncorrected summer temperature (see, e.g. Fig. 6c).
It must be noted that variables such as dewpoint temperature and solar and thermal radiation, which are employed together with the bias-corrected temperature fields to calculate the evapotranspiration terms driving LISFLOOD (see, e.g. van der Knijff et al., 2010), are not corrected for potential bias. This may violate the energy balance and potentially introduce bias in the subsequent hydrological simulations.
Many snow-dominated regions in Europe are subject to snowmelt-induced floods that result from rapid melting of the snowpack, sometimes amplified by rainfall. Ice-jam floods, related to freeze-up and break-up periods, also frequently cause winter and spring floods. The accumulation of snow during cold periods depends on the amount of precipitation, as well as on the temperature, which determines whether precipitation falls as rain or snow. A correct simulation of the (variability in) temperature also affects the possible onset of snowmelt or ice-jam floods. We show for the actual snowpack depths and the days with snow cover the difference between the simulation driven by uncorrected and bias corrected climate data (Fig. 9). From Fig. 9 we observe a thicker snowpack and more days with snow cover mainly in northern Europe and mountain areas when not correcting for bias in the climate simulations. The difference in snowpack in these areas ranges between 100 and 1000 mm snow water equivalent (SWE), and locally at high altitudes even up to 5000 mm and more. The more pronounced accumulation of snow and higher number of days with snow cover in northern Europe is due mainly to the strong overestimation of precipitation (see Fig. 2), which outweighs the effect of the warm bias in cold periods (which potentially reduces the snow-to-rain ratio). In central, central-eastern and south-eastern regions, the warm bias in winter reduces the number of days with snow cover (see Fig. 9b), but the wet bias in winter still results in a higher maximum snowpack. The considerable increase in snowpack in mountain ranges such as the Alps and Pyrenees follows from the wet and cold bias in cold periods.

Impact of bias correction on the simulation of hydrological extremes
Figure 10 shows observed versus simulated average discharge and average annual maximum discharge for each of the 554 validation stations (see Fig. 1). In general, the hydrological model driven by the uncorrected climate data largely overestimates average and extreme river flows (Fig. 10 plates a and c). This follows from the strong overestimation of the average and extreme precipitation in most regions of Europe.
In regions dominated by (spring) snowmelt, the pronounced over-accumulation of snow in winter with the uncorrected climate data also likely contributes to the overestimation of extreme discharges. LISFLOOD simulations forced by the bias corrected climate data (Fig. 10 plates b and d) show a strong improvement in reproducing the observed discharge statistics. Visual inspection and the values for the coefficient of determination ($r^2$) and model efficiency (EF) show that the observed flow statistics are reasonably well reproduced after implementing the bias correction method, with a general tendency of better performance for average flows and with increasing catchment size.
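The two skill scores can be sketched as below. The definitions are assumptions, since the text does not spell them out: $r^2$ is taken as the squared Pearson correlation and EF as the Nash-Sutcliffe form, one minus the ratio of the squared errors to the variance of the observations. Function names and the four-station sample are hypothetical.

```python
def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values."""
    n = len(obs)
    mean_o, mean_s = sum(obs) / n, sum(sim) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_s = sum((s - mean_s) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def efficiency(obs, sim):
    """Model efficiency, assumed here to be of the Nash-Sutcliffe form."""
    mean_o = sum(obs) / len(obs)
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    return 1.0 - sq_err / sum((o - mean_o) ** 2 for o in obs)


# Hypothetical station statistics: the simulation doubles every observed value.
observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0 * o for o in observed]
print(r_squared(observed, doubled), efficiency(observed, doubled))
```

Under these definitions a simulation that systematically doubles the observed values still scores a perfect $r^2$ but a strongly negative EF, so the two scores together separate pattern agreement from systematic over- or underestimation.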
Notwithstanding the overall good agreement between observed and simulated discharge statistics when employing bias-corrected forcing data, large discrepancies do occur at a small number of stations (see plates b and d of Fig. 10), where the relative errors can be 1 or 2 orders of magnitude (note the logarithmic scale). Deviations from the observation-based statistics can be attributed to errors in the hydrological model, its static input and in the calibration and regionalization of its parameters. Part of the disagreements can also be linked with man-made modifications of flow regimes in many catchments in Europe (see, e.g. Dynesius and Nilsson, 1994) that are not accounted for in the hydrological model. Also, although the E-OBS data set is currently the best available for Europe, it is known to feature errors and uncertainties, which are translated to the corrected forcing fields during the bias correction step. More specifically, the E-OBS data set compares better to the mean of the variables than to the extremes, with larger differences for precipitation than for temperature (Haylock et al., 2008; Hofstra et al., 2010). This, in combination with the slight underestimation of the 3, 5 and 7-day maximum precipitation after bias correction (see Fig. 5), may also explain why for the annual maxima the greater part of the stations falls below the one-to-one line.
Figure 11 shows the results of fitting a Gumbel distribution by MLE to the annual maximum discharges for the period 1961-1990 at 20 selected stations. These stations are a representative subsample of the 554 validation stations (see Fig. 1) covering different hydro-morphologic and climatologic conditions and catchment sizes (ranging from 9948 to 80 700 km$^2$). In general, there is a strong overestimation of the empirical return levels for the annual maxima (black crosses) when the hydrological model is driven by uncorrected forcing data. This overestimation is more pronounced for higher return levels. A clear improvement in reproducing the empirical return levels is observed when LISFLOOD is run using bias corrected forcing data (see blue and red dashed lines in Fig. 11). Although for some stations a marked disagreement still exists between the return levels obtained from empirical plotting positions and the return levels obtained from bias corrected forcing data, for 50 % of stations the confidence intervals more closely envelop the empirical return levels. This is also reflected in Table 3, which summarizes the 95 % confidence intervals obtained for a return period of 100 yr for the stations shown in Fig. 11. The reduction in the confidence intervals reaches up to 70 % compared to the estimations based on uncorrected forcing data. It is worth mentioning that we employed a series of 30 annual maxima to perform the Gumbel fitting; thus, any extrapolation beyond 30 yr could potentially be dominated by large uncertainties. Also, the confidence intervals only reflect the uncertainty in the extreme value fitting, and other sources of uncertainty from the hydrological modelling exercise are not accounted for.
Table 3. 95 % confidence intervals obtained using the profile-likelihood method for a return period of 100 yr for the gauging stations depicted in Fig. 1 for the control period. The Observed column corresponds to $Q_{100}$ obtained from a Gumbel distribution fitted to the observations. All values in m$^3$ s$^{-1}$ (values in parentheses show the percentage reduction).
Figure 11. Return level plots of simulated discharge levels in the 20 selected gauging stations shown in Fig. 1, based on a Gumbel distribution fit to the annual maxima for the control period. Full lines represent hydrological simulations driven by uncorrected forcing data whereas dashed lines represent the bias corrected counterpart. Also included in the plates are the 95 % confidence intervals (red dashed and full lines) derived using the profile-likelihood method. Black crosses represent return levels obtained from empirical plotting positions of observed annual maximum discharges at the selected stations.
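The comparison between empirical and fitted return levels can be sketched as follows. The Weibull plotting position $T = (n + 1)/\mathrm{rank}$ is an assumption, as the text does not state which plotting position formula was used, and the function names are hypothetical. The Gumbel return level uses the standard quantile expression for given location and scale parameters.

```python
import math


def empirical_return_periods(annual_maxima):
    """Pairs (return period in years, discharge), using Weibull plotting positions."""
    ranked = sorted(annual_maxima, reverse=True)  # rank 1 is the largest maximum
    n = len(ranked)
    return [((n + 1) / rank, q) for rank, q in enumerate(ranked, start=1)]


def gumbel_return_level(mu, sigma, period):
    """Discharge exceeded on average once every `period` years under a Gumbel fit."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / period))


# With 30 annual maxima the largest empirical return period is 31 yr, so the
# 100-yr level is an extrapolation well beyond the range of the plotted data.
longest = empirical_return_periods(list(range(1, 31)))[0][0]
print(longest, gumbel_return_level(0.0, 1.0, 100.0))
```

This makes the caveat about extrapolation concrete: the black crosses in a return level plot cannot extend much beyond the record length, whereas the fitted curve and its confidence band continue to the 100-yr level.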

In most regions of Europe the empirical return levels are overestimated when LISFLOOD is driven by uncorrected forcing data. This can be deduced from the analysis of the 20 stations above in combination with Fig. 12, which shows the ratio of 100-yr return level discharges between runs driven by the uncorrected and bias corrected forcing data for the control period. There is, at the same time, a well-defined area (north of the Carpathian mountains) where the 100-yr flood discharges based on bias corrected forcing data are slightly higher (ca. 25 %) than those obtained when using uncorrected forcing data. In the uncorrected forcing data, this region is subject to a large overestimation of the temperature in all seasons, as well as to an underestimation of precipitation in summer and relatively moderate overestimations of precipitation in the other seasons. In the hydrological model, compared to the run with the bias corrected forcing data, this translates into higher evapotranspiration losses during the growing season (see Fig. 8), fewer days with frost and snow cover, reduced snow accumulation in winter (see Fig. 9), and drier soils and depleted groundwater stocks after summer.

Future recurrence intervals
The previous sections have shown the benefit of bias removal for reproducing observed average and extreme discharge statistics with the LISFLOOD model in the control period. Based on the assumption of a stationary error model, future climate is corrected using the transfer functions derived from the control climate. As such, in accordance with the observations in the control period (Fig. 12) but not shown here, future flood magnitudes based on the uncorrected climate simulations largely overshoot those based on corrected climate in most regions of Europe. Hence, although not verifiable, they very likely provide a more biased estimate of future flood magnitudes.
Another interesting aspect, however, is the future recurrence interval of a given flood level observed in the control period, which does not depend on the absolute magnitude, but on the change thereof between the control and future period. Here we therefore show the possible extent of the errors induced by analysing recurrence intervals based on hydrological simulations driven by uncorrected forcing data compared to their bias corrected counterpart.
We first calculated the discharges associated with a 100-yr flood event for the hydrological simulations driven by uncorrected and bias corrected climate data. This was done employing the Gumbel distribution fitted in the control period. Future recurrence intervals of a 100-yr event observed in the control period were then obtained by translating the respective control period 100-yr discharges using the Gumbel distribution fitted in the future time slice (2071-2100). This procedure was repeated for both simulations driven by uncorrected and bias corrected climate data. Deviations from the control period recurrence interval indicate whether the control period 100-yr event will be more or less frequent in a future climate. Figure 13a shows the future recurrence intervals of a control period 100-yr event, where red indicates that a control 100-yr flood will become more frequent (top scale of legend). In most rivers across Europe the return interval of what is currently a 100-yr flood may in the future decrease to 50 yr or less. A notable exception is the considerable increase in flood recurrence interval (or decrease in frequency) in the northeast, where warmer winters and a shorter snow season will likely reduce the magnitude of the spring snowmelt peak. Also in some other rivers in central and southern Europe an increase in recurrence interval is found.
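The translation of a control period 100-yr discharge into a future recurrence interval can be sketched as below, assuming Gumbel parameters (location, scale) have already been fitted for each time slice. The function names and all parameter values are hypothetical and chosen only to illustrate the calculation.

```python
import math


def gumbel_return_level(mu, sigma, period):
    """Discharge with the given return period (years) under a Gumbel fit."""
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / period))


def gumbel_exceedance(q, mu, sigma):
    """Annual probability that the maximum discharge exceeds q."""
    return -math.expm1(-math.exp(-(q - mu) / sigma))  # 1 - exp(-exp(-t))


def future_recurrence_interval(control, future, period=100.0):
    """Future return period of the discharge that has `period` years in the control fit."""
    q_control = gumbel_return_level(control[0], control[1], period)
    return 1.0 / gumbel_exceedance(q_control, future[0], future[1])


# Hypothetical (mu, sigma) pairs in m3/s for one river pixel.
control_fit = (1000.0, 300.0)
future_fit = (1200.0, 300.0)  # wetter future: location parameter shifted upward
print(future_recurrence_interval(control_fit, future_fit))
```

Because only the change between the two fitted distributions enters the result, a bias that affects the control and future simulations differently (for example through the snowpack) alters the projected recurrence interval even when the sign of the change in magnitude is the same.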
Figure 13b shows the difference in recurrence interval between the runs driven by corrected and uncorrected climate data. River stretches in white show a good agreement in the change in recurrence interval, red implies that for the simulations driven by bias-corrected forcing data floods are projected to become more frequent compared to the run driven by uncorrected data, and blue the opposite (bottom scale of legend). Notably, the simulations driven by uncorrected climate data tend to simulate less frequent floods compared to the simulations driven by corrected data in most of northern Europe. Analysis of the control and future snowpack maps reveals that this is caused by a stronger relative reduction in snowpack depth between control and future climate when the climate simulations are not corrected for bias. In the rest of Europe, except for the British Isles where both runs project similar changes in recurrence interval, the recurrence intervals for the run driven by uncorrected data deviate in both directions from those derived from the run driven by bias-corrected data. This suggests that estimated recurrence levels based on uncorrected forcing data might wrongly indicate future recurrence intervals of a 100-yr event observed in the control period.

Discussion
Despite the strong improvement after implementing the bias correction, some discrepancies between observations and corrected climate simulations of precipitation and temperature remain. This could be related to inherent limitations of the bias correction method employed, limitations of the RCM to properly simulate/capture local orographic effects given the lateral resolution employed in this work (25 × 25 km), or inadequacies in the observed gridded data set that was used as target to perform the bias correction of the climate simulations.
Within the first group of limitations we should note that by only correcting average, maximum and minimum temperatures, without considering radiative forcings and dewpoint temperature, the energy balance may no longer be preserved. At present, there seems to be no workaround to this problem as there is no high-quality high-resolution gridded observed data set for these variables to perform the bias correction. Also, results shown herein suggest that any potential bias introduced due to the non-closure of the energy balance does not outweigh the improvement gained by using bias-corrected temperature and precipitation fields.
As noted by Piani et al. (2010b), interpolation of monthly transfer functions to obtain daily values of precipitation or temperature may cause the monthly statistics to not match the observed statistics, as daily values in one month are influenced by the values of preceding and following months. In our case, with the exception of a few grid cells, temperature and precipitation statistics for the control period 1961-1990 for average conditions are in full agreement with observed statistics, and even if the bias correction of these variables is done on a daily basis, seasonal statistics show a good agreement as well. This, however, does not guarantee that explicit spatio-temporal correlations between different variables, e.g. temperature and precipitation, will be preserved. Dosio and Paruolo (2011) argue that correlations are appropriate measures of dependence only for variables following multivariate Gaussian or elliptical distributions and that a more appropriate measure is given by copula functions (see, e.g. Salvadori and De Michele, 2004). The dependence between precipitation and temperature can be expressed via copulas, which have the notable property of being invariant with respect to monotonically increasing transformations of random variables. In our case, the transfer functions used for bias correction (linear and exponential) are monotonically increasing transformations of precipitation and temperature, implying that uncorrected and bias corrected values have the same copula function. Dosio and Paruolo (2011) suggest that, based on the copula function as a proper measure of dependence among fields, the univariate bias correction employed in this work preserves the joint prediction of precipitation and temperature and hence the bias correction does not alter the dependence structure between precipitation and temperature. We note here, however, that the discussion on the nature of the relationship between precipitation and temperature and ways of expressing this relationship is still ongoing and is far from settled, as highlighted by Piani et al. (2010b).
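The invariance property invoked above can be checked numerically with a rank-based dependence measure such as Kendall's tau, which depends only on the copula. The sketch below is an illustration under stated assumptions: the synthetic series, the function names and the two monotonically increasing transforms are hypothetical stand-ins and do not reproduce the transfer functions of Eqs. (1) and (2).

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / all pairs; ties ignored."""
    n, score = len(x), 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                score += 1
            elif product < 0:
                score -= 1
    return score / (n * (n - 1) / 2)


def correct_temperature(t):
    """Hypothetical linear, increasing transfer function."""
    return 1.5 + 0.9 * t


def correct_precipitation(p):
    """Hypothetical exponential-type, increasing transfer function."""
    return 10.0 * math.expm1(p / 25.0)


# Synthetic, dependent temperature (degrees C) and precipitation (mm/d) series.
random.seed(0)
temperature = [random.gauss(10.0, 8.0) for _ in range(200)]
precipitation = [random.expovariate(0.2) * math.exp(-0.03 * t) for t in temperature]

tau_raw = kendall_tau(temperature, precipitation)
tau_corrected = kendall_tau([correct_temperature(t) for t in temperature],
                            [correct_precipitation(p) for p in precipitation])
print(tau_raw, tau_corrected)
```

Both values coincide because increasing transforms leave the ordering of each variable unchanged. A linear correlation coefficient computed on the same data would in general change under the nonlinear transform, which is the reason given above for preferring copula-based measures of dependence.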
Clear limitations of the RCMs relate to the poor description of land surface processes, the poor ability to capture the modulation of precipitation in areas of complex orography (Herrera et al., 2010), as well as unresolved/unexplained processes conveyed from the GCMs employed to drive the corresponding RCMs (Giorgi, 2005).
At the same time, it can be argued that the observed gridded data set (E-OBS) employed to perform the bias correction may not always properly match true temperature and/or precipitation due to over-smoothing effects introduced by insufficient station density for the cell-based interpolation (see, e.g. Hofstra et al., 2010; Boberg et al., 2010). Thus, part of the mismatch between the RCM simulations and the observational dataset might actually not be related to model errors, but instead to errors in the observed gridded data set (Lenderink, 2010). In the bias correction, this can potentially corrupt properly simulated temperature or precipitation in areas with low station densities. Also, using an alternative observed gridded data set (e.g. CRU TS1.2; see Mitchell et al., 2004) can lead to different transfer functions. The latter, however, is expected to have limited influence as alternative observed data sets often present similar climatology despite differences in horizontal extent or station density (see, e.g. Rauscher et al., 2010).
Hydrol. Earth Syst. Sci., 15, 2599-2620, 2011, www.hydrol-earth-syst-sci.net/15/2599/2011/
Notwithstanding the considerable improvement in reproducing observed discharge statistics after implementing the bias correction, some notable discrepancies between simulated and observed average annual maxima are still present at a number of river stations. Several factors may contribute to this mismatch. Part of the disagreement can be attributed to the remaining error in the meteorological forcing fields after the bias correction.
The disagreement is in part also attributable to not accounting for river regulation in the current LISFLOOD setup. The main reason is a deplorable lack of relevant data at European scale, as well as the large uncertainty regarding future river regulation and land-use changes. As a consequence, results are likely to underestimate the human influence on high flows, and, for future climate, rather reflect the impact of climate change on natural flows.
Errors in the conceptualization and parametrization of the hydrological model further affect the simulated discharges. We do not make an attempt to account for hydrological uncertainty, as it is outside the scope of this study. Several other studies (e.g. Wilby, 2005) showed, however, that this layer of uncertainty is generally much lower than the uncertainty of the climate input to the hydrological model.
The relative contribution of these factors is difficult to quantify because observed events cannot be compared on an individual basis with simulations for the control climate, as the simulated control climate does not reproduce the historical weather.

Conclusions
In this work we assessed the benefits of removing bias in climate forcing data for pan-European hydrological impact assessment, with emphasis on extreme events. Results show that the bias correction method employed performed very well in removing bias in average, maximum and minimum temperatures. Even though daily maximum and minimum temperatures were corrected for indirectly, and the daily transfer functions were obtained on a monthly basis, observed annual and seasonal statistics were fully preserved.
For precipitation, the bias correction method was able to drastically reduce the strong overestimation generally observed across Europe.Only in certain mountain areas, e.g.Alps, Apennines, and Carpathian, a persistent overestimation remained, mainly in winter.This is likely due to the alternation between linear and exponential fitting-type functions during winter months in these areas, which can significantly alter the interpolation of daily values based on the anterior and posterior (monthly) fitting functions.
Validation of simulated discharge statistics at 554 gauging stations showed that LISFLOOD simulations driven by uncorrected forcing data strongly overestimated average and extreme discharge statistics at the majority of stations.Simulations with corrected climate simulations were more consistent with historical discharge records.A strong improvement was observed not only in average discharges but also in the annual maxima and the probability of flood levels derived by extreme value analysis.For the extremes, however, we observed a slight tendency to underestimate observed flow statistics.This can in part be explained by errors in the E-OBS data set used as target in the bias correction.In the latter, gridded extreme precipitation may undervalue true extremes due to under-catch of precipitation, especially in mountain stations, as well as to the over-smoothing effect in the grid-based interpolation in areas with low station density.Despite this, the E-OBS data set is currently the best available at pan-European scale.In small-scale studies such problems may be alleviated by using more dense data sets to reconstruct historical climate.
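The flood levels referred to above are return levels from a Gumbel distribution fitted to annual maximum discharges (see Fig. 11). As an illustration only, the following minimal sketch shows how such a return level can be computed. It assumes a method-of-moments fit, which is a simplification swapped in for brevity and is not claimed to be the estimator used in this study; the function names and the sample discharges are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Method-of-moments Gumbel fit (an assumed, simplified estimator).

    Returns (location, scale) such that the fitted distribution has the
    same mean and sample standard deviation as the data.
    """
    mean = statistics.fmean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def return_level(location, scale, return_period):
    """Level exceeded on average once every `return_period` years."""
    non_exceedance = 1.0 - 1.0 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


if __name__ == "__main__":
    # Synthetic annual maxima in m3/s, for illustration only (not study data).
    maxima = [410.0, 520.0, 380.0, 640.0, 455.0, 700.0, 495.0, 530.0]
    loc, scl = fit_gumbel_moments(maxima)
    print(return_level(loc, scl, 100.0))
```

Applying the same two functions to the annual maxima of the uncorrected and the bias corrected simulations at one station would give the two return levels whose ratio is mapped in Fig. 12.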
At a number of stations, the bias correction could not (fully) remove the (sometimes considerable) discrepancies between observed and simulated discharge statistics. This suggests that they relate to other limitations of the hydrological modelling exercise, such as not accounting for river regulation, or errors in the conceptualization and parametrization of the hydrological model.
In accordance with the results for the control period, future flood magnitudes for the hydrological simulations driven by uncorrected climate data largely exceed those based on corrected climate. Although strictly not verifiable, this suggests that projections of future flood magnitude based on uncorrected climate simulations are likely unreliable. Moreover, results show that the recurrence interval of a 100-yr flood event observed in the control period, which depends on the change in flood magnitude rather than on the absolute value thereof, can vary substantially in both directions from its bias-corrected counterpart.
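The dependence of the future recurrence interval on the change in flood magnitude can be made concrete with a short sketch. It assumes that Gumbel distributions describe the annual maxima in both the control and the future period; the parameter values and function names below are hypothetical and serve only to show the calculation.

```python
import math


def gumbel_cdf(x, location, scale):
    """Probability that an annual maximum does not exceed x."""
    return math.exp(-math.exp(-(x - location) / scale))


def gumbel_return_level(location, scale, return_period):
    """Level exceeded on average once every `return_period` years."""
    return location - scale * math.log(-math.log(1.0 - 1.0 / return_period))


def future_recurrence_interval(control, future, return_period=100.0):
    """Recurrence interval, under the future fit, of the control-period level.

    `control` and `future` are (location, scale) tuples of Gumbel fits.
    """
    level = gumbel_return_level(control[0], control[1], return_period)
    exceedance = 1.0 - gumbel_cdf(level, future[0], future[1])
    return 1.0 / exceedance


if __name__ == "__main__":
    control_fit = (500.0, 100.0)  # hypothetical control-period parameters
    future_fit = (550.0, 110.0)   # hypothetical future-period parameters
    print(future_recurrence_interval(control_fit, future_fit))
```

Because only the shift between the two fitted distributions enters the result, the recurrence interval is unchanged if both fits are scaled by the same factor. This is consistent with the observation that the differences between corrected and uncorrected runs can go in either direction.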
This research has shown the benefits of bias removal in climate simulations for hydrological impact assessment at pan-European scale. Despite the potential limitations in the approach employed, the considerable improvement in the simulation of extreme events and their probability of occurrence in the control period 1961-1990 increases the confidence in the projections of future flood hazard. Our next steps involve the application of this technique to the full ensemble of climate simulations available in the FP6 ENSEMBLES project, and the hydrological impact assessment using fully consistent and bias corrected climate simulations for an ensemble of climate models.

Fig. 1. Location of the 554 gauging stations used to assess simulated river discharges. These stations have at least 25 yr of daily data for the control period 1961-1990.

Fig. 2. Ratio between HIRHAM5-ECHAM5 simulations and observations of seasonal average precipitation for the control period 1961-1990. First row, uncorrected seasonal precipitation; second row, bias corrected seasonal precipitation.

Fig. 3. Difference of average annual wet day frequency (precipitation ≥ 1 mm d⁻¹) between (a) uncorrected precipitation and (b) bias corrected precipitation with respect to the E-OBS data set for the control period 1961-1990.

Fig. 6. Difference between HIRHAM5-ECHAM5 simulations and observations of seasonal average daily temperature for the period 1961-1990. First row, uncorrected seasonal temperature; second row, bias corrected seasonal temperature.

Fig. 7. Differences in daily maximum (T max) and minimum (T min) temperature between HIRHAM5-ECHAM5 simulations and the E-OBS data set for the control period 1961-1990. First row, uncorrected annual temperature; second row, bias corrected annual temperature.

Fig. 9. Difference between LISFLOOD hydrological simulations driven by uncorrected and bias corrected climate data for the control period 1961-1990 for (a) snowpack depth (mm SWE) and (b) days with snow cover.

Fig. 10. Observed versus simulated average discharge (a, b) and average annual maximum discharge (c, d) for each of the 554 stations depicted in Fig. 1 for the control period 1961-1990. Left and right columns show hydrological simulations driven by uncorrected and bias corrected climate data, respectively.

Fig. 11. Return level plots of simulated discharge levels at the 20 selected gauging stations shown in Fig. 1, based on a Gumbel distribution fit to the annual maxima for the control period 1961-1990. Full lines represent hydrological simulations driven by uncorrected forcing data, whereas dashed lines represent the bias corrected counterpart. Also included in the panels are the 95 % confidence intervals (red dashed and full lines) derived using the profile-likelihood method. Black crosses represent return levels obtained from empirical plotting positions of observed annual maximum discharges at the selected stations.

Fig. 12. Ratio of discharges for a return period of 100 yr between hydrologic simulations driven by uncorrected and bias corrected forcing data for the control period 1961-1990.

Fig. 13. (a) Recurrence interval of a 100-yr event observed in the control period (1961-1990) for the long-term future (2070-2099), and (b) difference (yr) for the recurrence interval of a 100-yr event between hydrological simulations driven by bias corrected and uncorrected climate data.
…their global analysis. Aside from model deficiencies, the pronounced overestimation at high altitudes can in part be explained by the fact that observed precipitation typically underestimates true precipitation, especially in winter, due to poor station network density (see, e.g. Hofstra et al., 2010; Lenderink, 2010), which does not allow orographic effects on precipitation to be fully captured. Hence, the bias-corrected precipitation in these areas may more closely correspond to true precipitation amounts than the comparison with observed precipitation suggests here.
Hydrol. Earth Syst. Sci., 15, 2011, www.hydrol-earth-syst-sci.net/15/2599/2011/
simulated diurnal temperature range is too narrow in most regions of Europe for the uncorrected temperatures simulated by the HIRHAM5-ECHAM5 model. After performing the bias correction, T max and T min show maximum differences not exceeding ±0.1 °C across Europe. A warm anomaly of at most 0.5 °C is observed in the Scandinavian mountains, the Balkans, and central Europe.