Climate model validation and selection for hydrological applications in representative Mediterranean catchments



Introduction
Climate Models (CMs) are numerical tools to simulate the past, present and future of Earth's climate. Hence, evaluating the accuracy of CMs is a crucial scientific and applied objective, not only for the role of models in reconstructing the past and projecting the future state of the planet, but also because of their increasing relevance in the process of policymaking. For the latter purpose, it is necessary to summarize and evaluate the results originating from an increasing number of Global Climate Models (GCMs) to build multi-model ensembles and study their statistics (mainly ensemble mean and spread). Note, however, that a multi-model ensemble is not statistically homogeneous (i.e. it is not formed by statistically equivalent realizations of a process) and, therefore, using its mean to approximate the truth and its standard deviation to describe the uncertainty of the outputs could be highly misleading (Lucarini, 2008; Annan et al., 2011).
In general, GCMs are suited to provide large-scale climate predictions that are not directly relevant to hydrological evaluations at a river basin level, which can be of interest for local policymaking. In order to refine GCM outputs, the most common approach is to use statistical and dynamical downscaling tools. Regional Climate Models (RCMs) are high-resolution dynamical models that take advantage of detailed representations of natural processes at high spatial resolutions capable of resolving complex topographies and land-sea contrast. However, they are run on a limited domain and thus require boundary conditions from a driving GCM (e.g., Giorgi and Mearns, 1999; Wang et al., 2004; Rummukainen, 2010). Thus, (i) the GCM-nested nature of regional climate modeling implies that RCM climate reconstructions and projections can critically depend on the driving GCM (e.g., Christensen et al., 1997; Takle et al., 1999; Lucarini et al., 2007); and (ii) precipitation and other atmospheric quantities, like temperature, although useful in improving climate modeling by highlighting and explaining differences in GCM parameterization and representation of climate features, can hardly be considered climate state variables (Lucarini, 2008). The definition of reasonable climate scenarios has been an issue of major research efforts by the international scientific community (e.g., Lucarini, 2002; Fowler et al., 2007; Räisänen, 2007; Moss et al., 2010).
A key role in CMs is played by the atmospheric part of the hydrological cycle, not only because of its strong impact on the energy of atmospheric perturbations, but also because of the contribution of hydrometeors to human activities and the evolution of the environment. These contributions range from space-time availability of water resources and land policy, to extreme events like mudslides, avalanches, flash floods and droughts (e.g., Becker and Grünewald, 2003; Roe, 2005; Tsanis et al., 2011; Koutroulis et al., 2013; Muerth et al., 2013). The significant impact of the hydrological cycle on human communities and the environment is also reflected by the number of studies focusing on the use of CM results for hydrological evaluations and assessments (e.g., Sulis et al., 2011, 2012; van Pelt et al., 2012; Guyennon et al., 2013; Cane et al., 2013; Velázquez et al., 2013). The relevance of this subject has also led many investigations towards ranking and validating CM performances based on hydrological measures.
Intercomparison studies have shown that no particular model is best for all variables and/or regions (e.g., Lambert and Boer, 2001; Gleckler et al., 2008). Most intercomparison and validation studies focus on evaluating hydrologically relevant parameters like temperature, precipitation, and surface pressure (e.g., Perkins et al., 2007; Giorgi and Mearns, 2002). Murphy et al. (2004) evaluated the skill of a 53-model ensemble in simulating 32 variables (from precipitation to cloud cover and upper-level pressures) to determine a Climate Prediction Index (CPI) that could provide an overall model weighting. Based on the CPI of Murphy et al. (2004), Wilby and Harris (2006) evaluated climate models used in hydrological applications to create an impact-relevant CPI. The latter was based on the average bias of effective summer rainfall, which was found to be the most important predictor of annual low flows in the basin studied. Perkins et al. (2007) ranked 14 climate models based on their skill to reproduce simultaneously the probability density functions of observed precipitation, and maximum and minimum temperatures over 12 regions in Australia. Using results from the Coupled Model Intercomparison Project (CMIP3), Gleckler et al. (2008) ranked climate models by averaging the relative errors over 26 variables (precipitation, zonal and meridional winds at the surface and different pressure levels, 2 m temperature and humidity, top-of-the-atmosphere radiation fields, total cloud cover, etc.). They also showed that defining a single index of model performance can be misleading, since it hides a more complex picture of the relative merits of different models. Johnson and Sharma (2009)
Nevertheless, it is worth mentioning that CM results are currently tested only against observational (past) data, and the choice of the observables of interest is crucial for determining robust metrics able to audit the models effectively (Lucarini, 2008; Wilby, 2010). Unfortunately, data are often of non-uniform quality and quantity, due to, e.g., the non-stationarity and non-homogeneity caused by changes in the network density, instrumentation, temporal sampling, and data collection strategies over time.
In the framework of the ENSEMBLES project, there has been an effort to produce values for some of the hydrologically relevant variables (i.e. precipitation, temperature and sea level pressure) on a regular data grid, based on objective interpolation of the observational network. This initiative has continued as a part of the EURO4M (EU-FP7) project, which made the observed data fields (E-OBS) available on different grids for the 1950-2011 time-frame (Haylock et al., 2008; van den Besselaar et al., 2011). Recently, the E-OBS fields were newly released on a rotated grid consistent with that used by ENSEMBLES RCMs over western Europe. Although limited (due to the non-uniform spatial density of the observations used to produce the gridded data), the E-OBS fields constitute a reference for evaluating the performance of different CMs in the European and Mediterranean areas; from a technical point of view, they are built for direct comparisons with ENSEMBLES RCM outputs.
Solomon et al., 2007). The most complete dataset is given for the A1B scenario, which is considered the most realistic.
In ENSEMBLES high-resolution runs, each RCM is nested into a larger-scale field. The latter may originate from different GCMs, leading to different GCM-RCM combinations. For simplicity, we defined an acronym for each GCM and RCM considered in this study, listed in Tables 2 and 3, respectively. In all figures, we use symbols to display results from different RCMs, and colours to indicate runs forced by different GCMs. Figure 2 summarizes the combinations of symbols and colours used to display results from the 14 GCM-RCM combinations considered in this study.
Following the PCMDI/CMIP3 initiative, the ENSEMBLES project stimulated and guided several climatic centres towards standardization of model grids and outputs and, thus, promoted synergies across different research areas and interdisciplinary efforts. Nevertheless, when pre-processing the ENSEMBLES RCM outputs (i.e. before the validation phase), we experienced some minor discrepancies, requiring ad hoc treatments. Although all issues were manageable, they are worth mentioning, especially for scientists who are using (or foresee using) these outputs to run hydrological models and perform impact studies:
- For most GCM-RCM combinations, the available time-frames cover the period from 1 January 1951 to 31 December 2100. However, some model outputs exhibit missing data on the last days of 2100, whereas for other models the missing values start at the end of 2099. In addition, two model simulations (i.e. BCM-HIR, HCS-HIR) stop in year 2050, while the BCM-RCA model run starts in 1961.
- Models HCH-RCA, HCS-CLM, HCS-HRM, HCL-HRM, HCH-HRM, HCS-HIR, and HCL-RCA use a simplified calendar with 12 months of 30 days each (i.e. 360 days per year), whereas the remainder use a standard Gregorian calendar with 365 days per regular year, and 366 days in leap years. Additionally, among these last models, some do not account for the leap year exception in 2100. We also detected some missing or incomplete data. More precisely, for some models the values in the last days of the simulation period are simply set to zero, rather than to an unambiguous default flag for missing values. While this is not an issue when working with temperature fields expressed in K, it may create problems when considering precipitation fields (expressed in kg m⁻² s⁻¹), since it is not apparent how to distinguish between missing data and zero precipitation (this is the case, e.g., for the last 390 days of data in the HCL-RCA simulation).
- For some models (see below), dry conditions are indicated by a very small positive or negative constant value, P_min, whereas the sea level elevation is set to a constant value, z_sea, different from zero. More precisely, z_sea ≈ 0.046 m for HCH-RCA;
model parameterizations, routines used to create the netCDF files in the ENSEMBLES archive, etc.), one should properly treat them before using CM outputs to perform climate change impact studies. For example, unless properly identified, a minimum precipitation threshold may bias rainfall statistics (e.g. the annual fraction of dry periods) and, from a practical point of view, influence hydrological and meteorological analysis (e.g. drought analysis).
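To illustrate the kind of treatment these issues call for, the following minimal sketch flags a trailing run of exact zeros as missing, resets a constant dry-day value to zero, and converts the flux to mm d⁻¹ (1 kg m⁻² of water corresponds to a 1 mm layer). It is not the procedure used in this study: the function names, the tolerance `tol` and the way the dry-day constant is passed in are hypothetical, and a genuine dry spell at the very end of a run would also be masked.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0  # converts kg m-2 s-1 to mm d-1


def mask_trailing_zeros(series):
    """Replace the run of exact zeros at the end of a series with NaN."""
    out = np.asarray(series, dtype=float).copy()
    nonzero = np.flatnonzero(out != 0.0)
    last_valid = nonzero[-1] if nonzero.size else -1
    out[last_valid + 1:] = np.nan
    return out


def clean_precipitation(flux, dry_value=None, tol=1e-12):
    """Return daily precipitation in mm/d with dry days set to exactly 0.

    flux      : daily precipitation flux in kg m-2 s-1
    dry_value : constant a model uses to flag dry days (P_min in the text);
                if None, only negative values are reset to zero
    tol       : hypothetical tolerance used to match dry_value
    """
    p = mask_trailing_zeros(flux)
    if dry_value is not None:
        # NaN never satisfies the comparison, so masked days stay missing
        p[np.abs(p - dry_value) <= tol] = 0.0
    p[p < 0.0] = 0.0
    return p * SECONDS_PER_DAY
```

Applying the mask before the dry-day reset matters: once dry days are set to zero, they can no longer be told apart from zero-filled missing days.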
For each considered catchment, the selected set of climate model data was validated using the CRU E-OBS dataset from the ENSEMBLES EU-FP6 project, made available by the ECA&D project (http://www.ecad.eu) and hosted by the Climate Research Unit (CRU) of the Hadley Centre. E-OBS datafiles are gridded observational datasets of daily precipitation and temperature, developed on the basis of a European network of high-quality historical measurements. In particular, we used version 5.0 of the E-OBS dataset, which covers the period from 1 January 1950 to 30 June 2011 and is available at four different grid resolutions: 0.25 and 0.5 degree regular latitude-longitude grids, and 0.22 and 0.44 degree rotated pole grids. In our analysis, we use the rotated grid at 0.22 degree resolution, which exactly matches the grid of ENSEMBLES RCMs. Having an almost perfect point-to-point correspondence between ENSEMBLES RCM results and E-OBS reference data greatly simplifies intercomparison, validation and calibration activities, since no interpolation or re-gridding of the data is needed.
There are several additional reasons why the E-OBS dataset is considered to be the best available source for temperature and precipitation estimates in the considered catchments to pursue model validation:
be attributed to different orographic representations and, also, to remove biases introduced by elevation differences in the E-OBS and corresponding RCM grid points. For example, to account for different surface elevation models used by ENSEMBLES RCMs and E-OBS, and before calculating CM performance metrics, we used the corresponding model orographies and a monthly constant lapse rate to translate surface temperatures at different elevations to those observed at sea level, as discussed in Sect. 4 below. For a more detailed description of the E-OBS dataset, the reader is referred to Haylock et al. (2008).

Metrics for validation
Several studies (see Introduction) have focused on metrics to assess the accuracy of climate model results. Note, however, that performance metrics should depend on the specific use of climate data. In this study, we focus on providing reliable climatic forcing for hydrological applications at a river basin level. For this purpose, one needs to downscale the CM fields to resolutions suitable to run hydrological models and assess climate change impacts. In what follows, we summarize the main aspects of the downscaling procedure in order to clarify our validation setting. A detailed description of the downscaling procedures, together with the results obtained, will form the subject of an upcoming communication.
Precipitation downscaling was performed using the multifractal approach described in Deidda (2000) and Badas et al. (2006), starting from areal averages of daily precipitation. The latter were obtained by averaging rainfall values over a 4 × 4 stencil of ENSEMBLES grid points centred in each catchment, covering an approximate area of 100 km × 100 km. This particular choice allowed us to embed all catchments inside equally sized spatial domains. The size of the latter is within the range of space-time scale invariance of rainfall indicated by several studies (Schertzer and Lovejoy, 1987; Tessier et al., 1993; Perica and Foufoula-Georgiou, 1996; Venugopal et al., 1999; Deidda, 2000; Kundu and Bell, 2003; Gebremichael and Krajewski, 2004; Deidda et al., 2004, 2006; Gebremichael et al., 2006; Badas et al., 2006; Veneziano et al., 2006; Veneziano and Langousis, 2005, 2010) and, thus, it can be used to define the integral volume of rainfall to be downscaled to higher resolutions of a few square kilometers. That said, the validation metrics of ENSEMBLES RCM simulations versus E-OBS data are calculated based on areal rainfall averages over a regular 4 × 4 grid-point stencil.
Temperature fields from ENSEMBLES RCMs were downscaled based on the procedure described in Liston and Elder (2006), which combines a spatial interpolation scheme (Barnes, 1964, 1973) with orographic corrections. Although, in the case of temperature, downscaling starts from the ENSEMBLES resolution (24 km × 24 km), we decided to adopt the same validation setting as that for precipitation (i.e. calculate areal averages of daily temperatures over a regular 4 × 4 grid-point stencil). Since temperature fields are particularly sensitive to elevation, in order to make homogeneous comparisons, we first reduced surface temperatures at different elevations to sea level, and then calculated areal averages of daily temperatures over a regular 4 × 4 grid-point stencil centred in each catchment. For the former, we used a standard monthly lapse rate for the Northern Hemisphere (Kunkel, 1989) and the corresponding ENSEMBLES model and E-OBS orographies. This appears to be a reasonable choice since, after reduction to sea level, the temperature field becomes quite smooth.
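The reduction to sea level described above amounts to adding the lapse rate of the current month times the grid-point elevation to the surface temperature. A minimal sketch follows, assuming a linear correction; the twelve lapse-rate values are hypothetical placeholders and are not the values tabulated by Kunkel (1989), and the function name is ours.

```python
import numpy as np

# Hypothetical monthly lapse rates in K per km (January..December);
# placeholders only, NOT the values tabulated by Kunkel (1989).
LAPSE_RATE_K_PER_KM = np.array(
    [4.5, 5.0, 5.5, 6.0, 6.5, 6.5, 6.5, 6.5, 6.0, 5.5, 5.0, 4.5]
)


def reduce_to_sea_level(temp_k, elevation_m, month):
    """Translate surface temperature to sea level: T0 = T + gamma(month) * z.

    temp_k      : surface temperature in K
    elevation_m : orography of the same grid (model or E-OBS) in m
    month       : calendar month, 1..12; must broadcast against temp_k
    """
    gamma = LAPSE_RATE_K_PER_KM[np.asarray(month) - 1]
    z_km = np.asarray(elevation_m, dtype=float) / 1000.0
    return np.asarray(temp_k, dtype=float) + gamma * z_km
```

For a field of shape (days, y, x) with a (y, x) orography, the month vector can be passed as `month[:, None, None]` so that the three arrays broadcast. Each dataset is reduced with its own orography, which is what removes the elevation mismatch between model and observations.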
In summary, all metrics defined below are used to validate ENSEMBLES RCM simulations vs E-OBS data, using spatial averages of temperature and precipitation over a 4 × 4 grid-point stencil. This choice allows us to better study climatic forcing at a common catchment scale. An additional advantage is that spatial averaging smooths and filters out some local biases present in E-OBS fields. These originate from the low density of observations, which cannot efficiently capture orographic effects on precipitation and temperature and, also, cannot account for the high spatial variability of daily rainfall.
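The areal averaging can be sketched as follows, assuming daily fields stored as (days, y, x) arrays with NaN marking missing values. Addressing the stencil by its corner indices `i0`, `j0` is our assumption (the text only states that the stencil is centred in each catchment); the default 10 % threshold mirrors the criterion the text applies to E-OBS grid points with missing data.

```python
import numpy as np


def stencil_average(field, i0, j0, size=4, max_missing_frac=0.10):
    """Daily areal mean over a size x size block whose corner is (i0, j0).

    field : array (n_days, n_y, n_x) with NaN marking missing values.
    Grid points whose fraction of missing days is not below
    max_missing_frac are dropped before averaging.
    """
    n_days = field.shape[0]
    block = field[:, i0:i0 + size, j0:j0 + size].reshape(n_days, -1)
    missing_frac = np.isnan(block).mean(axis=0)
    keep = missing_frac < max_missing_frac
    if not keep.any():
        raise ValueError("no grid point satisfies the missing-data criterion")
    # nanmean ignores the few remaining gaps at the retained points
    return np.nanmean(block[:, keep], axis=1)
```

To compare a model with E-OBS on the same set of points, the same `keep` mask (derived from E-OBS) would have to be applied to the model field as well; that step is left out here for brevity.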
It is worth mentioning that the E-OBS dataset is based on observations obtained from a network of land-based stations. Hence, all E-OBS data over sea were masked and set to default missing values. Nevertheless, missing values were also found at some grid points over land, where the network density is low. An analysis of the data density over time showed that there are E-OBS grid points over land where the availability of interpolated data depends on the period studied. This is due to changes in the observation network between 1950 and 2011. To account for this issue, when calculating areal averages of daily values over the 4 × 4 grid-point stencil, we retained those points with less than 6 yr of missing data (i.e. 10 % of the 60 yr validation period from 1951 to 2010). Accordingly, all validation metrics below were calculated using those grid points available for both E-OBS data and ENSEMBLES RCM simulations.
Following the discussion above, let us define X_m(s, y) to be the monthly spatial average of variable X (i.e. X = T, P for monthly temperatures and rainfall intensities, respectively) over an area of approximately 100 × 100 km (i.e. a 4 × 4 stencil of ENSEMBLES grid points), where m is the model index, s the month and y the year (y = 1951, 1952, ..., 2100). According to this notation, let us also denote by m = 0 the index for E-OBS. The s-th monthly mean and standard deviation of X_m(s, y) over an N_y yr climatological period starting in year y_0 are given by:
(1) and (2), such error measures for the monthly climatological means and standard deviations of X_m over the 60 yr period 1951-2010 are defined as:
In addition, errors in the marginal distribution of X_m can be quantified by averaging the absolute differences between the quantiles x_m(α_i) of the observed (m = 0) and simulated
The error metrics defined above provide information on the reliability of a single variable, whereas for subsequent hydrological modeling we need to identify those models that perform best for a specific set of variables. As a minimum requirement for hydrological modeling, we seek CMs that provide reliable estimates of precipitation and temperature: precipitation is the main source of water in the catchment, whereas temperature controls evaporation and evapotranspiration processes. Thus, in order to compare the performances of different models m (m = 1, ..., 14) in reproducing, simultaneously, the statistics of the observed precipitation and temperature fields, we introduce the following dimensionless measures:

where M = 14 is the number of models. Equations (6) and (7) can be used to assess the relative performance of different models in reproducing the monthly mean and standard deviation of the observed temperature and precipitation series.
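Since the explicit form of the metrics is not reproduced in the text above, the sketch below shows only one plausible way to compute quantities of this kind. The monthly climatology follows the verbal definition (mean and standard deviation over N_y years for each calendar month); the root-mean-square form of the seasonal error, the population normalisation of the standard deviation, and the normalisation of each error by its mean over the M models are all assumptions of ours, as are the function names.

```python
import numpy as np


def monthly_climatology(x, y0_index=0, n_years=60):
    """x: array (n_total_years, 12) of monthly areal means X_m(s, y).

    Returns (mu, sigma), each of length 12, over n_years starting at
    row y0_index. The ddof=0 normalisation of sigma is an assumption.
    """
    window = np.asarray(x, dtype=float)[y0_index:y0_index + n_years]
    return window.mean(axis=0), window.std(axis=0)


def seasonal_error(stat_model, stat_obs):
    """Root-mean-square difference over the 12 months (assumed form)."""
    d = np.asarray(stat_model, dtype=float) - np.asarray(stat_obs, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))


def dimensionless_score(err_p, err_t):
    """Combine precipitation and temperature errors of M models.

    Each error is divided by its mean over the M models (assumed
    normalisation), then the two are combined as a Euclidean distance,
    so that lower values indicate better joint performance.
    """
    err_p = np.asarray(err_p, dtype=float)
    err_t = np.asarray(err_t, dtype=float)
    return np.hypot(err_p / err_p.mean(), err_t / err_t.mean())
```

The point of the normalisation is that errors in mm d⁻¹ and in K become comparable pure numbers before they are combined; any other scale that is common to all models would serve the same purpose.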

Seasonal distribution of precipitation and temperature
When assessing models' skills, it is important to examine their ability to reproduce the annual averages and seasonal variability of precipitation and surface temperature. These two variables specify the climate type according to the Köppen-Geiger climate classification system. Thus, a first skill measure is the ability of the models to reproduce the local climatic characteristics of specific basins.
Figures 3 and 4 show the mean monthly precipitation and temperature for each catchment, calculated using Eq. (1), for the N_y = 60 yr verification period (1951-2010). For the E-OBS observational reference, we use a dotted black line, whereas ENSEMBLES RCM results are plotted using the reference introduced in Fig. 2.
In comparing model results on precipitation with E-OBS observational data, Fig. 3 clearly shows significant discrepancies between ENSEMBLES RCMs and E-OBS climatologies, both in terms of magnitude and, in some cases, the observed seasonal cycle. In almost all catchments, several RCMs produce higher annual precipitation than that observed. In more detail: for Thau, Riu Mannu and Noce, 11 out of 14 models simulate higher precipitation amounts for at least ten months of the year. The same problem is detected for 9 RCMs in Kocaeli and 8 RCMs in Chiba catchments. An additional observation is that, in all catchments except Gaza, RCMs exhibit a larger relative error for high precipitation amounts in summer months. This positive bias is amplified in catchments with Mediterranean climate, where RCM precipitation can be up to ten times higher than the observed one. In Gaza, on the other hand, models are typically biased towards lower values of precipitation, with the exception of a handful of models, which are instead biased towards larger precipitation amounts. Note, also, that one model, BCM-RCA, produces unrealistic results.
A catchment where models' skills are problematic is Riu Mannu: while E-OBS indicates almost no precipitation during the summer months (June-August), some models (HCL-RCA, BCM-HIR, BCM-RCA, HCH-HRM) display relatively high amounts of summer precipitation. Although the skill of RCMs in reproducing seasonal precipitation is generally weaker during summer, especially for the drier southern and eastern European regions (Frei et al., 2006; Maraun et al., 2010), this finding is still surprising since several studies indicate that RCMs tend to underestimate precipitation during summer (e.g., Jacob et al., 2007).
directly following it. This pattern indicates a humid subtropical climate, typical of most lowlands in Adriatic Italy, south of the Noce region. A possible reason is that, at 24 km resolution, RCMs are limited in resolving small-scale features necessary to correctly reproduce orographic precipitation. Moreover, the observed differences between CM results and E-OBS observational data might stem from the irregular and sparse network of E-OBS stations, especially in view of the fact that most stations are located at low altitudes.
Unsurprisingly, RCMs tend to perform better in modeling surface temperature. Figure 4 shows that all RCMs reproduce a reliable yearly cycle of monthly climatological temperatures (previously reduced to sea level, see Sect. 4). Nevertheless, when compared to E-OBS, several RCMs exhibit significant biases, often larger than 5 K, while the inter-model spread can be as large as 10 K (especially during summer). In more detail: for Noce, HCL-RCA exhibits a negative bias larger than 5 K during winter; for Kocaeli, ECH-HIR exhibits a positive bias of about 4 K during winter; whereas for Chiba, HCL-HRM overestimates temperature during summer by 5 K. However, given the good representation of the seasonal temperature cycle, these discrepancies can be properly reduced using simple bias-correction techniques.

Selection of best performing ENSEMBLES models
Error metrics introduced in Eqs. (3) and (4) were calculated for each ENSEMBLES RCM for the 60 yr verification period from 1951 to 2010, and then plotted for each catchment in a separate scatterplot (Fig. 5). One sees that, in all catchments except Noce, most models (i.e. 11 out of 14 models in Thau basin; 13 models in Riu Mannu and Kocaeli catchments, and all 14 models in Gaza and Chiba) exhibit errors lower than 1.5 mm d⁻¹ in their monthly means, E_μP, and lower than 1 mm d⁻¹ in their monthly standard deviations, E_σP. For Noce basin, the corresponding errors are significantly larger.
Concerning the temperature error metrics E_μT and E_σT in Fig. 6, one notes that the error in the monthly means is much larger than the one in the standard deviations: for almost all models and most catchments (i.e. all models in Thau, Riu Mannu and Chiba basins, 13 out of 14 models in Kocaeli catchment, and 12 out of 14 models in Noce and Gaza), E_μT and E_σT are smaller than 3 K and 0.5 K, respectively. Again, one concludes that Noce is the catchment where ENSEMBLES RCMs tend to perform worst, even for temperature: only 4 out of 14 models exhibit errors smaller than 1.5 K in E_μT. Note that in Gaza (the second worst-performing catchment), 8 out of 14 RCMs exhibit errors smaller than 1.5 K in E_μT. Concerning Noce basin, it is worth noting that one cannot easily conclude to what extent the calculated temperature and precipitation errors originate from limitations of the observation network at high elevations in the Alps.
Figures 5 and 6 also show that model performances can vary significantly from one catchment to another. For instance, HCS-CLM is one of the two worst performers for precipitation in Thau, but it performs best in Noce (see Fig. 5). For all models, precipitation errors in Gaza are quite small (see Fig. 5), while temperature errors are significant (see Fig. 6). This means that, as expected, among all considered RCMs it is not possible to identify a subset of models that performs best for both variables in all catchments. An option is to base model selection on dimensionless metrics capable of weighting the errors in the variables of interest, even when the choice is driven by additional constraints as discussed in Sect. 2. With this aim, we introduced the dimensionless error metrics in Eqs. (6) and (7) to account for both precipitation and temperature RCM performances in reproducing E-OBS observational data. Results are presented in Fig. 7, where one sees that the HCH-RCA, ECH-RCA, ECH-RMO and ECH-REM models (marked with an additional black circle) are the best choices for the Thau, Riu Mannu, Kocaeli and Chiba catchments, under the additional constraint of maintaining two different RCMs nested in the same GCM, and two different GCMs forcing the same RCM. For Noce and Gaza, where other choices would have been slightly better, the 4 selected models still display good performances.
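A constrained choice of this kind can be sketched as an exhaustive search over all four-model subsets. The sketch below is an illustration only: it assumes model names of the form GCM-RCM as used in this study, reads the constraint as "at least two selected models share a GCM and at least two share an RCM", and ranks subsets by the sum of a single dimensionless score per model. The ranking criterion and function names are our assumptions; the selection in the text was made from the scatterplots across all catchments.

```python
from itertools import combinations


def satisfies_constraint(subset):
    """True if two models share a GCM and two models share an RCM."""
    gcms = [name.split("-")[0] for name in subset]
    rcms = [name.split("-")[1] for name in subset]
    shared_gcm = len(set(gcms)) < len(gcms)  # same GCM, different RCMs
    shared_rcm = len(set(rcms)) < len(rcms)  # same RCM, different GCMs
    return shared_gcm and shared_rcm


def select_models(scores, k=4):
    """scores: dict 'GCM-RCM' -> dimensionless error (lower is better).

    Returns the admissible k-model subset with the smallest summed score.
    """
    candidates = [
        c for c in combinations(sorted(scores), k) if satisfies_constraint(c)
    ]
    if not candidates:
        raise ValueError("no subset satisfies the nesting constraint")
    return min(candidates, key=lambda c: sum(scores[m] for m in c))
```

With 14 models there are only 1001 four-model subsets, so brute force is adequate and keeps the constraint explicit.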
In order to check the general behaviour of the probability distributions of the simulated precipitation and temperature fields against E-OBS, we used Eq. (5) to compute the mean absolute errors in the quantiles at 100 uniformly spaced probability levels. The results are presented in the scatterplots of Fig. 8. One sees that the 4 selected models display reasonable performances in all considered catchments, thus confirming the selection. To assist the reader in identifying the 4 selected models in all figures, we have drawn the corresponding lines thicker in Figs. 3 and 4, and added black circles in all scatterplots (Figs. 5-8).
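The quantile-based error described above can be sketched in a few lines. The exact probability levels are not stated in the text, so the mid-point levels 0.005, 0.015, ..., 0.995 used here are an assumption, as is the function name.

```python
import numpy as np


def quantile_error(sim, obs, n_levels=100):
    """Mean absolute difference between simulated and observed quantiles.

    sim, obs : 1-D samples (NaN values are ignored)
    n_levels : number of uniformly spaced probability levels; the
               mid-point placement of the levels is an assumption
    """
    alphas = (np.arange(n_levels) + 0.5) / n_levels
    q_sim = np.nanquantile(np.asarray(sim, dtype=float), alphas)
    q_obs = np.nanquantile(np.asarray(obs, dtype=float), alphas)
    return float(np.mean(np.abs(q_sim - q_obs)))
```

Unlike the errors in monthly means and standard deviations, this measure is sensitive to the whole shape of the distribution, including the tails, and does not require the two samples to have the same length.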
Last but not least, it is interesting to analyze and intercompare the variability of the mean annual precipitation and temperature over the five 30 yr climatological periods between 1951 and 2100. In Fig. 9, where results for precipitation are presented, one clearly observes a very large variability in the simulations of the 14 models, with some models predicting mean annual precipitation three times higher than others. One also gets an idea of how drastically the behaviour of a single model can change in different catchments. These findings support the need for extensive analyses, such as that presented here, before proceeding with hydrological modeling in specific catchments. For example, HEC-HIR model gives the largest annual precipitation in Thau and Noce, greatly overestimating E-OBS, while the same model in Riu Mannu and Gaza gives a very small annual precipitation, greatly underestimating E-OBS. The five 30 yr climatologies for the 4 selected models are shown in Fig. 9 inside vertical rectangles. Again, one visually observes that the selection of the 4 models is a reasonable compromise for all catchments.
An additional observation one makes is that, for each catchment, the variability in the 30 yr climatological periods for a single model is much smaller than the variability among different models. Figure 10 shows similar results for temperatures, where the variability among different models, although still larger than the variability among the 30 yr climatological periods for a single model, is of comparable magnitude.
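The comparison between the two sources of variability can be made concrete with a short sketch. It assumes one series of annual means per model covering 1951-2100, so that five consecutive 30 yr periods result; measuring variability as a range (maximum minus minimum) and the function names are our choices, not the study's.

```python
import numpy as np


def climatology_by_period(annual, period=30):
    """annual: array (n_models, n_years) of annual means.

    Returns an array (n_models, n_periods) of consecutive period means;
    trailing years that do not fill a whole period are discarded.
    """
    annual = np.asarray(annual, dtype=float)
    n_models, n_years = annual.shape
    n_periods = n_years // period
    trimmed = annual[:, :n_periods * period]
    return trimmed.reshape(n_models, n_periods, period).mean(axis=2)


def spread_summary(clim):
    """Return (within-model range, between-model range), each averaged.

    within  : range across periods for each model, averaged over models
    between : range across models for each period, averaged over periods
    """
    within = (clim.max(axis=1) - clim.min(axis=1)).mean()
    between = (clim.max(axis=0) - clim.min(axis=0)).mean()
    return float(within), float(between)
```

A within-model value much smaller than the between-model value corresponds to the situation described above for precipitation; values of comparable size correspond to the temperature case.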

Conclusions
Validation of climate models is typically performed using observational data, by studying the skills of different models in reproducing climate features in the study area. One major task is the choice of such observations. This study focuses on providing reliable climatic forcing for hydrological applications at a river basin level and, therefore, precipitation and surface temperature were chosen as verification variables since: (i) they are used to specify the climate of a region in several climate classification systems, like the Köppen-Geiger one, (ii) they represent a minimum requirement for hydrological modeling, being respectively the main source of water in the catchments and the main control parameter for evaporation and evapotranspiration, and (iii) precipitation and temperature observation networks are the most dense and readily available ones (in contrast to, say, radiation or evapotranspiration measurements).
Another basic problem of model validation is that of selecting appropriate metrics to weight the relative influence of different variables. Since we needed to compare model performances in simultaneously reproducing the statistics of temperature and precipitation fields, we introduced dimensionless normalized metrics.
RCMs have been used in two major scientific projects, PRUDENCE and ENSEMBLES, to produce future climate projections for the EU. The corresponding model simulations have been studied and validated in several recent papers (e.g., Christensen and Christensen, 2007, for PRUDENCE; and Lorenz and Jacob, 2010, for ENSEMBLES).
While most validation studies examined model performances by focusing on medium-to-large scale areas (e.g. Christensen et al., 2010, studied ENSEMBLES RCM results by dividing Europe into several large areas), RCMs were found to only partially reproduce climate patterns in Europe (Jacob et al., 2007; Christensen et al., 2010).
Our study suggests that, when interest is at relatively small spatial scales associated with hydrological catchments, as is the case in the CLIMB project, validation of CM results should be conducted at a single-basin level, rather than at macro-regional scales. In this case, it is necessary to check models' skills in reproducing prescribed observations at specific river basins, since averaging over quite large areas might bias the assessment. For example, for Riu Mannu, Thau and Chiba catchments, model performances can vary significantly (see Sect. 5), even though these catchments are included in the same large-scale area in the study of Christensen and Christensen (2007).
In this work we validated RCM results at scales suitable to run hydrological models and conduct climate impact studies for representative Mediterranean catchments. We found that the performance of a single RCM in reproducing observational data can change significantly in different river basins. This finding highlights the need for extensive analyses of climate model outputs before proceeding with hydrological modeling in specific catchments.
Another important finding is that, at least for temperature and precipitation studied at a river basin level, the variability in the 30 yr climatological periods for a single model is much smaller than the variability among different models, as Figs. 9 and 10 clearly show. We also stressed that the validation process in complex terrains, as is the case of the Alps (Noce catchment), may be significantly affected by weaknesses of model grids and the representativeness of the observational network. In fact, it can be problematic to interpret the differences between model outputs and observations, since they may originate from a combination of issues: (a) the model grid is too coarse (e.g., the Noce case, located in the Alps, where the maximum elevation considered by the models is approximately 2500 m, considerably lower than the real orography); and (b) the observational network is too sparse to provide a proper basis for models' validation.
Projects like ENSEMBLES and PRUDENCE stimulated and guided several climatic centres towards standardization of procedures, model grids and outputs and, thus, promoted synergies across different research areas and interdisciplinary efforts. Nevertheless, our study indicates that errors and inconsistencies are still present, suggesting basin-specific pre-processing of CM outputs before proceeding with hydrological modeling and climate impact assessments.
As far as hydrological applications are concerned, we note that while our validation-based model selection provides a reasonable indication of the four best performing models in the considered catchments, some model deficiencies still need to be addressed. Specifically, by applying bias- and quantile-correction techniques, we were able to reduce the differences between observed and modelled probability distributions. In addition, by accounting for the effects of orography on precipitation, temperature and other variables, and making proper use of downscaling tools, we reproduced local climate attributes and, to some extent, the observed small scale variability. These results, as well as hydrological modeling projections in the considered catchments (currently addressed by several working groups of the CLIMB project), will form the subjects of forthcoming communications.
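As an illustration of the quantile-correction idea mentioned above, the following is a minimal sketch of empirical quantile mapping. It is not the procedure applied within CLIMB, whose details are not given here; the function name `quantile_map`, the number of quantiles and the synthetic temperature-like data are assumptions made purely for illustration.

```python
import numpy as np

def quantile_map(model_ref, obs_ref, model_target, n_quantiles=100):
    """Map model values onto the observed distribution via empirical quantiles.

    The transfer function is estimated on a common reference period
    (`model_ref`, `obs_ref`) and then applied to `model_target`.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    q_model = np.quantile(np.asarray(model_ref, dtype=float), probs)
    q_obs = np.quantile(np.asarray(obs_ref, dtype=float), probs)
    # Drop tied model quantiles (e.g. many dry days) so the x-grid is strictly increasing.
    q_model, keep = np.unique(q_model, return_index=True)
    q_obs = q_obs[keep]
    # Piecewise-linear transfer function; values outside the reference range
    # are clamped to the lowest/highest observed quantile.
    return np.interp(model_target, q_model, q_obs)

# Synthetic example (assumed values): a model that is too cold and too variable.
rng = np.random.default_rng(1)
obs = rng.normal(15.0, 5.0, 5000)
mod = rng.normal(12.0, 8.0, 5000)
corrected = quantile_map(mod, obs, mod)
```

Clamping outside the reference range is a deliberate simplification; applications to future periods usually require some form of tail extrapolation, which is left out of this sketch.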


[...] (GCMs), providing climate projections over the whole planet. A common practice is [...]

[...] derived the VCS (Variable Convergence Score) skill score to compare the relative performance of a total of 21 model runs from nine GCMs and two different emission scenarios in Australia, to their ensemble mean. They applied the VCS score to eight different variables and found that pressure, temperature, and humidity received the highest scores.

Following these recent initiatives, the European Union has funded the Climate Induced Changes on the Hydrology of Mediterranean Basins project (CLIMB; http://www.climb-fp7.eu), with the aim of producing a future-scenario assessment of climate change for significant hydrological basins of the Mediterranean (Ludwig et al., 2010), including: the Noce and Riu Mannu river basins in Italy, the Thau coastal lagoon in France, Izmit bay in the Kokaeli region of Turkey, the Chiba river basin in Tunisia, and the Gaza aquifer in Palestine. The Mediterranean lands constitute an especially interesting area for hydrological investigation by climate scientists, given the high risk predicted by climate scenarios, and the pronounced susceptibility to droughts, extreme flooding, salinization of coastal aquifers and desertification, as a consequence of the expected reduction of yearly precipitation and increase of the mean annual temperature. The general goal of the CLIMB project is to reduce the uncertainty of the process of assessing climate change impacts in the considered catchments. A major source of uncertainty is certainly related to the wide scattering of climate model outputs. That said, our investigation focuses on reducing the uncertainty of climate model forcing, by intercomparing the performances of different RCM results from the ENSEMBLES project, and selecting a common subset of 4 models to drive hydrological model runs in the catchments. More precisely, this paper uses the newly released E-OBS fields to: (a) evaluate the performance of ENSEMBLES RCMs in dealing with hydrologically relevant parameters in six Mediterranean catchments, and (b) provide validated data to be used for hydrological modeling in successive steps of the CLIMB project. Section 2 introduces the CLIMB project in the context of the hydrological basins of interest, and Sect. 3 provides detailed information on the RCM datasets used. Section 4 describes the methods applied to audit ENSEMBLES past climate simulations, and Sect. 5 presents the obtained results, setting them in the context of previous research. Finally, Sect. 6 summarizes the main conclusions of this study.

[...] $z_{\mathrm{sea}} \approx 0.300$ m for BCM-HIR and HCS-HIR; $z_{\mathrm{sea}} \approx 0.732$ m and $P_{\min} \approx -9.0 \times 10^{-8}$ kg m$^{-2}$ s$^{-1}$ for ECH-RMO; $z_{\mathrm{sea}} \approx -0.002$ m and $P_{\min} \approx 1.7 \times 10^{-18}$ kg m$^{-2}$ s$^{-1}$ for ARP-HIR and ECH-HIR; $z_{\mathrm{sea}} \approx -0.321$ m and $P_{\min} \approx -1.5 \times 10^{-11}$ kg m$^{-2}$ s$^{-1}$ for BCM-RCA, ECH-RCA and HCL-RCA. Also, for the last three models, missing temperature data at the end of the simulation period are indicated by a minimum temperature of 0 K, whereas HCL-HRM simulations exhibit some temperature values on the order of $10^{25}$ K. While the origin of the aforementioned discrepancies in the data cannot be easily identified (e.g. numerical errors, spurious effects of [...]

[...] (1) E-OBS data have been obtained through kriging interpolation, which belongs to the class of best linear unbiased estimators (BLUE); (2) the original data (i.e. prior to interpolation) have been properly corrected to minimize biases introduced by local effects and orography; (3) the 95 % confidence intervals of the obtained estimates are also distributed, shedding light on the accuracy of the calculated areal averages; and (4) the surface elevation fields used for E-OBS interpolations are available as well. The latter can be used to assess which part of the observed differences between ENSEMBLES RCM and E-OBS climatologies can [...]

[...] time-window for validation is set to 60 yr from 1951 to 2010, entirely covered by E-OBS data. Validation of climate model outputs over that period requires comparing specific statistics of $X_m$ ($m = 1, \ldots, 14$) to those of E-OBS (i.e. $X_{m=0}$). To do so we introduce average error measures for the absolute differences between statistics of the observed ($m = 0$) and modelled ($m = 1, \ldots, 14$) time series. Setting $y_0 = 1951$ and $N_y = 60$ in Eqs. ( [...]

Based on the E-OBS seasonal variation of precipitation and temperature, Thau, Riu Mannu, Kokaeli and Chiba are characterized by a Mediterranean climate, with precipitation maxima in winter, and minima in summer. Gaza, although exhibiting a seasonal cycle of precipitation similar to the previous basins, is classified as semi-arid due to its low annual precipitation and high mean annual temperature. Noce, instead, has a humid continental climate, receiving regular precipitation during the year, with a single maximum during summer. In essence, although based on a sparse network, E-OBS data reproduce quite reasonably the expected seasonal climatological patterns in the considered catchments.
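The idea of average error measures built on absolute differences between observed and modelled statistics can be illustrated with a short sketch. Since the defining equations are not reproduced in this part of the text, the form used below (statistics computed per calendar month over the validation years, then averaged over the 12 months) is an assumption, as are the function name `monthly_stat_errors` and the synthetic data.

```python
import numpy as np

def monthly_stat_errors(obs, model):
    """Average absolute errors in the mean and standard deviation.

    `obs` and `model` are monthly series arranged as (n_years, 12).
    Statistics are taken over the years for each calendar month, and the
    absolute model-minus-observation differences are averaged over months.
    """
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    err_mean = np.mean(np.abs(model.mean(axis=0) - obs.mean(axis=0)))
    err_std = np.mean(np.abs(model.std(axis=0, ddof=1) - obs.std(axis=0, ddof=1)))
    return err_mean, err_std

# Synthetic example (assumed values): 60 years of monthly precipitation in mm/d,
# with a model that is wetter than the observations by a constant 1 mm/d.
rng = np.random.default_rng(2)
obs = rng.gamma(2.0, 1.5, size=(60, 12))
err_mean, err_std = monthly_stat_errors(obs, obs + 1.0)
```

In this synthetic case a constant offset affects only the error in the mean, which is why the two measures are usefully displayed together in a scatterplot.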

Figure 3 also shows that, despite the aforementioned biases, almost all models predict a reliable seasonal precipitation cycle for the Thau, Riu Mannu, Kokaeli, Chiba and Gaza catchments. On the contrary, in the Noce catchment, RCMs fail to capture the seasonal cycle of precipitation: instead of a single maximum during summer, most models display a bimodal behaviour, with one maximum before the summer season and one [...]

Fig. 2. Combinations of Global Climate Models (GCMs) and Regional Climate Models (RCMs) considered in this study. In all figures, we use the same color (symbol) to refer to the same GCM (RCM). Model acronyms are introduced in Tables 2 and 3.

Fig. 5. Scatterplot of errors in the mean and standard deviation of monthly precipitation (mm d$^{-1}$) over the 60 yr verification period (1951-2010), computed using Eqs. (3) and (4) for each of the 14 ENSEMBLES models with respect to the E-OBS observational reference. Each subplot displays areal averages over a 4 × 4 grid-point stencil centered in each catchment.

Fig. 7. Same as Fig. 5, but for the dimensionless error metrics accounting for both precipitation and temperature, as defined in Eqs. (6) and (7).

Fig. 8. Scatterplots of mean absolute errors in the quantiles of daily precipitation and temperature distributions, computed using Eq. (8) for each of the 14 ENSEMBLES models with respect to the E-OBS observational reference. Quantiles are calculated at 100 uniformly spaced probability levels. Each subplot displays areal averages over a 4 × 4 grid-point stencil centered in each catchment.
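A mean absolute error between observed and modelled quantiles, of the kind summarized in Fig. 8, might be computed along the following lines. This is a sketch only: Eq. (8) is not reproduced here, so the exact placement of the 100 probability levels (taken below as the midpoints of 100 equal bins), the function name `quantile_mae` and the synthetic data are assumptions.

```python
import numpy as np

def quantile_mae(obs_daily, model_daily, n_levels=100):
    """Mean absolute difference between modelled and observed quantiles.

    Quantiles are evaluated at `n_levels` uniformly spaced probability
    levels, here chosen as bin midpoints (0.005, 0.015, ..., 0.995).
    """
    probs = (np.arange(n_levels) + 0.5) / n_levels
    q_obs = np.quantile(np.asarray(obs_daily, dtype=float), probs)
    q_model = np.quantile(np.asarray(model_daily, dtype=float), probs)
    return float(np.mean(np.abs(q_model - q_obs)))

# Synthetic example (assumed values): daily temperature with a uniform 2 K warm bias.
rng = np.random.default_rng(3)
obs_t = rng.normal(288.0, 6.0, 2000)
mae_bias = quantile_mae(obs_t, obs_t + 2.0)
```

Because the two series need not be paired in time, a measure of this type compares distributions rather than day-to-day agreement, which suits free-running climate simulations.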

Fig. 9. Variability of the mean annual precipitation (mm yr$^{-1}$) over the five 30 yr climatological periods between 1951 and 2100. Models HCS-HIR and BCM-HIR stop in year 2050. The reference values of E-OBS climatologies in the two 30 yr periods between 1951 and 2010 are indicated with empty circles and horizontal lines. Each subplot displays results based on areal averages over a 4 × 4 grid-point stencil centered in each catchment.

3 Climate models and reference dataset
Table 1 summarizes their main characteristics. The areas of the catchments range from 250 to 3500 km$^2$. Since the horizontal resolution of all ENSEMBLES RCM outputs is approximately 24 km, all catchments can be embedded within a 4 × 4 stencil of model grid-points. From Table 1 one sees that the catchments differ in terms of their over- [...]
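The areal averaging over a 4 × 4 stencil can be sketched as follows. With cells of roughly 24 km, such a block covers about 96 km × 96 km, i.e. close to 9200 km$^2$, which exceeds the largest catchment area quoted above. The code is only an illustration under stated assumptions: a regular grid with increasing 1-D coordinates, an unweighted mean over the 16 grid points, and hypothetical function names `stencil_start` and `stencil_average`.

```python
import numpy as np

def stencil_start(coords, center, size=4):
    """Index of the first grid point of a `size`-point window around `center`.

    `coords` must be increasing; the window is shifted inward at the edges.
    """
    coords = np.asarray(coords, dtype=float)
    k = int(np.searchsorted(coords, center))  # coords[k-1] < center <= coords[k]
    return int(np.clip(k - size // 2, 0, len(coords) - size))

def stencil_average(field, x, y, xc, yc, size=4):
    """Unweighted mean of `field` over a size x size stencil around (xc, yc).

    `field` has shape (..., len(y), len(x)); leading axes (e.g. time) are kept.
    """
    i0 = stencil_start(x, xc, size)
    j0 = stencil_start(y, yc, size)
    block = np.asarray(field, dtype=float)[..., j0:j0 + size, i0:i0 + size]
    return block.mean(axis=(-2, -1))

# Synthetic example (assumed values): a 10 x 10 grid with 24 km spacing and a
# field equal to the x-coordinate, sampled around a point at (108 km, 108 km).
x = np.arange(10) * 24.0
y = np.arange(10) * 24.0
field = np.broadcast_to(x, (3, 10, 10))  # 3 time steps
avg = stencil_average(field, x, y, 108.0, 108.0)
```

On a rotated or strongly non-uniform grid an area-weighted mean would be more appropriate; the equal-weight choice here is made only to keep the sketch short.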

Table 1. Main topographic and 60 yr (1951-2010) climatological characteristics of the considered catchments: area ($S$), mean elevation ($z$), mean annual precipitation ($P$) and sea level temperature ($T$), and minimum and maximum values of the monthly averages of precipitation and sea level temperature.

Table 2. Acronyms of the Global Climate Models (GCMs) used as drivers of the ENSEMBLES Regional Climate Models (RCMs) considered in this study.
Acronym    Climatological center and model
HCH        Hadley Centre for Climate Prediction, Met Office, UK; HadCM3 Model (high sensitivity)
HCS        Hadley Centre for Climate Prediction, Met Office, UK; HadCM3 Model (standard sensitivity)
HCL        Hadley Centre for Climate Prediction, Met Office, UK; HadCM3 Model (low sensitivity)
ARP        Meteo-France, Centre National de Recherches Meteorologiques, France

Table 3. Acronyms of the Regional Climate Models (RCMs) considered in this study.