Interactive comment on “Improving soil moisture and runoff simulations over Europe using a high-resolution data-assimilation modeling framework” by Bibi S

General comments
The manuscript aims to demonstrate that a high-resolution data-assimilation modelling framework improves soil moisture and runoff simulations at the continental scale. Thus, it addresses a question within the scope of the journal. The aims of the work are overall clearly outlined and supported by references. I suggest better justifying the choices of models, datasets, and temporal domain (2000-2006). Data-assimilation results are compared with open-loop simulations to quantitatively assess this improvement, based on root mean square error and mean bias error estimates with respect to CCI SM data. Overall, the results are well supported by figures and graphs. However, I would suggest that the authors give a more detailed explanation of the differences in overestimation and underestimation between the regions and between the


Introduction
Soil moisture (SM) constitutes a key variable in major processes of the hydrologic cycle related to infiltration and runoff generation, root water uptake and plant transpiration, and evaporation (Vereecken et al., 2016). Thus, soil moisture strongly influences the partitioning of incoming radiative energy into latent and sensible heat and significantly affects the land surface energy and water budgets. Consequently, accurate estimates of SM at large scale are needed for: (1) hydrologic predictions (such as soil moisture and discharge) (Western et al., 2002) and water resource management and planning (e.g. groundwater recharge, mitigation of droughts) (Dobriyal et al., 2012; Andreasen et al., 2013; Sridhar et al., 2008), (2) identifying regions susceptible to extreme events such as droughts and floods (Seneviratne et al., 2010), (3) numerical weather predictions (Drusch, 2007), and (4) irrigation management and agriculture practices (Shock et al., 1998; Bolten et al., 2010).
Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-24. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 23 March 2018. © Author(s) 2018. CC BY 4.0 License.
At continental spatial and inter-annual time scales, SM typically exhibits large variability (Brocca et al., 2010), depending on rainfall distribution, topography, soil physical properties, vegetation characteristics, and human impacts, such as irrigation. However, monitoring of soil moisture remains challenging due to the scarcity of in situ SM observation networks. Recent advancements in satellite-based sensors offer great potential to monitor SM over large scales for continental water resources assessment, particularly in areas where ground observation networks are sparse (Mohanty et al., 2017). Conventionally, satellite observations have been used in global water balance studies to provide information on the water cycle components, such as precipitation, evapotranspiration, soil moisture, water storage and runoff (Running et al., 2004; Kiehl and Trenberth, 1997; Vinukollu et al., 2011; Trenberth et al., 2007). However, sparse data coverage in satellite observations limits their ability to provide spatially and temporally consistent time series of water balance estimates. Another approach to facilitate studies at a regional to global scale is to estimate water budget components using land surface models forced with precipitation and other atmospheric data, such as the Community Land Model (CLM) (Lawrence et al., 2011), the Variable Infiltration Capacity (VIC) model (Liang et al., 1994; 1996), or the Joint UK Land Environment Simulator (JULES) (Best et al., 2011; Clark et al., 2011). The simulated soil moisture distributions from the land surface models provide spatially and temporally continuous information, yet their accuracy is limited by model deficiencies and by uncertainties in both model parameters and atmospheric forcing variables (Draper et al., 2009; Chen et al., 2013).
Estimates from observational and modeling approaches can be merged by data assimilation to provide improved estimates of hydrologic variables at large scales (Lahoz and De Lannoy, 2014). Recent studies showed that assimilation of soil moisture data into hydrologic models could improve water balance predictions such as evaporation and runoff (e.g. Crow et al., 2017; Mohanty et al., 2013; Brocca et al., 2012; Matgen et al., 2012; Draper et al., 2011; Pauwels et al., 2002; 2001). However, many of these studies mainly focused on improved predictions at watershed scales using in situ observations. Only a few studies demonstrated the potential of using satellite observations to improve runoff estimates at regional and global scales (e.g. Liu and Mishra, 2017; López López et al., 2016; Renzullo et al., 2014; Crow and Ryu, 2009; Pan et al., 2008). Several studies evaluated the assimilation of satellite SM observations into land surface models to produce optimal soil moisture estimates (e.g., Reichle and Koster, 2005; Han et al., 2014; Lievens et al., 2015; De Lannoy and Reichle, 2016). Despite these important advancements, the current spatial resolution of these global scale studies is too coarse to provide locally relevant information (Wood et al., 2011; Bierkens et al., 2015). For example, predicting water cycle processes for scientific and applied assessment of the terrestrial water cycle requires a high-resolution modeling framework on the order of 10^0 km.
The spatial mismatch between coarse-resolution satellite data and high-resolution hydrologic models constitutes a great challenge. To address this issue, the mismatch needs to be taken into account either in the data assimilation algorithm (Sahoo et al., 2013; De Lannoy et al., 2012) or through pre-processing of satellite products to match the model resolution (Merlin et al., 2010; Verhoest et al., 2015). Another challenge is the availability of computational resources, since the computational burden increases (non-)linearly with increasing model resolution, the number of ensemble members in the data assimilation system, and the complexity of simulated processes.
The objective of this study is to investigate the performance of a high-resolution, continental land surface model in simulating soil moisture and runoff for different climatic zones using data assimilation to incorporate coarser resolution satellite soil moisture data. We used CLM3.5 coupled to the Parallel Data Assimilation Framework (PDAF) library (Kurtz et al., 2016; Nerger and Hiller, 2013). PDAF is computationally efficient due to its parallelization of data assimilation routines and is suitable for applications at large spatial scales and high resolution over long time periods (Kurtz et al., 2016). The remainder of this paper is organized as follows: the model and observational data sets as well as methods, including the CLM-PDAF setup and experimental design, are described in Sect. 2; the results, including model validation and analysis of simulated soil moisture and runoff, are documented in Sect. 3; and the conclusions are presented in Sect. 4.
Description
In this study, the Community Land Model version 3.5 (CLM3.5) (Oleson et al., 2004) was applied to represent land surface processes such as surface and subsurface runoff, snow, soil moisture evolution, evaporation from soil and vegetation, transpiration and interception of precipitation by the vegetation canopy, throughfall and infiltration. Specifically, runoff is parameterized using a simple TOPMODEL-based scheme (SIMTOP; Niu et al., 2005). Soil water is calculated by solving the one-dimensional Richards equation (Zeng and Decker, 2009). Groundwater table depth and recharge to groundwater from the soil column are updated dynamically using the algorithm described in Niu et al. (2007). The snow model in CLM explicitly simulates multilayer snow depending on the total snow depth, and includes processes such as snow melting, surface frost and sublimation, liquid water retention and thawing-freezing processes (Dai et al., 2003; Dickinson et al., 2006; Stöckli et al., 2008). Total runoff is calculated as the sum of the subsurface runoff, surface runoff and runoff generated from lakes, glaciers, and wetlands (Oleson et al., 2004).
CLM3.5 offers significant improvements in estimating the subcomponents of the terrestrial water cycle compared to earlier versions (Oleson et al., 2008), including improvements in soil water availability and resistance terms to reduce the soil evaporation, which was overestimated in earlier versions (Niu et al., 2005; Oleson et al., 2008; Yang and Niu, 2003). For a detailed description of CLM3.5, readers are referred to Oleson et al. (2008). CLM3.5 was used in this study, instead of its most recent version, to keep the modeling framework consistent with Kurtz et al. (2016).

Data assimilation framework
The Parallel Data Assimilation Framework (PDAF) (Nerger and Hiller, 2013) was used to assimilate satellite soil moisture into CLM3.5. PDAF provides data assimilation methods such as the ensemble Kalman filter (EnKF) (Evensen, 2003; Burgers et al., 1998) and the local ensemble transform Kalman filter (LETKF) (Hunt et al., 2007). In this study, we used the EnKF, which is a relatively simple and flexible technique for assimilating satellite data into land surface models (e.g. Draper et al., 2012; Reichle et al., 2002, 2008; Kumar et al., 2008, 2009; Pipunic et al., 2008; Crow and Wood, 2003). In the EnKF, the forecast state vector $x_t^{i,f}$ of each ensemble member $i$ at time $t$ is updated to the analysis state vector $x_t^{i,a}$ according to:

$$x_t^{i,a} = x_t^{i,f} + K_t \left( y_t^{i} - H_t x_t^{i,f} \right)$$

where $y_t^{i}$ is the perturbed observation vector and $K_t$ is the Kalman gain defined as:

$$K_t = P_t H_t^{T} \left( H_t P_t H_t^{T} + R_t \right)^{-1}$$

where $H_t^{T}$ is the transpose of the observation model matrix $H_t$ at time $t$, $R_t$ is the measurement error matrix, which is defined a priori based on the expected measurement error of the ESA CCI soil moisture product, and $P_t$ is the state error covariance matrix of the model predictions calculated as:

$$P_t = \frac{1}{N-1} \sum_{i=1}^{N} \left( x_t^{i,f} - \bar{x}_t \right) \left( x_t^{i,f} - \bar{x}_t \right)^{T}$$

where $\bar{x}_t$ is the vector which contains the ensemble average soil moisture contents for the different grid cells and $N$ is the number of ensemble members.
Kurtz et al. (2016) recently provided a framework to couple PDAF with the land surface-subsurface part of the Terrestrial Systems Modelling Platform (TerrSysMP; Gasper et al., 2014; Shrestha et al., 2014). They showed the efficient use of parallel computational resources by TerrSysMP-PDAF, which is needed to simulate predicted states and fluxes over large spatial domains and long simulations. In this study, we used the CLM-PDAF setup, in which PDAF is coupled with the stand-alone CLM3.5 for soil moisture assimilation. Readers are referred to Kurtz et al. (2016) for technical descriptions of coupling and model performance.
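As an illustration of the EnKF analysis step described in this section, the following minimal sketch updates a small ensemble with a single scalar observation, so that the matrix inverse in the Kalman gain reduces to a division. It is not the PDAF implementation. The function name `enkf_update`, the two-cell state vector, the forecast values and the observed value are assumptions chosen only for illustration; the ensemble size of 12 and the observation error of 0.02 follow the experimental design of this study.

```python
import random


def enkf_update(ensemble, obs_index, obs_value, obs_error_sd, rng):
    """Perturbed-observation EnKF update for one scalar observation.

    ensemble: list of N state vectors (lists of floats), one per member.
    obs_index: index of the observed state element (H selects this element).
    """
    n_members = len(ensemble)
    n_state = len(ensemble[0])
    # Ensemble mean of every state element.
    mean = [sum(member[k] for member in ensemble) / n_members for k in range(n_state)]
    # Covariance of each state element with the observed element (P H^T).
    cov_with_obs = [
        sum((member[k] - mean[k]) * (member[obs_index] - mean[obs_index])
            for member in ensemble) / (n_members - 1)
        for k in range(n_state)
    ]
    # Kalman gain K = P H^T (H P H^T + R)^-1; the bracket is a scalar here.
    obs_error_var = obs_error_sd ** 2
    gain = [c / (cov_with_obs[obs_index] + obs_error_var) for c in cov_with_obs]
    analysis = []
    for member in ensemble:
        # Each member is updated with its own perturbed observation.
        perturbed_obs = obs_value + rng.gauss(0.0, obs_error_sd)
        innovation = perturbed_obs - member[obs_index]
        analysis.append([x + g * innovation for x, g in zip(member, gain)])
    return analysis


# Hypothetical forecast ensemble: two correlated grid cells, 12 members.
rng = random.Random(42)
forecast = []
for _ in range(12):
    shared = rng.gauss(0.0, 0.03)
    forecast.append([0.30 + shared, 0.28 + 0.8 * shared])

# Assimilate an assumed observation of 0.22 at the first grid cell.
analysis = enkf_update(forecast, obs_index=0, obs_value=0.22, obs_error_sd=0.02, rng=rng)
```

Because the two cells are correlated in the forecast ensemble, the unobserved second cell is also adjusted through the gain, which is the mechanism by which observations at selected locations can influence neighbouring model states.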

Land surface data and atmospheric forcing
The land surface static input data used in this study consist of topography, soil properties, plant functional types, and physiological vegetation parameters (Fig. 1). Digital elevation model (DEM) data were acquired from the 1 km Global Multiresolution Terrain Elevation Data 2010 (GMTED2010) (Danielson et al., 2010) as shown in Fig. 1a. The land use data were based on the Moderate Resolution Imaging Spectroradiometer (MODIS) data set (Friedl et al., 2002) (Fig. 1b), where the land use types are converted to Plant Functional Types (PFT). The properties of each of the sub-grid land fractions, such as the leaf area index, the stem area index, and the monthly heights of each PFT, were calculated based on the global CLM3.5 surface data set (Oleson et al., 2008). To provide soil texture data in the model (Fig. 1c and 1d), sand and clay percentages were prescribed based on pedotransfer functions from Schaap and Leij (1998) for 19 soil classes derived from the FAO/UNESCO Digital Soil Map of the World (Batjes, 1997).
The essential meteorological variables applied in this study, such as barometric pressure, precipitation, wind speed, specific humidity, near-surface air temperature, downward shortwave radiation and downward longwave radiation, were downloaded from the German Weather Service (DWD; ftp://ftp-cdc.dwd.de/pub/REA/). The COSMO-REA6 reanalysis is based on the COSMO model and is available at 0.055° (~6 km) covering the CORDEX EUR-11 domain (Gutowski et al., 2016). COSMO-REA6 was produced through the assimilation of observational meteorological data using the existing nudging scheme in COSMO with boundary conditions from ERA-Interim data. Bollmeyer et al. (2015) compared the COSMO-REA6 precipitation data with the precipitation data from the Global Precipitation Climatology Centre and showed that COSMO-REA6 performed well compared to observations, with small underestimations of precipitation in mid and southern Europe and overestimations of precipitation in Scandinavia, Russia and along the Norwegian coast. Additionally, Springer et al. (2017) assessed the closure of the water budget in the 6 km COSMO-REA6 and compared it to global reanalyses (Interim ECMWF Reanalysis (ERA-Interim) and Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2)) for major European river basins. In their study, Springer et al. (2017) found that COSMO-REA6 closes the water budget within the error estimates, whereas the global reanalyses underestimate the precipitation minus evapotranspiration deficit in most river basins. A more comprehensive assessment of the precipitation of the HErZ reanalysis can be found in Wahl et al. (2017), albeit based on the 2 km data product, which is only available for central Europe.

ESA CCI microwave soil moisture
The European Space Agency (ESA) Climate Change Initiative (CCI) program provides daily soil moisture (CCI-SM) at 0.25° spatial resolution for approximately the top few millimeters to centimeters of soil from 1978 to 2016. The daily CCI-SM product (v03.2) is produced from microwave-retrieved surface soil moisture data and is merged from multiple sensors (Dorigo et al., 2017; Liu et al., 2012; Liu et al., 2011; Wagner et al., 2012; http://www.esa-soilmoisture-cci.org). For the study period of 2000 to 2006, the passive CCI-SM data products are based on passive microwave observations (i.e. DMSP SSM/I, TRMM TMI, Aqua AMSR-E and Coriolis WindSat; Owe et al., 2008), whereas the active data products are based on observations from the C-band scatterometers on board the ERS-1 and ERS-2 satellites (Wagner et al., 2013; Bartalis et al., 2007). In this product, the absolute soil moisture was re-scaled against the 0.25° land surface model soil moisture (GLDAS-NOAH; Rodell et al., 2004) using cumulative distribution function matching. In this study, we used the merged product of active and passive soil moisture data, which showed better accuracy than either the passive or the active data alone (Liu et al., 2011). To match the spatial resolution of our CLM3.5 setup, the original SM values were re-sampled and re-gridded to 0.0275° using the first-order conservative interpolation method (Jones, 1999), which is based on the ratio of source cell area overlapped with the corresponding destination cell area. The conservative regridding scheme preserves the physical flux fields between the source and destination grid.
The CCI-SM dataset shows large gaps in data availability over the European continent during the four seasons (December-February (DJF; winter), March-May (MAM; spring), June-August (JJA; summer), and September-November (SON; autumn); Fig. 2b). According to Fig. 2b, the temporal coverage (i.e. the ratio between the number of days with observations and the total number of days in a season) is generally low during the winter and spring seasons, ranging from less than 30% (Scandinavian regions) to about 60% in southern Europe. SM observations show the highest temporal coverage during the summer and autumn. Due to the sparseness of the SM data at daily temporal resolution, 100 grid cells were randomly selected covering the complete model domain (Fig. 2a). The daily CCI-SM data at these locations were assimilated in the data assimilation framework. However, the number of observations for each day ranged between 2 and 75 depending on the availability of the daily CCI-SM data. As shown in Fig. 2c, there is a higher level of noise in the CCI-SM data for the first two years (2000 and 2001), which might be due to the fact that data from other sensors such as AMSR-E and WindSat became available after 2002. Moreover, availability of selected observations was lower during winter and spring, while summer soil moisture was well covered during the years 2003 to 2006. This seasonal difference in data availability is related to the occurrence of soil freezing events and snow cover.
The gridded runoff dataset E-RUN provides monthly pan-European runoff estimates from 1950 to 2015 at 0.5° resolution. The monthly runoff rates were generated using a collection of streamflow observations from small catchments combined with gridded precipitation and temperature data using a machine learning approach (Gudmundsson and Seneviratne, 2016). Monthly runoff was estimated using a regression model, which was trained with a subset of observed runoff rates and E-OBS precipitation and temperature.
The fitted model was subsequently applied to all grid cells of the E-OBS data to derive pan-European estimates of monthly runoff (Gudmundsson and Seneviratne, 2016). Using this cross-validation method, Gudmundsson and Seneviratne (2016) reported higher accuracy in central and western Europe, while accuracy was lower in other regions due to the low density of available stations. In the current study, the half-degree monthly runoff rates were resampled and re-gridded to 0.0275° using the first-order conservative interpolation method for comparison with the CLM3.5 simulated total runoff.
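The first-order conservative interpolation used for both the soil moisture and the runoff data can be illustrated with a one-dimensional sketch, in which each destination cell receives the overlap-weighted average of the source cells it intersects. This is a simplification under stated assumptions: the actual regridding operates on two-dimensional cell areas on the sphere, and the function name `conservative_regrid_1d`, the grid edges and the values below are hypothetical.

```python
def conservative_regrid_1d(src_edges, src_values, dst_edges):
    """First-order conservative remapping of cell averages in one dimension.

    src_edges has len(src_values) + 1 entries; dst_edges defines the target cells.
    Returns one value per destination cell (None where no source cell overlaps).
    """
    dst_values = []
    for j in range(len(dst_edges) - 1):
        lo, hi = dst_edges[j], dst_edges[j + 1]
        weighted_sum = 0.0
        overlap_total = 0.0
        for i, value in enumerate(src_values):
            # Length of the intersection between source cell i and target cell j.
            overlap = min(hi, src_edges[i + 1]) - max(lo, src_edges[i])
            if overlap > 0.0:
                weighted_sum += overlap * value
                overlap_total += overlap
        dst_values.append(weighted_sum / overlap_total if overlap_total > 0.0 else None)
    return dst_values


# Two coarse 0.25-degree cells remapped onto five finer 0.1-degree cells.
src_edges = [0.0, 0.25, 0.5]
src_values = [0.20, 0.30]
dst_edges = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
fine = conservative_regrid_1d(src_edges, src_values, dst_edges)
```

The fine cell that straddles the coarse cell boundary receives a blend of both source values, while the width-weighted total over the domain is unchanged, which is the sense in which the scheme is conservative. Note that remapping to a finer grid in this way adds no sub-grid information.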

CLM-PDAF experimental design
The assimilation experiments were performed for the time period of January 2000 to December 2006. A spinup of 45 years, by simulating the time period from 1997 to 2006 five times, was performed in order to obtain equilibrium initial state variables. In this study, we implemented CLM3.5 for the EURO-CORDEX domain with a spatial resolution of 0.0275° (~3 km), inscribed into the official EUR-11 grid at 0.11° spatial resolution. The model was run with a 1 h time step and the time window for soil moisture updates was set to 1 day. We assumed a spatially uniform observational error of 0.02 mm³/mm³ for CCI-SM in the CLM-PDAF setup.
The outputs of a land surface model are sensitive to both atmospheric forcings and soil characteristics. To account for uncertainties in atmospheric forcing and soil texture, precipitation and soil texture (%sand and %clay) were perturbed in this study. Log-normally distributed, spatially homogeneous and temporally uncorrelated multiplicative perturbations were applied to precipitation. The mean and standard deviation of the applied perturbation factors for precipitation were equal to 1 and 0.15, respectively. Sand and clay content were perturbed using random noise with a standard deviation of 10%. In order to guarantee the physical meaning of the soil parameters, the sand and clay content were constrained to have a sum of 100%.
The initial ensemble size was set to 12, with precipitation and soil texture perturbed for each member, and the simulation/assimilation experiment updated the volumetric soil water content (SWC) of the top soil layer (~2 cm).
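The ensemble generation described above can be sketched as follows. The conversion from the mean and standard deviation of the multiplicative factor to the parameters of the underlying normal distribution is the standard log-normal relation. Two points are assumptions of this sketch rather than statements about the CLM-PDAF code: the texture noise is taken to be Gaussian in percentage points, and the sum constraint is interpreted as clipping each fraction to 0-100% and rescaling so that sand plus clay does not exceed 100%. The function names and the base texture values are hypothetical.

```python
import math
import random


def precipitation_factor(rng, mean=1.0, sd=0.15):
    """Draw a log-normal multiplicative factor with the given mean and sd."""
    # Parameters of the underlying normal distribution.
    sigma_sq = math.log(1.0 + (sd / mean) ** 2)
    mu = math.log(mean) - 0.5 * sigma_sq
    return rng.lognormvariate(mu, math.sqrt(sigma_sq))


def perturb_texture(rng, sand, clay, sd=10.0):
    """Perturb sand and clay percentages and keep them physically meaningful."""
    sand_p = min(max(sand + rng.gauss(0.0, sd), 0.0), 100.0)
    clay_p = min(max(clay + rng.gauss(0.0, sd), 0.0), 100.0)
    total = sand_p + clay_p
    if total > 100.0:
        # Assumed constraint: rescale so that sand + clay does not exceed 100%.
        sand_p *= 100.0 / total
        clay_p *= 100.0 / total
    return sand_p, clay_p


# Build a 12-member ensemble for one hypothetical soil (40% sand, 25% clay).
rng = random.Random(0)
members = []
for _ in range(12):
    sand, clay = perturb_texture(rng, sand=40.0, clay=25.0)
    members.append({"precip_factor": precipitation_factor(rng), "sand": sand, "clay": clay})
```

In this sketch a single precipitation factor is drawn per member. Because the perturbations are spatially homogeneous and temporally uncorrelated, a new factor would be drawn for every forcing time step and applied to all grid cells of that member.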
Our main experiment consists of two CLM-PDAF simulations: (a) an open-loop simulation (no data assimilation, CLM-OL) and (b) an ensemble simulation with data assimilation of ESA CCI-SM data (CLM-DA) at 100 random locations (Figure 2a).
We evaluated the results of both simulations by a cross-validation with ESA CCI-SM data as shown in Figure 2a. The soil moisture validation of the CLM-DA and CLM-OL simulations used all the available CCI-SM data in the time period of 2000 to 2006. This approach also allowed us to independently cross-validate the SM values over grid cells that were not used in the data assimilation. For SM comparison, the average of simulated SWC in the top two layers (i.e. at 0.007 and 0.03 m depth) was used. Additionally, the monthly runoff dataset E-RUN was used to evaluate the simulated total runoff. To assess the skill of the assimilation experiments, the root mean square error (RMSE) and the mean bias error (BIAS) were used as validation measures:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( X_t - X_{\mathrm{obs},t} \right)^{2}}$$

$$\mathrm{BIAS} = \frac{1}{n} \sum_{t=1}^{n} \left( X_t - X_{\mathrm{obs},t} \right)$$

where $n$ is the total number of time steps, and $X_t$ and $X_{\mathrm{obs},t}$ represent the simulated ensemble mean and observation values at time step $t$, respectively.
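The two validation measures can be computed as in the following sketch, which only uses time steps for which an observation exists, in line with the evaluation on days with available satellite data. The sign convention (simulated minus observed, so that a positive BIAS indicates overestimation) is assumed here because it matches how BIAS is interpreted in the results. The function name `rmse_and_bias` and the sample values are hypothetical.

```python
import math


def rmse_and_bias(simulated, observed):
    """Return (RMSE, BIAS) over time steps where an observation is available.

    Missing observations are marked with None and skipped.
    """
    pairs = [(s, o) for s, o in zip(simulated, observed) if o is not None]
    if not pairs:
        return None, None
    n = len(pairs)
    bias = sum(s - o for s, o in pairs) / n
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in pairs) / n)
    return rmse, bias


# Hypothetical daily ensemble-mean SWC and satellite values with one data gap.
simulated = [0.30, 0.28, 0.35, 0.31]
observed = [0.25, None, 0.30, 0.29]
rmse, bias = rmse_and_bias(simulated, observed)
```

With these sample values, the second day is excluded and the positive bias reflects a simulation that is wetter than the observations on the remaining three days.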

Results and discussion
In this section, the impact of assimilating the ESA CCI-SM data into CLM3.5 on the terrestrial hydrologic cycle is analyzed focusing on soil moisture and runoff. The results are presented for the complete CORDEX EUR-11 domain and for 8 predefined analysis regions from the "Prediction of Regional scenarios and Uncertainties for Defining European Climate change risks and Effects" (PRUDENCE) project (Christensen et al., 2007) as shown in Fig. 1a. We refer to these regions as the "PRUDENCE" regions.

Seasonal mean comparison
Figure 3 shows a comparison of the seasonal mean volumetric SWC (mm³/mm³) from the CLM3.5 experiments (CLM-OL, CLM-DA) with the satellite seasonal mean CCI-SM data. The CLM-OL simulation exhibits higher SWC in all seasons over most parts of Europe compared to the CLM-DA simulation. Seasonally, the spatial distribution of SWC in summer and autumn is better reproduced in the CLM-DA simulation than in CLM-OL when compared with the CCI-SM (Fig. 3c and 3d). Furthermore, Fig. 4 shows the comparison of the 2000 to 2006 temporally averaged SM estimated by CLM-OL and CLM-DA with the CCI-SM dataset over the PRUDENCE regions. Generally, CLM-OL overestimated the SWC values for all sub-regions and in all seasons. This overestimation of soil moisture in CLM was also reported by Cai et al. (2014) when compared with other land surface models and observations over the continental US. However, using data assimilation, this overestimation was reduced consistently in all sub-regions, as shown with CLM-DA. Notably, assimilation also helped to reduce the spatial variability, as indicated by the narrower spread of quartiles of CLM-DA estimated SWC compared to CLM-OL in Fig. 4. Validating the simulations with CCI-SM data shows that the improvements of CLM-DA vary among PRUDENCE regions and seasons. Improvements were more prominent for the UK, France, and Central Europe (for all seasons), while for other regions SWC was slightly overestimated in spring (Fig. 4b) and underestimated in summer and autumn (Fig. 4c and Fig. 4d). The underestimation of SWC was particularly pronounced over the Iberian Peninsula and the Mediterranean regions in summer (Fig. 4c). In order to validate the skill of CLM-DA relative to CLM-OL, a cross-validation with CCI-SM observations was performed and RMSE and BIAS for soil moisture were calculated using daily values for each PRUDENCE region and each season, as shown in Fig. 5.
Note that for calculating these statistics, model data were only used for the days when satellite data were available. Over the Scandinavia, Alpine, Mediterranean and Eastern Europe PRUDENCE regions, CLM-DA showed a consistently lower RMSE than CLM-OL for all seasons, except for winter, where improvements in SWC were comparatively small (Fig. 5a and Fig. 5b). The BIAS for CLM-OL indicates a clear overestimation of soil moisture relative to satellite CCI-SM observations (Fig. 5c), whereas the BIAS for soil moisture from CLM-DA is significantly reduced (Fig. 5d). The mean BIAS dropped from 0.1 mm³/mm³ (CLM-OL) to 0.004 mm³/mm³ (CLM-DA) for all regions. However, data assimilation introduced a dry BIAS in summer (up to −0.03 mm³/mm³), as indicated by CLM-DA in Fig. 5d.
Wang (1987) showed that X-band data have a very shallow soil penetration depth of a few millimeters and are sensitive to vegetation cover. After implementing the C-band radiometer data of AMSR-E in 2002 and WindSat in 2003 into CCI-SM, noise level and bias were reduced (Dorigo et al., 2017). Overall, the daily soil moisture values estimated by CLM-DA show a slightly better agreement with the CCI-SM data for the summer and autumn seasons than for the spring and winter seasons. The winter season bias is pronounced particularly over Scandinavia, the Alpine region and Eastern Europe. The poorer performance of CLM-DA in these regions might be due to the limited amount of CCI-SM data in the winter season (Fig. 2b), dense vegetation, frozen soil (e.g. in the Scandinavian regions) and/or CLM3.5 model errors related to simulating soil moisture in colder regions (Oleson et al., 2008; Decker and Zeng, 2009). Additionally, the magnitudes of the bias and variance of the CCI-SM observational error could be important. As indicated by Dorigo et al. (2017), the CCI-SM error variance is low where the satellite track density increases and high in areas with more data gaps. Note that the setup of CLM-DA in this study assumed a spatially uniform observational error for CCI-SM.
The overestimation of runoff in the CLM-OL simulations was more pronounced in the summer and spring seasons (Fig. 7b and Fig. 7c). Compared to CLM-OL, regional runoff patterns simulated by CLM-DA agree better with runoff observations. At the seasonal scale, CLM-DA shows similar spatial runoff distributions in winter and spring as the E-RUN runoff data set (Fig. 7a and Fig. 7b). However, CLM-DA underestimates runoff in summer and autumn, particularly in Mid- and Southern Europe (Fig. 7c and Fig. 7d). The assimilation of soil moisture led to an overall improvement in the simulated total runoff for all regions, as shown in Fig. 8. However, improvements in runoff are more prominent over the Alpine, Scandinavia and Eastern Europe regions, where CLM-DA reduced the difference to E-RUN from 1.41 mm/day, 1.22 mm/day and 0.85 mm/day to -0.5 mm/day, -0.06 mm/day and -0.03 mm/day, respectively (Table 1). The improvements over other regions, such as the UK, Iberian Peninsula, France and Mediterranean, were comparatively small. These findings indicate the potential of satellite soil moisture assimilation in CLM3.5 to improve other terrestrial components of the water cycle as a basis for more accurate water balance analyses.

Regional evaluation of seasonal runoff
Figure 9 shows that simulations based on CLM-OL have higher RMSE and BIAS values than CLM-DA simulations (with respect to E-RUN data). The RMSE and BIAS for CLM-OL for mean monthly runoff over all regions and in all seasons vary between 0.4 and 1.5 mm/day and between -0.11 and 2.5 mm/day, respectively. CLM-DA reduces the range of both RMSE and BIAS of mean monthly runoff to 0.2 to 1 mm/day and -1.5 to 0.27 mm/day, respectively, across all regions (Fig. 9b and Fig. 9d).
However, the data assimilation in CLM-DA also introduces a negative BIAS for most regions, which indicates underestimation of monthly runoff relative to E-RUN. This underestimation might be related to model limitations in correctly representing saturation excess runoff processes at large spatial scales, particularly in arid to semi-arid regions. In dry regions, assimilation of soil moisture data may reduce soil moisture to values close to the residual water content, which may lead to little runoff generation. Currently, the CLM-PDAF setup only performs state updates. In the future, a joint update of states and hydraulic parameters related to soil texture and runoff generation may further improve runoff estimates (Huang et al., 2013).
[Figure 9]
The time series of monthly runoff, as illustrated in Figure 10, show that CLM-OL strongly overestimates the magnitude of runoff. CLM-DA reduces these biases over all regions, but at the same time induces a dry bias for some regions. At the regional scale, CLM-DA performs better when compared to E-RUN in the Mid-Europe, Scandinavia, Alpine and Eastern Europe regions in capturing peaks and low runoff. In the UK, Iberian Peninsula, France and the Mediterranean regions, however, peak runoff in winter is underestimated, whereas low runoff in summer corresponds well with observed monthly runoff data. The relatively poor performance of CLM-OL may be related to several limitations in the CLM model (Li et al., 2011). For example, Li et al. (2011) showed unrealistic behavior of subsurface runoff and high runoff peaks in CLM4.0, which they attributed to the exponential form of the surface runoff parameterization. Additionally, the assumption of topographically controlled surface runoff generation in the CLM model (Oleson et al., 2008) is problematic in areas with flat topography, thick soils, or deep groundwater (Li et al., 2011). Another reason may be uncertainties of the E-RUN runoff data used in this study, which are derived from gridded atmospheric variables at coarser resolution (0.5° x 0.5° grid resolution) and flow observations. In the future, additional observational data need to be explored in assimilation experiments and assessment of the results.
[Figure 10]

Uncertainties and limitations
This study demonstrates that the assimilation of coarse-scale satellite CCI soil moisture data is beneficial and improves the high-resolution CLM model simulations of soil moisture and runoff over a large spatial domain. However, we note a number of limitations in this study. The spatial mismatch between the coarser resolution CCI-SM and our high-resolution land surface model was tackled by rescaling the CCI-SM data to the model resolution (~3 km) without any bias correction. An adequate bias correction of CCI-SM data for high resolutions may require more elaborate methods and techniques to account for the increased variability over time. A further possibility is the multiscale assimilation of the CCI-SM data, which would allow updating the various model grid cells covered by a satellite observation (Montzka et al., 2012). In multiscale assimilation, the average soil moisture content for the group of grid cells covered by the satellite measurement is compared with the satellite-based soil moisture content, which may result in slightly improved CLM simulation results, but was beyond the scope of this study.
In addition to discrepancies at the spatial scale, uncertainties in soil moisture estimations may result from data gaps in satellite soil moisture retrievals, which are limited in regions of pronounced topography, standing water, dense vegetation, snow cover and frozen soil. Additionally, CCI-SM is a merged product from a variety of sensors, leading to inconsistencies due to differences in viewing angle, sensor characteristics and soil moisture retrieval algorithms (Dorigo et al., 2017). In the future, more observations are needed to independently validate model and assimilation experiments.
In this work, the CCI-SM dataset was also used for verification over grid cells that were not used in the data assimilation.
However, it would be preferable to validate with another independent dataset at the continental scale. The problem is that at the model grid scale only very limited independent (in situ) soil moisture data are available.
Furthermore, with regard to model errors from parameter uncertainties, this study only accounted for uncertainties in the soil texture parameters. In data assimilation, it is preferable to also account for uncertainties in additional model parameters to which runoff is highly sensitive. Alternatively, prior model calibration can be considered to better constrain model parameters and reduce systematic biases and uncertainties in CLM3.5 before applying the assimilation framework. For improvement of hydrologic predictions, joint assimilation of additional datasets such as river discharge and snow data may also be considered in future research.

Summary and conclusions
A soil moisture data assimilation framework at the continental scale was applied to generate long-term daily soil moisture and runoff estimates as part of a terrestrial systems monitoring framework for Europe at approximately 3 km resolution for the years 2000 to 2006. An ensemble was generated by perturbing precipitation and soil texture properties. This ensemble was used as input to the CLM-PDAF data assimilation framework (Kurtz et al., 2016) to assimilate CCI-SM soil moisture data. The impact of satellite soil moisture assimilation on daily soil moisture and runoff was evaluated and cross-validated with CCI-SM data and gridded runoff from E-RUN observations at regional and seasonal scales. Using this high-resolution CLM-PDAF setup, the conclusions of this study are:

1. Assimilation of satellite SM improved the soil moisture simulations over most parts of Europe relative to open-loop simulations. Open-loop simulations overestimated SM in most parts of Europe and in all seasons. Averaged over the study domain, the mean bias in soil moisture was reduced from 0.1 mm³/mm³ in open-loop simulations to 0.004 mm³/mm³ with SM assimilation.

2. Regionally, significant improvements in soil moisture were achieved across most regions, except over Scandinavia and Eastern Europe. The low performance of CLM-OL and CLM-DA in these regions might be due to a lack of data in space and time, caused by track changes, radio-frequency interference, dense vegetation, and frozen soil, which limits the assimilation of soil moisture data into land surface process simulations. Similarly, both CLM-OL and CLM-DA performed poorly for the years 2000 and 2001, which appears to be related to large data gaps and higher noise levels in the CCI-SM satellite data in these years. This indicates that ESA CCI soil moisture is suitable for data assimilation studies from 2002 onwards, whereas the accuracy and noise levels of data from earlier periods are not adequate for this purpose.

3. At the seasonal time scale, the CLM-DA simulations performed better in summer and autumn than in winter and spring. This might again be related to large data gaps in the winter season or to model limitations in correctly representing complex cold-region processes such as frozen soil.

4. The assimilation of CCI-SM data into CLM3.5 also improved the overall performance of the CLM3.5 model in simulating total runoff (i.e. surface and subsurface runoff). The largest improvements were achieved in the simulation of peak runoff as a result of soil moisture assimilation. The improvement in peak runoff could be of particular importance in the management of extreme events such as flooding.

The results from this study are not only useful as a standalone, high-resolution product for evaluating trends in soil moisture patterns across Europe, but can also be used as an independent dataset for validation of other land surface models.
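The mean bias quoted in the first conclusion is a simulated-minus-observed average. A minimal sketch of how such a mean bias and the accompanying root mean square error might be computed is given below, assuming paired daily values with gaps marked as NaN. The function name `mean_bias_and_rmse` and the sample numbers are illustrative assumptions, not values or code from the study.

```python
import numpy as np

def mean_bias_and_rmse(simulated, observed):
    """Return (mean bias, RMSE) of simulated minus observed, skipping gaps."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    # Use only time steps where both the model and the observation are valid.
    valid = ~np.isnan(sim) & ~np.isnan(obs)
    diff = sim[valid] - obs[valid]
    if diff.size == 0:
        return float("nan"), float("nan")
    return float(diff.mean()), float(np.sqrt((diff ** 2).mean()))

# Illustrative soil moisture values (mm3/mm3), NOT data from the study.
sim = [0.30, 0.35, 0.28, 0.40]
obs = [0.25, np.nan, 0.30, 0.30]  # the NaN marks a satellite data gap
bias, rmse = mean_bias_and_rmse(sim, obs)
```

A positive bias under this sign convention corresponds to the model overestimating soil moisture relative to the satellite product.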

Table 1.

For example, Rains et al. (2017) assimilated SMOS data into CLM over Australia for drought monitoring purposes. Han et al. (2014) evaluated the joint state and parameter estimation method for the coupled CLM and Community Microwave Emission Model (CMEM) (de Rosnay et al., 2009) through assimilation of synthetic microwave brightness temperature data. Similarly, Liu and Mishra (2017) assimilated satellite SM data at the global scale to evaluate the performance of the Community Land Model (CLM4.5) in simulating hydrologic fluxes such as SM, ET and runoff at 0.5° spatial resolution. They found that assimilating satellite SM data into the CLM4.5 model improved the soil moisture simulations, which also led to a better representation of other hydro-meteorological variables in the model, such as ET and runoff. Despite

[Figure 5]
The long-term (January 2000 to December 2006) daily SM averaged over the PRUDENCE regions (Fig. 1) in Europe, as simulated by CLM-OL and CLM-DA and as observed by CCI-SM, is shown in Fig. 6. The assimilated CCI-SM data improved the simulations of surface soil moisture in CLM-DA. The daily soil moisture patterns simulated by CLM-DA show strong agreement with the CCI-SM observations, with peaks and troughs generally coinciding for all regions and over the European domain, except for the years 2000 and 2001. The CCI-SM observations show increased variability and drier soil moisture values for the years 2000 and 2001 compared to the full period. This can be explained by the strong contribution of the X-band passive microwave data of SSM/I and TRMM to the final CCI-SM product.

Figure 7
Figure 7 shows the runoff estimates of the two experiments, i.e. CLM-OL and CLM-DA, compared to the E-RUN observational product. CLM-DA reduces the runoff bias compared to CLM-OL. CLM-OL simulates higher magnitudes of runoff (on average 1.42 mm/day) over most parts of Europe compared to CLM-DA (on average 0.25 mm/day) in all seasons.
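The seasonal runoff comparison above relies on grouping monthly values into DJF, MAM, JJA and SON and taking the model-minus-reference difference. The following is a rough sketch of that grouping, assuming monthly runoff series with a calendar month attached to each value. The helper `seasonal_mean_bias` and the numbers are hypothetical and only illustrate the bookkeeping.

```python
import numpy as np

# Calendar months belonging to each meteorological season.
SEASONS = {"DJF": (12, 1, 2), "MAM": (3, 4, 5), "JJA": (6, 7, 8), "SON": (9, 10, 11)}

def seasonal_mean_bias(months, model, reference):
    """Mean of (model - reference) per season, ignoring missing values."""
    months = np.asarray(months)
    diff = np.asarray(model, dtype=float) - np.asarray(reference, dtype=float)
    result = {}
    for name, members in SEASONS.items():
        mask = np.isin(months, members) & ~np.isnan(diff)
        result[name] = float(diff[mask].mean()) if mask.any() else float("nan")
    return result

# Illustrative monthly runoff (mm/day) for one year, NOT data from the study.
months = np.arange(1, 13)
model = np.full(12, 1.5)
reference = np.full(12, 1.0)
reference[5:8] = 1.25  # June, July and August
result = seasonal_mean_bias(months, model, reference)
```

With several years of data, the same function applies unchanged as long as `months` carries the calendar month of every entry.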

Monthly mean bias (CLM minus E-RUN) in mean seasonal runoff (mm/day) for CLM-OL and CLM-DA for all PRUDENCE regions and all seasons, i.e. winter (DJF), spring (MAM), summer (JJA) and autumn (SON).

Figure 2: Satellite ESA-CCI soil moisture data resampled to 0.0275° resolution for the time period of 2000 to 2006 over EU-CORDEX. (a) Temporally averaged soil moisture content for different seasons, (b) fraction of days on which soil moisture observations were reported during different seasons, and (c) number of selected observations with valid data for the respective day over the 2000-2006 period used for assimilating SM in the data assimilation experiment. Black circles in (a) indicate the location of grid cells

Figure 4: Boxplots showing the spread of seasonally averaged soil water content (mm³/mm³) over the 2000-2006 time period in the PRUDENCE regions for the (a) DJF, (b) MAM, (c) JJA and (d) SON seasons. The boxplots illustrate the spatial distribution of SWC, with quartiles, median and extreme values marked by solid lines.

Figure 7: Temporally averaged monthly runoff (mm/day) over different seasons simulated by CLM-OL and CLM-DA for the years 2000-2006 for the (a) DJF, (b) MAM, (c) JJA and (d) SON seasons. Temporally averaged monthly runoff from E-RUN is shown for comparison.

Figure 8: Boxplots of temporally averaged runoff (mm/day) over the years 2000-2006 for all PRUDENCE regions and seasons, i.e. (a) DJF, (b) MAM, (c) JJA and (d) SON. The boxplots indicate the spatial distribution of monthly averaged runoff over each region.

Figure 9: Root mean square error and mean bias error for runoff (mm/day) over different seasons in the PRUDENCE regions for (a,c) CLM-OL and (c,d) CLM-DA simulations over the years 2000-2006.

Figure 10: Monthly time series of runoff from the CLM-DA and CLM-OL simulations compared with E-RUN runoff observations for the years 2000-2006 over the PRUDENCE regions.
It uses ensembles of model states to approximate the model state error covariance matrix in order to optimally merge model predictions with observations. The EnKF calculates the ensemble of updated state variables at each time step from the model-estimated state variables.
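The manuscript's own update equation is not reproduced here. As a hedged illustration, the sketch below implements the textbook stochastic EnKF analysis step, x_a = x_f + K (y + eps - H x_f), with gain K = P H^T (H P H^T + R)^-1, where P is the sample covariance of the forecast ensemble. The notation, the function `enkf_update`, and the two-layer toy state are assumptions for illustration. This is not the PDAF implementation used in the study, which may apply a different filter variant and localization.

```python
import numpy as np

def enkf_update(X_f, y, H, R, rng):
    """Stochastic EnKF analysis step (textbook form, perturbed observations).

    X_f : (n_state, n_ens) forecast ensemble
    y   : (n_obs,) observation vector
    H   : (n_obs, n_state) linear observation operator
    R   : (n_obs, n_obs) observation error covariance
    """
    n_ens = X_f.shape[1]
    # Ensemble anomalies approximate the model state error covariance.
    A = X_f - X_f.mean(axis=1, keepdims=True)
    P = A @ A.T / (n_ens - 1)
    # Kalman gain weighs forecast uncertainty against observation uncertainty.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    # Each member assimilates its own perturbed copy of the observation.
    noise = rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens).T
    Y = y[:, None] + noise
    return X_f + K @ (Y - H @ X_f)

# Toy setup (illustrative only): two soil layers, 50 members, wet-biased forecast.
rng = np.random.default_rng(0)
X_f = 0.35 + 0.03 * rng.standard_normal((2, 50))
y = np.array([0.25])            # one surface soil moisture observation
H = np.array([[1.0, 0.0]])      # only the top layer is observed
R = np.array([[1e-4]])
X_a = enkf_update(X_f, y, H, R, rng)
```

In this toy case the analysis mean of the observed layer moves toward the observation and its spread shrinks. The unobserved layer is updated only through its sample covariance with the observed one.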