Spatial horizontal correlation characteristics in the land data assimilation of soil moisture

Remote sensing images deliver important information about soil moisture, but often cover only part of an area, for example due to the presence of clouds or vegetation. This paper examines the potential of incorporating the spatial horizontal correlation characteristics of surface soil moisture observations in land data assimilation in order to obtain improved estimates of soil moisture at uncovered grid cells (i.e. grid cells without observations). Observing system simulation experiments were carried out to assimilate the synthetic surface soil moisture observations into the Community Land Model for the Babaohe River Basin in northwestern China. The estimation of soil moisture at the uncovered grid cells was improved when information about surrounding observations and their spatial correlation structure was included. Including an increasing number of observations for covered and uncovered grid cells in the assimilation procedure led to a better prediction of soil moisture with an upper limit of five observations. A further increase of the number of observations did not further improve the results for this specific case. High observational coverage resulted in a better assimilation performance, depending also on the spatial distribution of observation data. In summary, the spatial horizontal correlation structure of soil moisture was found to be helpful for improving the surface soil moisture data characterization, especially for uncovered grid cells.


Introduction
The estimation of soil-atmosphere exchange fluxes of water and energy can be improved by assimilating in situ measurements and remote sensing products (Li et al., 2007; Moradkhani, 2008; Reichle et al., 2008). Li et al. (2010), Liu et al. (2011), Reichle et al. (2007), Tian et al. (2010) and Yang et al. (2009) showed that the assimilation of real passive microwave remotely sensed surface soil moisture or brightness temperature could improve the model estimation of soil moisture. Scatterometer soil moisture products have also been used in data assimilation (Dharssi et al., 2011; Mahfouf, 2010; Brocca et al., 2010b). Pan and Wood (2010) analyzed the impact of satellite spatial availability on the soil moisture assimilation in a synthetic experiment. Montzka et al. (2011) and Nie et al. (2011) showed that soil moisture observations could be used to optimize the parameters of a land surface model. Ghent et al. (2010) showed that the assimilation of surface temperature from three remote sensing products could improve the modeled surface temperature, soil moisture, and latent and sensible heat fluxes. Reichle et al. (2010) also studied the benefit of surface temperature assimilation in improving the model estimation of surface temperature. Xu et al. (2011) showed that the characterization of the sensible and latent heat fluxes was improved by the assimilation of remotely sensed surface temperature.
These studies showed that remotely sensed surface soil moisture and surface temperature have become important data sources in regional land data assimilation applications. Examples are the surface soil moisture products of the microwave sensors AMSR-E (Advanced Microwave Scanning Radiometer for EOS) (Njoku and Chan, 2006), ASCAT (Advanced Scatterometer) (Naeimi et al., 2009), SMOS (Soil Moisture and Ocean Salinity) (Kerr et al., 2010), the upcoming SMAP mission (Soil Moisture Active Passive) (Entekhabi et al., 2010), and the land surface temperature of the thermal infrared sensor MODIS (Moderate Resolution Imaging Spectroradiometer) (Wan and Li, 2008). However, soil moisture retrieval based on microwave measurements is often hampered by the presence of vegetation cover (Dorigo et al., 2010; Njoku and Chan, 2006) or topography (Flores et al., 2009; Matzler and Standley, 2000; Pellenq et al., 2003). Moreover, passive microwave sensor records can be contaminated by radio frequency interference, and the affected pixels have to be excluded from the analysis (Anterrieu, 2011; Skou et al., 2010). Land surface temperature retrievals can also be influenced by cloud cover (Coll et al., 2009; Wan and Li, 2008). All these effects may lead to incomplete spatial coverage of the remotely sensed soil moisture and surface temperature fields for the study area of interest, so that the affected areas are excluded from data assimilation because no data are available.
However, land surface variables like soil moisture or surface temperature show a spatial horizontal correlation (we will use "spatial correlation" as a synonym for "spatial horizontal correlation"). The presence of such correlations implies that land surface variables in the uncovered area can be related to land surface variables in the covered area. Brocca et al. (2010a) and Ryu and Famiglietti (2006) studied the spatial correlation of soil moisture at different spatial scales. The results showed that the spatial correlation pattern of soil moisture could be modeled with the help of geostatistical approaches and that the regional soil moisture content could be estimated using a fixed number of samples. This provides the opportunity to improve the estimation of land surface variables by propagating the information of observations from covered areas to uncovered areas. De Lannoy et al. (2009) and Reichle and Koster (2003) have studied the impact of horizontal error correlation in the model forecast covariance matrix for soil moisture assimilation. On the other hand, due to the inaccuracies in the spatial registration of remote sensing products (Townshend et al., 1992) and the difference between the spatial resolution of remote sensing platforms and that of the land surface model, remotely sensed observations cannot be mapped directly onto the model grid cells. This means that it is difficult to find observation data located at exactly the same spatial location as a model grid cell: there are offsets between the observation locations and the model locations.
In land data assimilation, these two common problems should be considered jointly: model grid cells left uncovered by the observational spatial coverage, and the distance between the observation location and the model location caused by inaccuracies in the spatial registration of remote sensing products. We need to assess whether the neighboring observations can be used to improve model estimations. In order to incorporate the spatial correlation pattern of observations into a data assimilation system, we chose the local observation selection technique (Greybush et al., 2010; Houtekamer and Mitchell, 1998; Hunt et al., 2007; Whitaker et al., 2008), in which the observations near a model grid cell can be used for updating with data assimilation.
The local observation selection limits the impact of observations further away from the grid cell by filtering out the small correlations associated with these observations. The benefits and performance of the local observation selection were discussed by Greybush et al. (2010) and Hunt et al. (2007) in the framework of the Local Ensemble Transform Kalman Filter (LETKF). Presently, geostatistical methods which can characterize the spatial correlation structure of state variables such as soil moisture have not yet been used in the local observation selection technique of data assimilation. In particular, the geostatistical semivariogram model can be used to model the horizontal spatial dependence among observations (Chiles and Delfiner, 1999; De Lannoy et al., 2006; Lakhankar et al., 2010).
The objective of this study is to evaluate the potential of incorporating the spatial correlation characteristics of observations into LETKF in order to improve the soil moisture profile estimation. The organization of this paper is as follows. Section 2 presents a review of the study area, the Community Land Model, as well as the details of model input data. Section 3 presents the experimental design and the explanation of the methodologies used. Results in Sect. 4 are derived from the observing system simulation experiments. Section 5 provides a brief summary and discussion of the key results.

Model and input data
The newly developed Community Land Model 4 (CLM) (Oleson et al., 2010) was used to describe the soil-vegetation-atmosphere transfer of water, energy and matter. CLM represents several aspects of the land surface including surface heterogeneity and consists of components or submodels related to land biogeophysics, hydrologic cycle, biogeochemistry and ecosystem dynamics (Oleson et al., 2010). The model of the Babaohe River Basin consists of 15 soil layers, and the model grid resolution is 1 km. There are 3640 active grid columns.
The 25 km, 3-hourly atmospheric forcing data from the Global Land Data Assimilation System (GLDAS) project (Rodell et al., 2004), in which several forcing products from model reanalysis and remote sensing were combined, were interpolated to a 1 km grid with a temporal resolution of 1 h. Precipitation was interpolated in time using a linear disaggregation method provided by the Global Soil Wetness Project 2 (GSWP2). The temporal interpolations of incident solar radiation, incident longwave radiation, wind speed, relative humidity, air pressure and air temperature were based on the cubic spline method (Dai et al., 2003). The quasi-physically based high resolution meteorological interpolation model MicroMet, in which the spatial interpolations are performed using the Barnes objective analysis scheme (Liston and Elder, 2006), and the 1 km SRTM Digital Elevation Model were used in the spatial interpolation of the forcing data.
We used the MODIS 500 m Plant Functional Type (PFT) scheme of product MCD12Q1 to replace the CLM 0.5° PFT data. The MODIS PFT was projected into Longitude-Latitude Projection and resampled to 1 km using MRT (MODIS Reprojection Tool). Next, the MODIS PFT (12 types) was translated to CLM PFT (17 types) using the 1 km global climate data WorldClim (Hijmans et al., 2005) and the methods proposed by Bonan et al. (2002). The mean temperature for the warmest and coldest season of the year, the annual precipitation, and the precipitation of the driest, warmest and coldest seasons are necessary input data for the PFT translation and were taken from the WorldClim database. Additionally, the growing-degree days above 5 °C, calculated from the GLDAS forcing data, are a necessary input to reclassify MODIS PFT to CLM PFT (Bonan et al., 2002). The 4-day MODIS leaf area index (LAI) data (MCD15A3) at 1 km resolution were used to calculate the monthly LAI, and the stem area index (SAI) was estimated based on the methods proposed by Lawrence and Chase (2007).
In CLM, the soil color determines dry and saturated soil albedo. The soil thermal and hydraulic properties are estimated from sand, clay, and organic matter content. The maximum fractional saturated area is used to determine surface runoff and infiltration. The 1 km soil color data were calculated with the help of the method of Lawrence and Chase (2007) and the soil texture data. The soil sand fraction, clay fraction, organic matter density and soil bulk density were taken from the Harmonized World Soil Database v1.1 (HWSD) (FAO, 2010). The HWSD is a 30 arc-second raster database with over 16 000 different soil mapping units that combines existing regional and national updates of soil information worldwide (SOTER, ESD, Soil Map of China, WISE) with the information contained within the 1:5 000 000 scale FAO-UNESCO Soil Map of the World.
The maximum fractional saturated area, which is defined as the cumulative distribution function of the topographic index when the grid cell mean water table depth is zero, was calculated based on the methods proposed by Niu et al. (2007) using the 100 m GDEM (GDEM is a product of METI and NASA) and 1 km SRTM DEM (Jarvis et al., 2008).

Experiment design
The proposed methods were evaluated with the help of an Observing System Simulation Experiment (OSSE). A reference run of CLM (single CLM) was driven by the unperturbed forcing and soil parameters from 1 June 2007 to 31 August 2008. The soil moisture profile results from 1 June 2008 to 31 August 2008, calculated by the reference CLM run, were selected as the ground truth. The daily surface soil moisture (5 cm depth) results from 1 June 2008 to 31 August 2008 at 07:00 Z were selected to be used as the model values that will be updated by data assimilation. The time series used for the assimilation therefore contains 92 soil moisture observations. Through the correlation analysis of the soil moisture, we found that the correlation lengths range from several kilometers to hundreds of kilometers. In this synthetic study, the corrupted observations were generated by adding spatially correlated noise to the synthetic truth. The spatially correlated noise was generated using a range value of 10 km. To be more precise, a spatially correlated Gaussian random field with mean 0.0 and an exponential semivariogram model with nugget 0.0 (m³ m⁻³)², variance 0.0016 (m³ m⁻³)² (i.e. the standard deviation of the soil moisture observation is 0.04 m³ m⁻³, which is the volumetric accuracy of the SMOS mission, Kerr et al., 2010, as well as of the upcoming SMAP mission, Entekhabi et al., 2010) and range 10 km was added to the surface soil moisture fields from the reference run in order to obtain the synthetic surface soil moisture observation data. The random fields of spatially correlated noise were generated using the geoR package (Ribeiro Jr. and Diggle, 2001) of the statistical data analysis software R (http://www.r-project.org/).
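The noise generation described above can be illustrated with a minimal sketch. The study used geoR in R; the Python code below, its function names, the tiny 4 x 4 grid and the constant placeholder truth are illustrative assumptions. It also assumes that the 10 km range enters the exponential covariance as exp(-h/a); if the value denotes the effective range instead, the exponent would be scaled accordingly.

```python
import math
import random

SILL = 0.0016    # noise variance in (m3 m-3)^2, as stated in the text
RANGE_KM = 10.0  # range from the text, used here as a in exp(-h/a) (assumption)

def exp_cov(h_km, sill=SILL, a_km=RANGE_KM):
    """Exponential covariance with zero nugget."""
    return sill * math.exp(-h_km / a_km)

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def correlated_noise(coords_km, rng):
    """Draw one zero-mean Gaussian field with exponential covariance at the given points."""
    cov = [[exp_cov(math.dist(p, q)) for q in coords_km] for p in coords_km]
    low = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in coords_km]
    # Multiplying the Cholesky factor by white noise imposes the target covariance.
    return [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(len(z))]

# Hypothetical 4 x 4 grid with 1 km spacing and a constant placeholder "truth".
coords = [(float(x), float(y)) for x in range(4) for y in range(4)]
noise = correlated_noise(coords, random.Random(42))
truth = [0.25] * len(coords)
synthetic_obs = [t + e for t, e in zip(truth, noise)]
```

A dense Cholesky factorization is only practical for small grids; for the 3640 active grid cells of the basin a dedicated geostatistical library would be the more realistic choice.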
The spin-up period of CLM for the 10 ensemble members was from 1 June 2007 to 31 May 2008, using a temporal resolution of one hour. The open loop run for the 10 ensemble members was from 1 June 2008 to 31 August 2008. The 10 ensemble members had different perturbed forcing data and soil parameters (details are provided in Sect. 3.4).

Spatial correlation and geostatistics
The presence of horizontal spatial dependence of land surface properties is typically identified with the help of a semivariogram analysis (Goovaerts, 1997; Ryu and Famiglietti, 2006). Several semivariogram models are commonly used to characterize the spatial dependence, such as the Gaussian model, the exponential model, the spherical model and the Matern model (Goovaerts, 1997; Minasny and McBratney, 2005). In this paper, we only considered these four semivariogram models to characterize the spatial dependence of surface soil moisture. In these models, γ(h) denotes the semivariogram, c_0 is the nugget, c is equal to the sill minus the nugget, h is the separation distance, a is the (effective) range parameter, K_ν is the modified Bessel function of the second kind of order ν, Γ is the gamma function and ν (kappa) is the "smoothness parameter" (ν > 0). The normalized semivariogram γ_Nor(h) is defined as γ_Nor(h) = γ(h)/(c_0 + c), and the correlogram is then given by 1 − γ_Nor(h). The nugget value describes the unresolved variance at scales smaller than the smallest lag distance, whereas the sill describes the variance of the observed data as they become spatially independent. The calculated correlogram is normalized using the maximum correlogram value, which means that the correlogram value is equal to one at the observation locations. It gradually reduces towards 0.0 as the distance from the analysis grid cell increases, and is null beyond the specified correlation range. The observation selection scheme in LETKF is as follows: spatial correlations between the model location and the observation locations are calculated according to the fitted semivariogram, and the observations whose correlogram values are larger than a predefined threshold are selected for each model grid cell.
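For reference, the four models are commonly written in the following textbook forms. This is a sketch in the parameterization used, for instance, by geoR, where a acts as the range parameter rather than the effective range; the exact expressions used in this study may differ in how a is scaled.

```latex
\begin{align}
\text{Gaussian:}\quad & \gamma(h) = c_0 + c\left[1 - \exp\left(-\frac{h^2}{a^2}\right)\right] \\
\text{Exponential:}\quad & \gamma(h) = c_0 + c\left[1 - \exp\left(-\frac{h}{a}\right)\right] \\
\text{Spherical:}\quad & \gamma(h) =
  \begin{cases}
    c_0 + c\left[\dfrac{3h}{2a} - \dfrac{1}{2}\left(\dfrac{h}{a}\right)^{3}\right], & 0 < h \le a \\
    c_0 + c, & h > a
  \end{cases} \\
\text{Matern:}\quad & \gamma(h) = c_0 + c\left[1 - \frac{1}{2^{\nu-1}\,\Gamma(\nu)}
  \left(\frac{h}{a}\right)^{\nu} K_{\nu}\left(\frac{h}{a}\right)\right] \\
\text{Correlogram:}\quad & \rho(h) = 1 - \gamma_{\mathrm{Nor}}(h) = 1 - \frac{\gamma(h)}{c_0 + c}
\end{align}
```

All four forms hold for h > 0, with γ(0) = 0 by definition. Only the spherical model reaches the sill exactly at h = a; the other three approach it asymptotically, which is why an effective range is often quoted for them.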
In order to represent the vegetation or cloud cover impacts on the observation data, we derived the mask data using the valid grid cells from the MODIS land surface temperature products of Terra (MOD11A1-Daytime) on 13 April 2008, 1 May 2008 and 26 June 2008, respectively. In the assimilation experiments, it is assumed that meaningful information is available only for the valid grid cells of the remote sensing image, whereas for the other grid cells no meaningful information could be extracted due to cloud cover and vegetation. The positions of the extracted valid grid cells are used as the mask, and the synthetic observation data located at these grid cells were selected as the observation data to be assimilated. Assimilation experiments with three different masks were performed, which are shown in Fig. 1b-d, respectively. In total 2559, 1697 and 1143 valid observation grid cells were selected, covering 70, 47 and 31 % of the model active grid cells of this study area. The maximum distance considered when fitting the semivariogram of observation data was set to 10 km, and pairs of locations with separation distances larger than this value were ignored. Isotropic semivariograms of surface soil moisture observations were fitted using the geoR package in R.
Fig. 2. Flow chart of the data assimilation procedure for covered and uncovered grid cells; "Obs Num" is the number of observations.

Local ensemble transform Kalman filter
The local ensemble transform Kalman filter proposed by Hunt et al. (2007) is a new variant of the ensemble Kalman filter (EnKF). The main difference compared to the various kinds of implementations of EnKF, such as the stochastic EnKF (Burgers et al., 1998), the deterministic methods (Whitaker and Hamill, 2002), the ensemble adjustment Kalman filter (EAKF) (Anderson, 2001), the ensemble transform Kalman filter (ETKF) (Bishop et al., 2001) and the Ensemble Square Root Filter (EnSRF) by Whitaker and Hamill (2002), is the local analysis scheme used by LETKF (Houtekamer and Mitchell, 1998; Hunt et al., 2007; Kalnay et al., 2007; Miyoshi and Yamane, 2007; Whitaker et al., 2008).
Here, the spatial correlation characteristics of the observations were considered in the framework of the LETKF local analysis scheme, which only considers the observations located in a local region surrounding the analysis model grid cell. The observations were selected for each grid cell before data assimilation. The grid cell by grid cell analysis can easily be parallelized, which helps to decrease the computational burden of large scale data assimilation. For a detailed introduction to LETKF, please refer to Hunt et al. (2007).
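For orientation, the local analysis at one grid cell can be summarized with the equations of Hunt et al. (2007). The notation below follows that paper and is an assumption with respect to the implementation used in this study; in particular, the multiplicative inflation factor ρ is not specified in the text.

```latex
\begin{align}
\tilde{\mathbf{P}}^{a} &= \left[ \frac{(k-1)}{\rho}\,\mathbf{I}
  + (\mathbf{Y}^{b})^{T}\mathbf{R}^{-1}\mathbf{Y}^{b} \right]^{-1} \\
\mathbf{W}^{a} &= \left[ (k-1)\,\tilde{\mathbf{P}}^{a} \right]^{1/2} \\
\bar{\mathbf{w}}^{a} &= \tilde{\mathbf{P}}^{a}\,(\mathbf{Y}^{b})^{T}\mathbf{R}^{-1}
  \left( \mathbf{y}^{o} - \bar{\mathbf{y}}^{b} \right) \\
\mathbf{x}^{a(i)} &= \bar{\mathbf{x}}^{b}
  + \mathbf{X}^{b}\left( \bar{\mathbf{w}}^{a} + \mathbf{W}^{a(i)} \right),
  \qquad i = 1, \dots, k
\end{align}
```

Here k is the ensemble size (10 in this study), X^b holds the background ensemble perturbations of the local state, Y^b holds the corresponding perturbations in observation space for the locally selected observations only, R is their error covariance, y^o is the vector of selected observations, and W^{a(i)} is the i-th column of W^a. Because the same weights are applied to every element of the local state vector, unobserved variables are updated through their ensemble covariance with the observed ones.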

Ensemble generation
The uncertainties in the model calculations were represented by the perturbed forcing data, soil parameters and the initial data (from the spin-up period). The precipitation data were multiplied by log-normal distributed noise (mean = 0, std = 0.7) (Yilmaz et al., 2011). We generated the random noises of shortwave radiation, longwave radiation and air temperature using a multi-normal distribution, in which the cross correlation errors were imposed with mean of ([0.0, 1.0, 0.0]) and standard deviation matrix of ([2.0, 0.4, 0.4], [0.4, 0.3, −0.6], [0.4, −0.6, 20]) (Reichle et al., 2010). This cross correlation was based on the atmospheric balance between radiation, clouds, and air temperature (Reichle et al., 2010). Using this multivariate sampling, we can generate physically consistent perturbations like a positive perturbation of the shortwave radiation, a negative perturbation of the longwave radiation and a positive perturbation of air temperature. The random noises imposed on air temperature, shortwave radiation and longwave radiation were additive, multiplicative and additive, respectively. Initial soil moisture was perturbed using additive normally distributed noise (mean = 0.0, std = 0.02 m³ m⁻³). The soil sand fraction, soil clay fraction, soil organic fraction and maximum fractional saturated area were perturbed using multiplicative truncated normally distributed noise (mean = 1.0, std = 1.0, lower = 0.8, upper = 1.2). As the perturbation of the soil parameters resulted in an adequate ensemble spread, the model prognostic variables were not perturbed, as was done for example in Kumar et al. (2009). This resulted in an ensemble of model outputs which is the input for the LETKF algorithm.
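The soil parameter and initial state perturbations can be sketched as follows. The distribution parameters are those quoted above; the function names, the rejection sampler, the placeholder values for one grid cell, and the choice to draw an independent factor for each parameter are assumptions made for illustration only.

```python
import random

N_MEMBERS = 10  # ensemble size used in the study

def truncated_normal(rng, mean=1.0, std=1.0, lower=0.8, upper=1.2):
    """Draw from a normal distribution restricted to [lower, upper] by rejection sampling."""
    while True:
        value = rng.gauss(mean, std)
        if lower <= value <= upper:
            return value

def perturb_soil(rng, sand, clay, organic, fmax):
    """Multiply each soil parameter by its own truncated normal factor (independence assumed)."""
    return tuple(p * truncated_normal(rng) for p in (sand, clay, organic, fmax))

def perturb_initial_sm(rng, theta, std=0.02):
    """Add zero-mean normal noise (std in m3 m-3) to the initial soil moisture."""
    return theta + rng.gauss(0.0, std)

rng = random.Random(0)
# Placeholder values for a single grid cell (hypothetical, not taken from the study).
ensemble = [
    {
        "soil": perturb_soil(rng, 40.0, 20.0, 5.0, 0.4),
        "theta0": perturb_initial_sm(rng, 0.25),
    }
    for _ in range(N_MEMBERS)
]
```

Each dictionary would parameterize one CLM ensemble member; the forcing perturbations are omitted here because they require the cross-correlated sampling described in the text.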

Data assimilation strategies
Three assimilation strategies were evaluated, in which different local observation selection options were used to include the spatially correlated observations in the analysis scheme: (1) only one observation, the closest one, was used for each grid cell (1 Obs); (2) no more than five observations were used for each grid cell (5 Obs), i.e. the closest observation and the next closest four observations; (3) no more than nine observations were used for each grid cell (9 Obs), i.e. the closest observation and eight additional observations. Figure 2 summarizes the data assimilation procedure for the covered and uncovered grid cells.
The above three strategies were used for both covered grid cells and uncovered grid cells. All uncovered grid cells with sufficiently correlated observations in the neighborhood were updated, and the model grid cells without sufficiently correlated observations were not updated. The impacts of three different observational coverages (as represented by the three different masks) on the assimilation results were also analyzed.
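The selection step shared by the three strategies can be sketched as follows. The correlogram threshold of 0.1 is the value adopted in this study; the exponential correlogram with a 10 km range parameter, the function names and the toy coordinates are illustrative assumptions, since in the study the correlogram comes from the semivariogram fitted to the observations.

```python
import math

def correlogram(h_km, a_km=10.0):
    """Exponential correlogram with zero nugget (illustrative stand-in for the fitted model)."""
    return math.exp(-h_km / a_km)

def select_observations(cell_xy, obs_xy, max_obs, threshold=0.1):
    """Return indices of at most max_obs observations, most correlated first.

    Observations whose correlogram value does not exceed the threshold are
    discarded; an empty result means the grid cell is not updated.
    """
    scored = []
    for idx, xy in enumerate(obs_xy):
        rho = correlogram(math.dist(cell_xy, xy))
        if rho > threshold:
            scored.append((rho, idx))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:max_obs]]

# Toy observation locations in km (hypothetical).
obs_xy = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 3.0), (30.0, 30.0), (2.0, 1.0)]
cell = (0.0, 0.5)
one_obs = select_observations(cell, obs_xy, max_obs=1)
five_obs = select_observations(cell, obs_xy, max_obs=5)
nine_obs = select_observations(cell, obs_xy, max_obs=9)
```

In this toy case only five observations pass the threshold, so the 9 Obs strategy falls back to the same set as 5 Obs, which reflects the "no more than" wording of the strategies.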
The soil moisture assimilation was only concerned with updating the liquid soil moisture content, and the synthetic observations were also generated from the liquid soil moisture. The update of soil moisture content was constrained such that the sum of liquid soil moisture content and ice content was never larger than the soil porosity. The soil moisture of the surface soil layer (5 cm depth) was updated using the corresponding observations. The soil moisture contents of the other nine unobserved lower soil layers were updated based on the cross correlation between the observed upper soil layer and the unobserved layer.
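The porosity constraint can be written as a simple clipping step applied to each layer after the update. This is a minimal sketch; the function name and the additional lower bound of zero are assumptions not stated in the text.

```python
def constrain_liquid(theta_liq, theta_ice, porosity):
    """Clip the updated liquid water content so that liquid + ice <= porosity.

    The lower bound of zero is an added assumption to keep the state physical.
    All quantities are volumetric contents in m3 m-3.
    """
    upper = max(porosity - theta_ice, 0.0)
    return min(max(theta_liq, 0.0), upper)

# An update that overshoots the available pore space is pulled back to it.
clipped = constrain_liquid(theta_liq=0.45, theta_ice=0.05, porosity=0.43)
```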
It should be noted that some of the observations used in the analysis at one grid cell are also used in the analysis of a neighboring grid cell. This imposes a smoothing effect from one grid cell to the next. The threshold value of the correlogram for including observations was set to 0.1; that is, all observations with a correlogram value larger than 0.1 were considered for inclusion in the data assimilation. The small threshold of 0.1 guarantees that almost all grid cells have associated correlated observations. In this study, the spatial correlogram values of the nine observations range

Results and discussion
In order to evaluate the quality of the obtained results, the Root Mean Square Error (RMSE) and the Nash-Sutcliffe model efficiency (NSE) coefficient were calculated. Because RMSE values are usually affected by the mean bias or the mean variation differences, we added the NSE as another measure (Reichle et al., 2010). NSE values represent the correlation between the estimation and the observation. RMSE and NSE values were calculated for each grid cell over the assimilation period, where "Estimated" is the ensemble mean of the open loop run or the ensemble mean after assimilation and N is the number of time steps (2208 in this study). The smaller the RMSE value and the larger the NSE value, the better the assimilation results.
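The two measures can be sketched with their standard definitions, RMSE = sqrt(sum((Estimated − Truth)²)/N) and NSE = 1 − sum((Estimated − Truth)²)/sum((Truth − mean(Truth))²). The name "Truth" for the reference run values and the function names are assumptions introduced here for illustration.

```python
import math

def rmse(estimated, truth):
    """Root mean square error between an estimated and a reference time series."""
    n = len(truth)
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimated, truth)) / n)

def nse(estimated, truth):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the reference mean."""
    mean_t = sum(truth) / len(truth)
    num = sum((e - t) ** 2 for e, t in zip(estimated, truth))
    den = sum((t - mean_t) ** 2 for t in truth)
    return 1.0 - num / den

# Toy series for one grid cell (hypothetical values in m3 m-3).
truth = [0.20, 0.22, 0.25, 0.21]
estimated = [0.21, 0.22, 0.24, 0.22]
cell_rmse = rmse(estimated, truth)
cell_nse = nse(estimated, truth)
```

In the study these values would be computed per grid cell over the N = 2208 time steps and then averaged separately over covered and uncovered grid cells.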
First we calculated the RMSE and NSE values of each grid cell over the whole time window. Then the mean RMSE and mean NSE for the covered and uncovered grid cells were calculated separately. The 95 % confidence intervals of the mean RMSE (NSE) were also calculated on the basis of the RMSE (NSE) values at individual grid cells using Bayesian methods (Oliphant, 2006).
Figure 3 shows the mean soil moisture RMSE values for open loop simulations and the assimilation results under different numbers of local observations (1, 5 or 9), both for covered and uncovered grid cells, at 10 cm depth (Fig. 3a, c and e) and 30 cm depth (Fig. 3b, d and f). These results are given for 70 %, 47 % and 31 % observational coverage and include the 95 % confidence intervals. Figure 4 shows the corresponding results at 50 cm depth and 80 cm depth.
Figures 5 and 6 present results for the same simulation setups, observational coverages and soil depths as Figs. 3 and 4, respectively, but now the mean soil moisture NSE values are given.
The results of Figs. 3 and 4 illustrate that the characterization of soil moisture content at both the covered and the uncovered grid cells was improved with data assimilation. The decrease of RMSE at the covered grid cells is larger than that at the uncovered grid cells. Comparing the results for different observation strategies shows that 5 Obs and 9 Obs perform better than 1 Obs for 70 % and 47 % observational coverage, whereas 5 Obs performs better than both 1 Obs and 9 Obs for 31 % observational coverage. 5 Obs is therefore the best strategy in all cases. The NSE values reported in Figs. 5 and 6 also demonstrate that the 5 Obs strategy yields the best results.
The basin scale mean RMSE values and mean NSE values for all grid cells (including all covered and uncovered grid cells) can give us additional insight into the performance of the different assimilation strategies. Figure 7 shows the mean basin scale soil moisture RMSE values for open loop simulations and the three assimilation results with 1, 5 and 9 local observations used at all grid cells for 10 cm (Fig. 7a), 30 cm (Fig. 7b), 50 cm (Fig. 7c) and 80 cm depth (Fig. 7d). These results are presented for 70 %, 47 % and 31 % observational coverage. Figure 8 shows the corresponding mean basin scale soil moisture NSE values.
The RMSE values reported in Fig. 7 illustrate again that 5 Obs is the best assimilation strategy. A comparison of the results for the 5 Obs strategy for different observational coverages illustrates that the results for 70 % observational coverage are superior to the other ones. Surprisingly, the results for 31 % observational coverage are slightly better than the ones for 47 % observational coverage. The results for the NSE values (Fig. 8) are consistent with the ones for Fig. 7.
In order to compare the spatial distribution of changes of RMSE values, we show the basin scale 10 cm depth soil moisture RMSE values from the open loop simulations and the three assimilation results for 1, 5 and 9 local observations at 70 % observational coverage (Fig. 9). Figure 10 shows the spatial distribution of 10 cm soil moisture RMSE values from the open loop simulations and the three assimilation results for 5 observations at 70 %, 47 % and 31 % observational coverage. In Fig. 9, there are still some regions with increased RMSE values for the 1 Obs strategy (Fig. 9b), but these regions disappear when 5 or 9 observations are used in the data assimilation. Results for 5 Obs and 9 Obs are very similar. Figure 10 shows high RMSE regions at the uncovered grid cells for 47 % and 31 % observational coverage. Nevertheless, the RMSE values are smaller than those of the open loop simulations (Fig. 10a).
The results for the surface soil moisture characterization illustrate that the estimations for covered and uncovered grid cells can be improved when surrounding observations are included in the estimation for a given grid cell. The estimations at the uncovered grid cells were also improved when the surrounding correlated observations were used. Results are better if more observations are included in the data assimilation, but using more than five observations of surface soil moisture did not noticeably improve the estimations in this case.
The lower soil layers (10 cm, 30 cm, 50 cm and 80 cm) were also updated based on the cross correlation between the surface soil layer and the lower soil layers in the LETKF. The improved results in the lower layers indicate that the vertical correlations among the soil layers can be useful to transfer the surface observation to the lower layers. The improved results at the uncovered grid cells likewise indicate that the horizontal correlations among the observation data can be useful to transfer the observation data from the observed grid cells to the unobserved grid cells. Therefore, the combination of the vertical correlations and the horizontal correlations in the soil moisture data assimilation is helpful for improving the characterization of soil moisture heterogeneity.
In addition to including the spatial dependence of observed state variables, it may be worthwhile to also include the temporal dependence of land surface observations in data assimilation schemes for an improved estimation of state variables. Experimental data clearly show that land surface variables such as soil moisture and soil temperature are correlated in time. Temporal stability of soil moisture (Brocca et al., 2010a; De Lannoy et al., 2006) has been extensively studied. Dunne and Entekhabi (2005) studied the ensemble Kalman smoother in soil moisture assimilation, but the potential of temporal correlation has probably not yet been fully explored in the framework of data assimilation schemes.
Moreover, land cover and soil type-dependent spatial correlations may improve the findings of this study.
The spatial interpolation methods for the forcing data and the spatial resolution of the input data also influence the spatial correlation characteristics of the model. The 1 km spatial resolution we chose in this study is finer than the resolution commonly used in land data assimilation. The results of this work rely on the specific spatial correlation characteristics of the input data. The spatial correlation range of high resolution data is shorter than that of coarse spatial resolution data. It is necessary to study what the contribution of spatial correlation could be for soil moisture assimilation in (very) high resolution land surface models. The synthetic experiment typically overestimates the performance of the method because conceptual uncertainty and model uncertainty are neglected in the analysis (Kumar et al., 2009). Therefore, a real-world case study with more data is desirable.
The masks used in this study cannot represent the true spatial coverage of microwave sensors, which is easily influenced by the spatial distribution of vegetation. More than 90 % of the grid cells in this study area are covered by short grass, and we think that the masks used in this work are more suitable for representing the proposed idea to evaluate the impacts of spatial correlations.
An operational application of the presented spatial correlation patterns may be feasible for the assimilation of surface soil moisture in weather forecast models like the ECMWF Integrated Forecast System (Drusch et al., 2009). However, the computational cost of large scale, high resolution data assimilation will be challenging, because the semivariograms for soil moisture have to be fitted at each time step, and the fitting algorithm increases the computation time considerably.

Summary
We carried out a set of synthetic observing system simulation experiments to explicitly include the spatial correlation characteristics of soil moisture in the land data assimilation for updating surface soil moisture at locations without remote sensing observations. Two different spatial correlation schemes were evaluated: (1) for the grid cells covered with observations, additional observations around the grid cells were included in the data assimilation; (2) for the grid cells without observations, observations surrounding these grid cells were used. First, we characterized the spatial correlations of soil moisture on the basis of a geostatistical semivariogram analysis. Second, the fitted semivariogram models and the corresponding correlogram values were used to choose the correlated observations for each model grid cell in LETKF during assimilation. Third, the selected observations were sorted according to their correlogram values, and the simulation experiments considered different numbers of observations to be used in the data assimilation. Our study showed that including the horizontal spatial dependence of surface soil moisture observations in the data assimilation scheme improved the estimation. The model estimations at the grid cells without observations can be improved using the geostatistical model and LETKF. In this particular case, the best results were obtained by assimilating five observations. The best assimilation performance was achieved for a high observational coverage, but the results also depend on the spatial distribution of the observation data. For example, the results for 47 % observational coverage are worse than the results for 31 % observational coverage, because the spatial distribution for the 31 % observational coverage is more uniform than the one for the 47 % observational coverage: for the 47 % observational coverage we can find a large gap between the patches of observations (Fig. 1c). Therefore, the lower observational coverage can be offset by the horizontal spatial correlation.
In summary, the proposed spatial correlation method will be helpful to soil moisture data assimilation in two ways: increasing the spatial availability of observations and offsetting the impact of spatial registration error in the remote sensing data processing. Since spatial correlations are intrinsic characteristics of many land surface variables, it is worth extending the proposed methods to the land data assimilation of other observation data.

Fig. 9. Basin scale 10 cm soil moisture RMSE values for open loop simulations (a) and for the three assimilation strategies with 1 observation (1 Obs) (b), 5 observations (5 Obs) (c) and 9 observations (9 Obs) (d) at 70 % observational coverage.

Fig. 10. Basin scale 10 cm depth soil moisture RMSE values for open loop simulations (a) and for the 5 Obs strategy at 70 % (b), 47 % (c) and 31 % (d) observational coverage.