Comparing the ensemble and extended Kalman filters for in situ soil moisture assimilation with contrasting conditions

Two data assimilation (DA) methods are compared for their ability to produce an accurate soil moisture analysis using the Météo-France land surface model: (i) SEKF, a simplified extended Kalman filter, which uses a climatological background-error covariance, and (ii) EnSRF, the ensemble square root filter, which uses an ensemble background-error covariance and approximates random rainfall errors stochastically. In situ soil moisture observations at 5 cm depth are assimilated into the surface layer and 30 cm deep observations are used to evaluate the root-zone analysis on 12 sites in south-western France (SMOSMANIA network). These sites differ in terms of climate and soil texture. The two methods perform similarly and improve on the open loop. Both methods suffer from incorrect linear assumptions which are particularly degrading to the analysis during water-stressed conditions: the EnSRF by a dry bias and the SEKF by an oversensitivity of the model Jacobian between the surface and the root-zone layers. These problems are less severe for the sites with wetter climates. A simple bias correction technique is tested on the EnSRF. Although this reduces the bias, it modifies the soil moisture fluxes and suppresses the ensemble spread, which degrades the analysis performance. However, the EnSRF flow-dependent background-error covariance evidently captures seasonal variability in the soil moisture errors and should exploit planned improvements in the model physics. Synthetic twin experiments demonstrate that when there is only a random component in the precipitation forcing errors, the correct stochastic representation of these errors enables the EnSRF to perform better than the SEKF. It might therefore be possible for the EnSRF to perform better than the SEKF with real data, if the rainfall uncertainty was accurately captured. However, the simple rainfall error model is not advantageous in our real experiments. More realistic rainfall error models are suggested.


Introduction
Soil moisture has an important influence on heat and water exchanges between the land and the atmosphere, which makes it an important factor in Numerical Weather Prediction (NWP) (Dharssi et al., 2011). It is also important for a variety of other applications, including drought monitoring, crop irrigation and water management. 5 The main objective of data assimilation (DA) for land surface models is to assimilate observed surface soil moisture to produce an analysis of root-zone soil moisture. Rootzone soil moisture is usually of more interest than surface soil moisture because it has a much greater water capacity and a far longer memory. The interest in soil moisture DA is partly driven by the wealth of satellite data available from low-frequency microwave instruments, which can be used to retrieve global-scale surface observations. Recent satellite launches have considerably improved coverage over the last decade, namely the Advanced Scatterometer (ASCAT) instrument on board the METOP satellites (Wagner et al., 2007), the Soil Moisture and Ocean Salinity (SMOS) Mission (Kerr et al., 2001) and the Soil Moisture Active Passive (SMAP) Mission (Entekhabi et al., layer model (Decharme et al., 2011) has been developed to replace the current 3 layer force-restore land surface model. The use of the multi-layer model will have implications for the computational costs of the DA methods over a large domain. Given that the SEKF requires a model Jacobian calculation for each prognostic layer, introducing multiple layers will substantially increase the computational burden of these calculations. It may in fact be more cost-effective to implement an EnKF method.
The EnKF has extensively been compared with the EKF on land surface models for assimilating soil moisture observations. Experiments have been conducted with both synthetic observations (e.g. Reichle et al., 2002) and real observations (e.g. Muñoz Sabater et al., 2007). In most cases the EnKF delivered modest improvements over the 20 EKF. It was not obvious beforehand which of these methods would perform better, since they both make incorrect linear assumptions in the analysis update: the EKF by using a linear model and the EnKF by using a linear combination of ensemble members.
The experiments in this paper are partly motivated by studying the results of the experiments by Muñoz Sabater et al. (2007);Draper et al. (2009); Mahfouf et al. (2009). 25 They performed comparisons of the SEKF and the EKF on previous versions of the land surface model used by Météo-France. They found that not only is the SEKF less computationally expensive than the EKF, but that it's performance is slightly better. Muñoz Sabater et al. (2007) also demonstrated that a simple 1-D variational DA of model errors and forcing errors. The incorrect specification of these errors leads to sub-optimal filter performance (Crow and Van den Berg, 2010;Reichle et al., 2008;Crow and Van Loon, 2006).
Various formulations of the EnKF exist, which differ in the way they perform the analysis update. This study examines an implementation called the Ensemble Square Root 10 Filter (EnSRF, Whitaker and Hamill, 2002). The EnSRF is chosen because it does not perturb the observations, which would incur sampling errors. In this paper, the EnSRF is compared with the SEKF in terms of their ability to provide an accurate soil moisture analysis. Although the EnSRF necessarily uses an ensemble of model trajectories, the experiments focus on the deterministic analysis from the ensemble mean. The aim 15 of this study is to compare and analyze the performances of these DA methods by examining the impact of: i. Random errors in the forcing; ii. the Gaussian assumptions made by the DA methods; iii. different soil textures; 20 iv. a flow-dependent background-error covariance.
The cumulative distribution function (CDF) matching technique is used in this study, which bias corrects the observations with respect to the model simulation (Calvet and Noilhan, 2000;Reichle et al., 2004;Drusch et al., 2005). However, ensemble perturbations can introduce additional biases as a result of the nonlinear water fluxes (Ryu et al., 25 2009). A simple bias correction technique is also tested on the EnSRF as a means of reducing the biases caused by ensemble perturbations (Ryu et al., 2009 Twelve grassland sites over southwest France, where in situ observations are available (the SMOSMANIA network, Albergel et al., 2008), are used to compare the methods. These sites include various soil textures that can influence soil water transfers. Only surface soil moisture observations are assimilated. The performance is validated by comparing the root-zone soil moisture analysis (80 cm depth) 5 with 30 cm deep in situ observations. The results are collected over a 3 year period (2008)(2009)(2010).
The methods and materials are described in Sect. 2. In Sect. 3.1, the results of the experiments without DA are presented. The objective here is to show the physical mechanisms behind the ensemble perturbation bias, and the impact of applying a bias 10 correction scheme. The results of the DA experiments are presented in Sect. 3.2. In Sect. 3.2.1 the DA methods are compared using a synthetic twin experiment designed to represent only random errors in the precipitation forcing. This is a test of the ability of the DA methods to represent these errors. Then in Sect. 3.2.2 the DA methods are tested using real in situ observations. Section 4 discusses the results and Sect. 5 15 summarises the main conclusions of the experiments.

Methods and materials
The ISBA-A-gs model and the atmospheric forcing are introduced in Sects. 2.1 and 2.2 respectively. The observations are described in Sect. 2.3. The DA methods are explained in Sect. 2.4. The performance diagnostics of the DA methods are given in 20 Sect. 2.5. Finally, the experimental setup for the DA methods is presented in Sect. 2.6.

ISBA-A-gs model
The experiments were all conducted on version 7.2 of SURFEX, which incorporates the "Interactions between Soil, Biosphere and Atmosphere" (ISBA) land surface model (Noilhan and Mahfouf, 1996). This model is based on the force-restore method of Dear- 25 (1977). The A-gs version of ISBA accounts for leaf-scale physiological processes, including the effects of carbon dioxide concentration and photosynthesis (Calvet et al., 1998). The simulated leaf biomass is used to compute the leaf area index (LAI), a key variable governing plant transpiration. The "NIT" version of the model is applied in this work, which can dynamically simulate LAI evolution (Gibelin et al., 2006). The sea-5 sonal variability in LAI has a significant impact on the soil moisture variables (Barbu et al., 2011). The three-layer version of ISBA-A-gs is used in this study (Boone et al., 1999). The three soil moisture variables are defined here with the depths used for the experiments: -The surface soil moisture (WG1), with depth d 1 of 1 cm. But the effective depth is 10 d 1 /C 1 , where C 1 is the surface restore coefficient, which depends on soil texture and soil moisture; -The root-zone soil moisture (WG2), with depth d 2 of 0.8 m, which includes WG1; -A recharge layer (WG3) with thickness d 3 of 0.2 m, which exists below WG2 (see Fig. 1).

15
All the variables are measured in terms of volumetric soil moisture concentration (m 3 m −3 ). A diagram illustrating the soil moisture fluxes is presented in Fig. 1. The surface layer (WG1) and the root zone (WG2) layers are forced by interactions with the atmosphere and restored towards an equilibrium value. At equilibrium, the gravity forces match the capillary forces. The drainage from WG2 supplies water into a recharge zone 20 (WG3), which conserves the total water volume.
In these experiments 12 model points were used, which are the closest points to the 12 grassland in situ observation sites. In the operational setup each grid-cell is split into 12 so-called "patches" corresponding to different surfaces, but for these experiments only grassland was present. The model points are such that the nearest observation to Introduction

Forcing
The "Système d'Analyse Fournissant des Renseignements à la Neige" (SAFRAN) forcing was used, which is derived from a meso-scale analysis system with a horizontal resolution of 8 km (Durand et al., 1993). This provides values of precipitation, wind, incoming short-wave and long-wave radiation, relative humidity and air temperature, 5 mostly derived from a surface network of weather stations. The hourly forcing values were input into the ISBA-A-gs model for the 12 gridpoints. We have adopted a version of SAFRAN that enables the additional use of 3000 climatological observing stations over France, including rain gauges (Quitana-Ségui et al., 2008;Vidal et al., 2010).

Experiments with real observations
For the experiments with real observations, in situ observations at 12 grassland sites in Southwest France were assimilated. These experiments are hereafter referred to as "real experiments". These sites are part of the Soil Moisture Observing System Meteorological Automatic Network Integrated Application (SMOSMANIA) network (Calvet 15 et al., 2007;Albergel et al., 2008). The observation sites are spaced approximately 45 km apart. The SMOSMANIA network was selected partly because of the large variability in the soil textures between the different sites. In particular, the Sabres site and the Narbonne site represent opposite extremes of soil texture. This is linked to the ratio of the sand 20 and clay contents. The Sabres soil has a high sand to clay ratio, while the Narbonne soil has a low sand to clay ratio. The Sabres soil was measured as having about 932 g Kg −1 of sand and 39 g Kg −1 of clay at 5 cm depth (with similar levels for deeper layers), while the Narbonne soil was measured as having 262 g Kg −1 of sand and 464 g Kg −1 of clay at 5 cm depth (Albergel et al., 2008). The other station soil types have combinations of 25 sand and clay between these values (Albergel et al., 2008). The stations also contain Introduction

Synthetic experiments
For the synthetic experiments the in situ observations were not used. Instead the synthetic WG1 observations were obtained from a model simulation with perturbed precipitation forcing (described in Sect. 2.6.2) and the addition of a random normally distributed observation error. Prior to DA, the RMSE of the model simulation in the syn- 15 thetic experiments (compared with the synthetic observations) was roughly 10 % of the size of the model RMSE in the real experiments (compared with the real observations). Therefore the observation error for the synthetic experiments was specified to be 10 % of the higher value used in the real experiments (σ o = 0.05 (w fc − w wilt )).

20
The DA methods employed in this work are derived from the Kalman Filter (Kalman, 1960). The vector of prognostic variables is x = (WG1, WG2). The background state (x b (t i )) at the current time (t i ) is a nonlinear model propagation of the previous analysis: where M is the (nonlinear) ISBA-A-gs model. The Kalman Filter analysis update is: where H is the linearized observation operator, B is the background-error covariance matrix and R is the observation-error covariance matrix. These matrices measure the 10 expected errors and the covariances are a measure of the correlations in the errors between the different variables (i.e. between WG1 and WG2). The R matrix is assumed to be diagonal i.e. covariances equal to zero. Given that each point used in the experiments was independent, none of the DA methods could take into account background error-covariances between different gridpoints.
In the experiments the observation was present at the end of the assimilation window, so the background-error covariance needed to be propagated from the beginning to the end of the window. This is implicitly calculated via H for the SEKF (Sect. 2.4.1) and via the ensemble perturbations for the EnSRF (Sect. 2.4.2). A summary of the DA methods is given in Table 1.

SEKF
The Simplified Extended Kalman Filter (SEKF, Mahfouf et al., 2009) is based on the EKF (Jazwinski, 1970). The SEKF simplifies the EKF by using both a diagonal and climatological background-error covariance at the start of the assimilation window.
The SEKF was originally designed to assimilate screen-level temperature and hu-25 midity, which are not prognostic variables and therefore cannot be assimilated directly 7363 Introduction  (Mahfouf et al., 2009). For this reason the SEKF uses the linear observation operator H to relate the observed quantity to the prognostic variables. Following the notation of Mahfouf (2010), there are two steps in the calculation of H. The first step H 1 is simply a transformation into observation space (H 1 = (1, 0) in our case). The second step is the calculation of the Tangent linear version of the nonlinear model (M). This linear 5 model is a finite difference approximation between a perturbed and reference nonlinear model simulation: where ∆x l is a model perturbation applied to model variable l . Therefore the Jacobian between the observation k and the model variable l is simply: This formulation of H implicitly propagates the B matrix from the start of the assimilation window to the time of the observation at the end of the window (HBH T = H 1 MBM T H T 1 ). Although screen-level temperature and humidity observations are not assimilated in this study, the same formulation is applied to soil moisture observations. 15 The ∆x l size is important -it needs to be sufficiently small that the linear approximation in deriving M is satisfactory but large enough to not incur significant computational round-off errors. The linear approximation can be measured by the magnitude of the difference between H kl for positive and negative values of ∆x l (Walker and Houser, 2001;Balsamo et al., 2004), with values close to zero indicating linear behavior and  necessary for them to impose a maximum limit on H 12 , which they set equal to 1.0. The same perturbation size and limit were used in our experiments. The validity of the linear approximation was not tested explicitly in this study. Instead, the WG2 Kalman gain was compared before and after imposing the 1.0 Jacobian limit.

5
It was clear when the linear assumption broke down because the WG2 Kalman gain was noticeably reduced by imposing the limit. The WG2 Kalman gain is defined by: where σ o and σ b are the expected observation and background errors. The R matrix in 10 our study is equal to the scalar (σ o ) 2 .

EnSRF
The EnKF (Evensen, 1994) is a way of representing the uncertainty in the prognostic variables by using an ensemble of model trajectories. This circumvents the high computational cost of explicitly storing and propagating the background-error covariance 15 for a large model dimension. Each ensemble member is propagated using the nonlinear model. The deterministic analysis comes from the ensemble mean. The ensemble background-error covariance is defined as: and the perturbation matrix (of dimension n × m) is given by:  The serial ensemble square root filter (EnSRF) was introduced by Whitaker and Hamill (2002) as a means of avoiding the sampling error from the perturbed observations. The ensemble perturbations are defined by: where 10 α = 1.0 Here R and HP b H T are scalars representing the variances at the observation location.
The variable of interest WG2 is linked to WG1 via the Kalman gain: where X WG1 X T WG2 represents the cross-product between the WG1 and WG2 ensemble 15 perturbations. The WG1 and WG2 ensemble spreads are defined by X

Performance diagnostics
The Root Mean Square Error (RMSE), the Anomaly Correlation Coefficient (ACC) and the Bias for the WG2 variable were used to determine the performance of the DA The time t i is the daily time, with t 0 equal to 1 January 2008 and t N equal to 1 January 2011. The ACC climatologies (H(x a ) and y o ) are calculated as a 3 month moving 5 average, which takes into account seasonal variability. The RMSE is a measure of both the random and systematic components of the error. The ACC represents the seasonal correlations, which are unaffected by systematic errors, while the bias measures the systematic errors. The computational cost of the DA methods was not assessed because the ensemble DA methods were not optimized 10 to take advantage of parallel computing. Moreover, Muñoz Sabater et al. (2007) already estimated the computational cost of similar algorithms on a previous version of ISBA-A-gs. They found that the main wall-clock time constraints of the EKF and EnKF algorithms were the model simulations, rather than the DA. Indeed, in our study the SEKF (which requires three simulations for the model Jacobian calculations) did have 15 about the same wall-clock time as the EnSRF with 3 ensemble members.
In the experiments where DA is not applied, the perturbed model simulation is measured against the unperturbed model simulation. In the DA experiments, the performance of the model state x is always measured against independent observations of WG2. The diagnostics are averaged over the period 2008-2010.  The represention of model and forcing errors within DA methods can significantly influence their performance. The SEKF uses a climatological background-error covariance matrix. This matrix theoretically captures the total contribution from background errors and additive model/forcing errors. In this study, the SEKF background-error variances (σ b ) for WG1 and WG2 were tuned to produce the best ACC. The WG1 and WG2 10 background-error standard deviations were tuned with sizes: with λ b 1 and λ b 2 varying between 0.0 and 0.5, in steps of 0.05. In the synthetic experiments, the σ b values were calibrated with values a 10th of the size of those above (as 15 explained in Sect. 2.3.2). The background error variances were scaled by (w fc −w wilt ) for each site. This assumes that a linear relationship exists between the range of possible soil moisture values and the background errors (Mahfouf et al., 2009).

EnSRF calibration
The background errors in the EnSRF are captured by the ensemble background-error 20 covariance matrix (P b ), but this does not include the contribution from model errors and forcing errors. Hamill and Whitaker (2005)  each ensemble member after the daily analysis update. The values of for WG1 and WG2 were tuned to produce the largest ACC, with sizes: inflation was implemented using a 1st order auto-regressive model. It was decided to use time correlations of τ = 1 day for WG1 , and τ = 3 days for WG2 . This is similar to previous studies (Mahfouf, 2007;Muñoz Sabater et al., 2007;Reichle et al., 2002) and is consistent with the longer time correlations of the WG2 variable compared with WG1. The ensemble background-error covariance matrix was further modified to allow for 10 random errors in the precipitation forcing. Following Muñoz Sabater et al. (2007), the precipitation was perturbed using Gaussian noise with a standard deviation equal to 50 % of the precipitation ( Pr = 50 % Pr). The pdf was truncated in order to prevent negative precipitation values. The precipitation forcing was perturbed for each ensemble member and for each hour of the 24 h assimilation window. Other forcing parameters 15 were not perturbed, since it was found through a sensitivity study that perturbing these values had little impact on the model simulations. The results for the sensitivity study are presented in Table S1.2 of Supplement 1. In the real experiments, the performance of the EnSRF with perturbed precipitation was compared with the performance without perturbed precipitation. 20 An ensemble size of 20 members was chosen for the calibration of the additive inflation. The calibrated EnSRF was then tested with ensemble sizes ranging from 3 to 200 in order to explore the effects of sampling errors.

EnSRF bias correction
A bias correction technique was tested on the ensemble DA methods as a means of ). The perturbation bias correction uses the forecast from the previous analysis ensemble mean as an anchor to modify the background ensemble: Equation (14) prevents the mean of the ensemble forecasts from becoming biased with respect to the forecast of the analysis ensemble mean. The perturbation bias correction 5 was implemented on all three layers before the analysis update step.

Experiment list
The list of experiments is displayed in Table 2. The first experiment (Ens) was performed by perturbing an ensemble without DA in order to investigate the cause of the perturbation bias. The bias correction scheme (Eq. 14) was then tested on this ensem-10 ble, which is labelled as Ens bc . Thereafter the synthetic and real DA experiments are denoted by the subscripts S and R respecively. For each experiment the calibrated error variances are specified. For the real experiments the EnSRF was tested with three different configurations: EnSRF R1 is the baseline EnSRF without perturbed precipitation forcing nor bias correction. The EnSRF R2 and EnSRF R3 experiments include perturbed 15 precipitation forcing and bias correction respectively. Note that in the synthetic experiments, the EnSRF S was designed to capture the precipitation forcing errors perfectly. The same precipitation error specification was used to estimate the precipitation errors in the real experiments (EnSRF R2 ).

Investigating the perturbation bias
An ensemble of model trajectories was perturbed by adding random perturbations to WG2 with standard deviation 0.025 m 3 m −3 . This is a similar order of magnitude to the 7370 Introduction introduced, which is not expected because the perturbations are randomly sampled with zero mean. The origin of this bias is investigated by linking the physical processes that underpin the bias to changes in the ensemble spread. The site-averaged (averaged over the 12 stations) and time-averaged water content of the total reservoir (WG1 + WG2 + WG3) is 243 mm for the open loop and 239 mm for 10 Ens. This water loss of 4 mm (of Ens compared with the open loop) represents a small fraction of the total reservoir. However, this dry bias is strongly dependent on the soil texture and the season: the sites with clay soils have a greater dry bias than the sites with sandy soils, particularly during the summer period. It therefore makes sense to compare the results for Sabres with Narbonne, given that Sabres is very sandy while 15 Narbonne has a high clay content. Indeed, between July and September, the average bias for Narbonne (Sabres) is 13 (3) mm. Between January and March the average bias for Narbonne (Sabres) is 7 (3) mm.
The impact of the perturbed ensemble on each individual layer is demonstrated by Fig. 2, which shows the monthly, annually and site-averaged net (a) WG1, (b) WG2 20 and (c) WG3 for Ens and the open loop. The net water amount represents the concentration in m 3 m −3 for the layer scaled by the depth of the layer (in mm). The dry bias (Ens relative to the open loop) is evident in WG2 during the entire period and it peaks between July and September. There also appears to be a dry bias in the winter in WG3, but there is no significant bias in WG1. 25 The seasonal water fluxes are investigated in order to explain the seasonal variabilities in the bias. Water is depleted from the reservoir via either drainage, evaporation, transpiration or surface runoff. Surface runoff is neglected in this investigation because it is relatively small compared with the other processes. The site-averaged monthly  Fig. 2d and e respectively. The bare-soil evaporation is most active during summer, which corresponds with the maximum insolation. The transpiration is largest in spring and early summer, when the vegetation is most developed and before the onset of water-stressed conditions in late summer. Transpiration dominates over bare-soil evaporation, since 5 the grassland vegetation type covers 90 % of the land surface. These two processes add up to an evapotranspiration curve which peaks in May and June (Fig. 2d). In contrast to evapotranspiration, the drainage is most active during the winter and is absent during the late summer/early autumn period (Fig. 2e); since drainage only occurs when the soil moisture is near the field capacity.

10
The main effect of the ensemble perturbations (Ens) on evapotranspiration relative to the open loop is an enhancement in July and August and then a reduction in September and October (Fig. 2d). This effect is also clearly evident in Fig. 2f, which shows the total difference in soil water depletion between Ens and the open loop. The effect of perturbing the ensemble on drainage is a slight increase relative to the open loop 15 between February and June (Fig. 2e). The annually averaged discrepancy between the Ens and open loop total soil water depletion is about 4 mm, which accounts for the dry bias in the Ens total reservoir.
It is possible to link the seasonal changes in soil water depletion to changes in the ensemble spread. The ensemble members, ensemble mean and the open loop for 20 2009 are shown for the Sabres site and the Narbonne site in Fig. 3a and b respectively. Also shown are the wilting point and the field capacity for the two sites. The larger water capacity of clay relative to sand explains the greater field capacity and wilting point of Narbonne compared with Sabres. During prolonged wet periods, which tend to occur in winter, the ensemble members converge because the soil reservoir reaches the field 25 capacity. This corresponds to a reduction in the ensemble spread. Between the spring and autumn the largest fluctuations in soil moisture occur due to changes in rainfall and insolation. During this period the soil moisture simulation becomes sensitive to perturbations in the initial conditions, which is reflected by the large WG2 ensemble spread. The Narbonne soil has a much larger ensemble spread than the Sabres soil, particularly in autumn. Now the seasonal changes in the bias can be related to changes in the ensemble spread and the soil texture. The Ens WG2 ensemble mean is clearly negatively biased (compared with the open loop) for Narbonne during much of the period (Fig. 3b), 5 most especially when the open loop is near the wilting point during summer and autumn. Near the wilting point the WG2 ensemble spread becomes negatively skewed, which occurs because the negative perturbations remain almost unchanged, but the extra water from the positive perturbations is removed rapidly by transpiration. This is evidenced in Fig. 4a for Narbonne, which for clarity shows only 4 of the 20 ensemble members between June and September 2009. The evapotranspiration for the same members is shown in Fig. 4b. The evapotranspiration is very small for the open loop and for the ensemble members below the wilting point. The members above the wilting point experience strong evapotranspiration. This effect is partly linked to the phenology; under water-stressed conditions the vegetation roots readily absorb excess water 15 that becomes available, which increases the transpiration and the LAI (not shown). The Ens negative bias is larger for Narbonne than Sabres because of the greater ensemble spread for Narbonne (compare Fig. 3a with b).
The impact of the ensemble perturbations on drainage is most significant near the field capacity. Figure 4c and d shows the soil moisture and drainage respectively for 20 Narbonne, between February and March 2009. For clarity only 5 of the ensemble members are shown. When the ensemble members are greater than the field capacity then the drainage rapidly increases, which suppresses the ensemble spread. The ensemble members below the field capacity have a drainage near zero. This implies that when the open loop is below the field capacity, and some ensemble members are above 25 the field capacity, the ensemble mean loses water relative to the open loop. This often occurs during the spring and autumn months, which agrees with Fig. 2e.
The simple bias correction scheme (Eq. 14) was tested on the ensemble and the results are also shown in Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | than a 10th of the size and the RMSE reduced by half compared with the original Ens. Figure 2a-c shows the net soil moisture content of each layer for the bias corrected ensemble (Ens bc ). The bias correction has effectively removed the bias from all three layers. The soil water depletion is shown in Fig. 2d-f for Ens bc . It appears that the applica-5 tion of the bias correction scheme has inadvertently increased the soil water depletion of Ens bc relative to the open loop. A side-effect of the increase in water depletion processes is a reduction in the ensemble spread. The monthly average spread is shown in Fig. 5 for (a) Sabres; and (b) Narbonne. The bias is much greater for Narbonne than Sabres (comparing Fig. 3a with b). Therefore the ensemble spread is halved by the bias correction for Narbonne, but only slightly reduced for Sabres. The reduced ensemble spread has important repercussions for DA, where the ensemble spread determines the weight to give to the background. This is investigated in Sect. 3.2.2.

DA experiments
3.2.1 SEKF vs. EnSRF: synthetic random precipitation errors 15 The EnSRF and the SEKF were first tested with only errors in the precipitation forcing.
Recall that the observations were taken from a simulation with perturbed precipitation with a small random observation error (Sect. 2.3.2). For this experiment the EnSRF used a perfect stochastic representation of the precipitation forcing errors. The SEKF cannot capture these forcing errors directly, but instead the climatological background-20 error variances were calibrated to produce the best performance. The SEKF used the same unperturbed precipitation as the open loop. Firstly, the monthly and daily precipitation for the EnSRF and the open loop are described for both Sabres and Narbonne. The Sabres site is the furthest west of the 12 sites and is near the Atlantic coast, while Narbonne is the furthest east of the 12 25 sites and is on the Mediterranean coast. The precipitation undergoes large seasonal changes during the year. Generally the precipitation is greater during the winter than Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | the summer, but there is much inter-annual variability. Since Sabres is more exposed than Narbonne to weather systems arriving from the Atlantic, Sabres receives more than twice the average rainfall, with a mean monthly rainfall of 99 mm, compared with 44 mm for Narbonne. The EnSRF is able to capture the larger perturbations of the wetter stations with a greater ensemble spread (not shown).

5
The time-averaged and site-averaged WG2 RMSE of the 20 member EnSRF is labelled as EnSRF S in Table 2. The EnSRF S RMSE is about half the size of the open loop RMSE (OL S ). The performance of EnSRF S for various ensemble sizes is demonstrated in Table 3. A gradual improvement in the EnSRF S is apparent in Table 3 as the ensemble size is increased from 3 to 20. The sampling error in the perturbed forcing 10 is the cause of the larger RMSE for the smaller ensemble sizes. The EnSRF S has an ACC close to 1.0 and no significant bias was introduced. In fact the bias is only significant when the ensemble spread is large, which was not the case for the synthetic experiments (not shown). Therefore implementing the perturbation bias correction was not necessary. 15 The SEKF S climatological background-error covariance needed to be calibrated in order to minimize the RMSE. The SEKF S with the optimal calibration is labelled as SEKF S in Table 2. The SEKF S performs slightly better than the open loop, but not as well as the ensemble DA methods. This is expected because the SEKF S does not capture the uncertainty in the precipitation directly, rather it uses larger variances in B to 20 compensate for forcing errors. Table 4 shows the performance of the SEKF S with various background-error covariance specifications, with the bold font showing the optimal calibration.
The monthly average performances of the open loop and the DA methods are shown in Fig. 6a-c. The open loop RMSE is greatest in the spring and autumn seasons 25 (Fig. 6a). The soil moisture is going through a transition from a wet to dry state in spring and from a dry to wet state in autumn, which increases its sensitivity to perturbations in the precipitation. During the winter the WG2 reservoir is close to the field capacity. During the summer the soil moisture is close to the wilting point and there is relatively little precipitation to perturb. Unlike the EnSRF S , the SEKF S climatological background-error covariance does not account for the seasonal variability in precipitation amounts. This is evidenced by examining the Kalman gains. The monthly average WG2 Kalman gain for EnSRF S is displayed in Fig. 7. The EnSRF S Kalman gain is closely correlated with the open loop RMSE, with peaks in 5 late spring and autumn. The SEKF S Kalman gain is plotted in Fig. 7 both before and after the 1.0 limit is imposed on the model Jacobian. In Sect. 2.4.1 it was explained that this limit is exceeded only when the model behaviour is very nonlinear, during which time the SEKF tangent linear approximation is inadequate. In contrast to the EnSRF S Kalman gain, the SEKF S Kalman gain peaks in July. By imposing the limit on the oversensitive Jacobian, the Kalman gain is notably reduced between May and October, which shows that the SEKF S tangent linear approximation is poor during this period. This explains why the SEKF S WG2 RMSE and ACC are worse than the open loop between June and September ( Fig. 6a and b). 15 Firstly, the performance of the calibrated EnSRF is analyzed for the baseline experiment, where only additive perturbations were applied to WG1 and WG2. This is labelled as EnSRF R1 in Table 2. The EnSRF R1 method does perform significantly better than the open loop (labelled as OL R in Table 2). But a large dry bias has been introduced, which is about 25 % of the RMSE and is consistent with the size of the dry bias intro-20 duced by experiment Ens in Table 2. Note that the very small bias for OL R is present because the CDF matching (for bias correcting the observations) was applied over the period 2007-2010, but the results are taken over 2008-2010. The EnSRF R1 method was tested with various ensemble sizes and the results are shown in Table 3. There is no improvement beyond an ensemble size of 20 members. 25 Figure 3c and d is equivalent to Fig. 3a and b but instead shows the EnSRF R1 ensemble, the open loop and the observations for 2009. During December and January, the ensemble for Narbonne (Fig. 3d)  which indicates that the perturbation bias is present. The observations are wetter than the open loop during the summer and are therefore offsetting the perturbation bias. The opposite is true in December and January, when the observations are drier than the open loop, which causes many of the ensemble members to dip well below the wilting point.

5
The SEKF R1 performance is presented in Table 2. In terms of RMSE, SEKF R1 performs marginally better than EnSRF R1 , while EnSRF R1 has a slightly higher ACC than SEKF R1 . The SEKF R1 method is also affected by a negative bias, which is about half the size of the EnSRF R1 bias. The SEKF R1 analysis increments themselves introduce a negative bias through the same mechanisms as the ensemble perturbation bias. But 10 the EnSRF R1 method is affected by the biases introduced by both the ensemble perturbations and the analysis increments, and therefore the EnSRF R1 bias is greater than the SEKF R1 bias. Figure 8a and b shows contour plots of the EnSRF R1 RMSE and ACC respectively, for the range of additive perturbations used to calibrate the method. Figure 8c and d 15 shows equivalent contour plots for the SEKF R1 . Both performance metrics are much more sensitive to the WG2 perturbations than the WG1 perturbations, which is logical given that the WG2 layer is much thicker than the WG1 layer. The SEKF R1 results are less noisy than the EnSRF R1 results ( Fig. 8a and b) because the SEKF is not affected by the noise associated with the finite ensemble size of the EnSRF. The EnSRF and 20 the SEKF were also tested with the smaller observation error of 0.35(w fc − w wilt ), but this did not significantly change the performance of the methods (not shown).
The monthly-averaged and station-averaged RMSE, the ACC and the bias are shown for the open loop, SEKF R1 and EnSRF R1 in Fig. 6d-f respectively. The RMSE in all cases is highest in June and October (Fig. 6d), as this corresponds with the greatest 25 fluctuations in soil moisture. This is also when the most improvement over the open loop occurs. The EnSRF R1 RMSE is slightly degraded relative to the SEKF R1 from April to July as a result of the perturbation bias during this period (evident in Fig. 6f). The superior EnSRF R1 ACC from July to September is explained below. The WG2 Kalman gains for EnSRF R1 and the SEKF R1 are shown in Fig. 7b. The SEKF R1 performs better with a larger WG2 Kalman gain than EnSRF R1 . The EnSRF R1 Kalman gain shows some seasonal variability, with the largest values occurring at the same times as the open loop in June and October (Fig. 6d). The SEKF R1 Kalman gain is shown in Fig. 7b before and after the limit of 1.0 imposed on the Jacobian. The

5
Kalman gain peaks in summer as a consequence of the over-sensitive model Jacobian during water-stressed conditions. This problem with the model Jacobian appears to explain why the EnSRF R1 ACC is higher than the SEKF R1 ACC during July, August and September (Fig. 6b).
The impact of precipitation forcing perturbations on the EnSRF is investigated. This 10 experiment is labelled as EnSRF R2 in Table 2. The perturbed precipitation does not modify the analysis performance significantly compared with the unperturbed case (EnSRF R1 ). A slightly smaller additive inflation is optimal with the perturbed forcing. This indicates that the perturbed forcing is having a similar effect to additive covariance inflation, but without the advantages demonstrated for the idealized experiments 15 (Sect. 3.2.1). Finally, the bias correction scheme is tested on the EnSRF. This experiment is labelled as EnSRF R3 in Table 2. The large bias in Table 2 for the EnSRF without bias correction (EnSRF R1 ) has been approximately halved by applying bias correction. The bias correction technique cannot correct biases introduced by the analysis increments. 20 Therefore EnSRF R3 is affected by a small negative bias. The ACC of EnSRF R3 is degraded relative to EnSRF R1 , which is probably related to unrealistic temporal changes in the ensemble spread that occur as a result of the bias correction (see Fig. 5).

Discussion
The discussion focusses on the knowledge gained from the experiments, refering to the

Stochastic precipitation error representation
The experiments in Sect. 3.2.1 were designed to assess the advantage gained by a perfect stochastic representation of random precipitation errors in the EnSRF over 5 additive background errors in the SEKF. Clearly the EnSRF benefited from the direct representation of the errors. However, in the real experiments the same perturbations gave no advantage to the EnSRF compared with additive covariance inflation alone (compare EnSRF R2 with EnSRF R1 in Table 2). Maggioni et al. (2011) demonstrated that soil moisture simulations are less sensitive to rainfall uncertainty information than the precipitation fields themselves. They attributed this loss of information to two factors: (i) The nonlinear and integrating nature of soil moisture models; and (ii) The dissipative behavior of soil moisture dynamics, which dampens perturbations in the initial conditions. These conclusions agree with our findings. The results from our study suggest that other sources of errors may also 15 limit the usefulness of precipitation uncertainty information. Important sources of errors in this study include the errors in the model physics, the linear assumptions made by the DA methods and the errors in representing observations over a 64 km squared grid. It is likely that the precipitation errors in this study were also underestimated. Hossain and Anagnostou (2005) estimated that rainfall errors represent between 20 and 60 % 20 of the uncertainty in soil moisture prediction. Even if the specified rainfall errors applied in this work were correct, they would only make up 10 % of the total RMSE of the open loop. The percentage scaling used to generate the perturbations in the experiments could only estimate random errors in accumulations. It could not take into account the nonstationary and intermittent nature of precipitation errors, including the probability 25 of missed precipitation or false alarms. More comprehensive precipitation error models have been developed which can take these factors into account (see e.g. Carrera et al., 2015;Maggioni et al., 2014Maggioni et al., , 2012Maggioni et al., , 2011Clark and Slater, 2006 , 2006). It is planned that one of these approaches will be adopted for the Land Data Assimilation System (LDAS) at Météo-France. The calibration of the various parameters for these rainfall error models requires considerable testing.

Gaussian assumptions
In the synthetic experiments, the EnSRF was applied with a perfect stochastic repre-  The pdf for EnSRF S agrees very well with Kalman theory, since it has a mean of zero and it closely fits the normal distribution (the green line). On the other hand, the pdf for the SEKF S is flatter than the normal distribution and therefore agrees less well with Kalman theory. This demonstrates that without the correct specification of forcing errors, the background- 15 error covariance calibration that optimizes the analysis is not the same as the calibration that agrees best with the Kalman Filter theory. In the real experiment neither method had a perfect representation of the background errors. Both methods used an average value of HP b H T +R about 4 times larger than (y− y o ) 2 (not shown), which indicates that the Kalman Filter assumptions were incorrect. 20 The nonlinearity problems manifested themselves in different ways for the SEKF and the EnSRF. For the SEKF, the Jacobian between the surface and the root zone became too large. This over-sensitivity is partly related to an unrealistic feature of the modelled surface energy balance, since one single surface temperature is used for bare soil and the vegetation layer (Draper et al., 2009;Mahfouf, 2014). The EnSRF 25 instead suffered from the perturbation bias. The explanation for the perturbation bias was linked to the nonlinear behavior of evapotranspiration and drainage. The ISBA-A-

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | gs evapotranspiration and drainage functions are described by Masson et al. (2013) and Mahfouf and Noilhan (1996)   Finally, a key assumption underpinning the EnSRF is that the ensemble size is sufficiently large to represent the errors in the background state. An ensemble size that is too small results in inbreeding, where the errors in the background state are underestimated (Houtekamer and Mitchell, 1998). We investigated the impact of ensemble size on the EnSRF WG2 RMSE (see Table 3). It was found that sampling errors were only significant with ensemble sizes less than 20 members, which is consistent with studies by Carrera et al. (2015) and Maggioni et al. (2012). However, in all these studies the 1D-EnKF approach was implemented, where the analysis is calculated independently for each gridpoint. It is likely that sampling errors would be much more important for a 2D-EnKF approach, where background-error covariances between gridpoints are 20 taken into account, due to the much larger number of degrees of freedom.

Different soil textures
It was found that the soil texture, in particular the ratio of sand to clay, had a significant influence on the model behaviour. Clay soils are more sensitive than sandy soils to perturbations in the initial conditions (Calvet and Noilhan, 2000). This occurs because 25 water travels faster through sand and as a result the atmospheric forcing tends to dominate over perturbations in the model state. This made the clay soils more vulnerable to the nonlinearity issues encountered in this study. The increased sensitivity of clay soils 7381 Introduction

Tables Figures
Back Close

Full Screen / Esc
Printer-friendly Version

Interactive Discussion
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | compared with sandy soils was reflected by a larger ensemble spread for the ensemble experiments. In particular, Narbonne had a much larger ensemble spread than Sabres (see Fig. 3). The correlation coefficient between the modelled clay percentage and the size of the negative perturbation bias (normalized by the RMSE) for the 12 sites was calculated as 5 0.53 for the experiment Ens and 0.04 for the experiment EnSRF R1 . The correlation was much smaller for EnSRF R1 , which was probably because other sources of errors were also important. The list of the biases (normalized by the RMSE) and the clay percentages are shown in S1.1 of Supplement 1. It is recommended that similar experiments are performed on a much larger number of model points, in order to calculate a more 10 reliable estimate of the correlation between the clay content and the perturbation bias.

Flow-dependence
The flow-dependence of the DA methods was examined by comparing the WG2 Kalman gains. When the background errors are larger, the Kalman gain should increase in order to give more weight to the observations. Given the assumption that 15 there is no temporal evolution in the observation errors, the Kalman gain and the open loop errors should peak at the same time. In the experiments, the EnSRF Kalman gains showed a similar seasonal variability to the open loop RMSE, unlike the SEKF Kalman gains. This showed that the EnSRF was able to capture seasonal variability in the background errors. However, land surface models are not chaotic, unlike atmospheric 20 models. The most important contributions to soil moisture errors come from the model and atmospheric forcing errors, rather than the errors in the initial conditions. Therefore it is unlikely that the EnSRF could gain much advantage over the SEKF without having a superior representation of these errors. This would explain why the EnSRF performed better than the SEKF in the synthetic experiments but not in the real experiments.

Land surface model
The SEKF and the force-restore based ISBA-A-gs model are currently embedded in the SURFEX platform of Météo-France. However, the diffusion-based multi-layer model (ISBA-DF) will soon be implemented (Decharme et al., 2011). The soil moisture evolution of ISBA-DF is determined by the mixed form of the Richard's equation, rather 5 than the force-restore method. This is more realistic than the force-restore method as it solves the heat and water diffusion equations explicitly over at least 5 layers. Parrens et al. (2014) compared the SEKF for a two-layer version of the force-store model with an 11-layer implementation of ISBA-DF. They found that the SEKF performance was enhanced by introducing multiple layers. At times they found that the 10 assimilation of surface observations influenced only the top part of the root-zone with ISBA-DF. The multi-layer model captured the vertical profile of the root zone soil moisture, which was not possible with the two-layer model. It will be interesting to test the EnSRF with ISBA-DF and multiple layers. The EnSRF flow-dependent backgrounderror covariance may be able to exploit the improved vertical correlations between the 15 layers.

Conclusions
Twelve sites in Southwest France were selected for in situ soil moisture DA experiments with various soil conditions. In particular, the different sites were chosen for their variability in sand and clay concentrations, which influence soil water transfers. The SEKF 20 and the EnSRF DA methods were compared in terms of their ability to provide an accurate soil moisture analysis. The three-layer ISBA-A-gs land surface model (Noilhan and Mahfouf, 1996;Calvet et al., 1998) was implemented in the experiments, which is based on the force-restore method of Deardorff (1977).
Synthetic experiments were designed to assess the advantage the EnSRF could 25 gain over the SEKF by using a perfect stochastic representation of random precipi- Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | tation forcing errors. By design the SEKF cannot represent stochastic forcing errors directly, but it may compensate for such errors by using larger background-error variances. Indeed, the EnSRF performance was superior to the SEKF, both in terms of analysis RMSE and ACC. In the real experiments, the same rainfall error specification did not improve the analysis. It is likely that the actual precipitation errors were under-5 estimated. It is also possible that other sources of errors could have limited the transformation of precipitation uncertainty information into improved soil moisture scores. It is recommended that more sophisticated rainfall error models are tested, which should take into account the intermittent and non-stationary nature of precipitation errors.