Kalman filters for assimilating near-surface observations in the Richards equation – Part 2 : A dual filter approach for simultaneous retrieval of states and parameters

This study presents a dual Kalman filter (DSUKF – dual standard-unscented Kalman filter) for retrieving states and parameters controlling the soil water dynamics in a homogeneous soil column, by assimilating near-surface state observations. The DSUKF couples a standard Kalman filter for retrieving the states of a linear solver of the Richards equation, and an unscented Kalman filter for retrieving the parameters of the soil hydraulic functions, which are defined according to the van Genuchten–Mualem closed-form model. The accuracy and the computational expense of the DSUKF are compared with those of the dual ensemble Kalman filter (DEnKF) implemented with a nonlinear solver of the Richards equation. Both the DSUKF and the DEnKF are applied with two alternative state-space formulations of the Richards equation, respectively differentiated by the type of variable employed for representing the states: either the soil water content ( θ ) or the soil water matric pressure head ( h ). The comparison analyses are conducted with reference to synthetic time series of the true states, noise corrupted observations, and synthetic time series of the meteorological forcing. The performance of the retrieval algorithms are examined accounting for the effects exerted on the output by the input parameters, the observation depth and assimilation frequency, as well as by the relationship between retrieved states and assimilated variables. The uncertainty of the states retrieved with DSUKF is considerably reduced, for any initial wrong parameterization, with similar accuracy but less computational effort than the DEnKF, when this is implemented with ensembles of 25 members. For ensemble sizes of the same order of those involved in the DSUKF, the DEnKF fails to provide reliable posterior estimates of states and parameters. The retrieval performance of the soil hydraulic parameters is strongly affected by several factors, such as the initial guess of the unknown parameters, the wet or dry range of the retrieved states, the boundary conditions, as well as the form ( h -based or θ -based) of the state-space formulation. Several analyses are reported to show that the identifiability of the saturated hydraulic conductivity is hindered by the strong correlation with other parameters of the soil hydraulic functions defined according to the van Genuchten–Mualem closed-form model.


Introduction
Accurate determination of the water dynamics in the vadose zone is crucial for the success of many hydrological, climatic and environmental studies.The significant increase in the availability of hydrologic data sets can definitely provide extensive opportunities for reducing the uncertainty associated to the detection of the spatial and temporal variability of soil moisture.However, it also calls for more robust methods to merge new available observations and uncertain model predictions appropriately.
A crucial aspect in the application of these methods is the proper specification of the model parameters, as a function of the variables characterizing the state of the water in the soil (e.g.Heathman et al., 2003;de Lannoy et al., 2007;Vereecken et al., 2008).Parameterization of soil hydraulic properties is considered one of the main challenges in the Published by Copernicus Publications on behalf of the European Geosciences Union.
H. Medina et al.: Part 2: A dual filter approach for simultaneous retrieval of states and parameters current land surface modelling efforts (Zhu and Mohanty, 2004), particularly because hydraulic properties exhibit large spatial variability at all scales of interest, making it extremely difficult to capture hydrological behaviour at one particular scale (Pringle et al., 2007;Chirico et al., 2010).
A prerequisite to properly handle the marked variability of soil hydraulic properties in large-scale applications is the use of efficient calibration methods in terms of time and storage.Traditional methods for parameter identification generally optimize an objective function from a historical batch of data, and hence require a set of historical data to be kept in storage and processed all together, with limited flexibility to account for new available measurements (Moradkhani et al., 2005).In addition, a common problem with most of these traditional inverse methods is stability and convergence (Yeh, 1986;Abbaspour et al., 1997).Therefore, several attempts have been made to develop and apply calibration methods that circumvent these drawbacks.
Considerable progress has been achieved in the development and application of sequential data assimilation (DA) techniques.As recursive data-processing algorithms, DA methods do not require all past information to be stored; they continuously update the variables under scrutiny in the model, when new measurements become available, to improve the model forecast and evaluate the forecast accuracy (McLaughlin, 2002;Vrugt et al., 2005;Reichle, 2008).Although sequential estimation is typically applied only to the state variables, some algorithms, belonging to the family of the dual estimation methods, have been designed to simultaneously estimate model states and parameters as part of the assimilation process.This family of algorithms includes joint and dual filtering as well as expectation maximization (EM) approaches (e.g.Moradkhani et al., 2005;Liu and Gupta, 2007).EM methods have been commonly designed for offline applications, but sequential EM methods have also been proposed (Wan and Nelson, 2001).
In the dual filtering approach, a separate state-space representation is used for the states and the parameters, while in the joint approach the unknown system states and parameters are concatenated into a single higher-dimensional joint state vector.In principle, the joint approach should provide better estimates than the dual approach, because it explicitly accounts for the cross-covariance between state and parameter estimates.However, the estimation process can lead to unstable results because of complex interactions between states and parameters in nonlinear dynamic systems (Moradkhani et al., 2005;Liu and Gupta, 2007).
Some modern approaches extend the traditional parameter estimation paradigm toward a more explicit incorporation of structural data errors.Since the performance in hydrological modelling is also affected by errors in model structure and input data, model adjustment through time variation of parameters together with state variables can result in a limited understanding about the overall uncertainty (Clark and Vrugt, 2006).Nevertheless, the diagnostic analysis of a model can be a difficult task and it is only possible after the model has been parameterized (Spaaks and Bouten, 2013).According to Renard et al. (2010), none of the current approaches appears entirely satisfactory and the optimal methodology for handling structural errors is still to be established.
Common sequential DA methods are based on standard Kalman filtering (SKF), from the innovative work of Kalman (1960).SKF became a widely used technique to merge information optimally from different sources and model predictions in linear systems (e.g.McLaughlin, 2002;Vereecken et al., 2008).Variations of the SKF algorithm have been developed to make it applicable to the sequential probabilistic inference problem within nonlinear dynamic systems, such as the extended Kalman filter (EKF) (Jazwinski, 1970), the commonly used ensemble Kalman filter (EnKF) (Evensen, 1994(Evensen, , 2003)), and the unscented Kalman filter (UKF) (Julier et al., 1995;van de Merwe, 2004).
A fundamental difference between the Kalman filter and variational methods is that the former explicitly evolves the covariance matrix without interruption, while variational methods do not propagate error covariance information from one assimilation interval to the next (Reichle, 2008).In addition, the Kalman filter provides an analytical solution of the a posteriori state mean, while variational methods rely on numerical methods, which are considered more feasible for applications where the dimension of the state vector is very large, as with weather forecasting models (Reichle, 2008).
Kalman filter applications in hydrology (Reichle et al., 2002;Reichle and Koster, 2003;Reichle, 2008;Camporese et al., 2009) favour the use of EnKF, relying on the propagation of a random ensemble of the retrieving variable.The EnKF is an advantageous approach for highly dimensional applications, mainly because, by means of a comparably small ensemble of model trajectories, it captures the relevant parts of the error structure (Reichle, 2008).This method also facilitates the treatment of errors in model dynamics and parameters (Reichle and Koster, 2003;Moradkhani et al., 2005) and it is easily scalable.Nevertheless, the EnKF estimation based on small ensemble sizes, can be affected by spurious modes and large biases even if the ensemble mean and covariance are correct (Luo and Moroz, 2009;Lei and Baehr, 2013).Moreover, the optimal ensemble size for the EnKF is uncertain and is generally chosen on the basis of a heuristic evaluation.
The sampling strategy of the EnKF could be a drawback in large-scale applications where a small variation of the ensemble size has an important impact on the computational demand (e.g.Kumar et al., 2008).The implementation of these large-scale assimilation systems is often described as a collection of independent low dimensional assimilation problems (Crow and Wood, 2003;Reichle and Koster, 2003;Kumar et al., 2008).Chirico et al. (2014) show that the SKF, coupled with a Crank-Nicolson numerical scheme, can be an efficient choice for low dimensional applications, because it provides Hydrol.Earth Syst.Sci., 18, 2521-2541, 2014 www.hydrol-earth-syst-sci.net/18/2521/2014/ retrieval performances similar to those obtained with nonlinear schemes, but with less computational effort, also circumventing some issues which may arise with the sampling strategies used by the UKF and the EnKF.However, the UKF can be more flexible and computationally efficient than the EnKF in problems with low degrees of freedom, because it relies on ensemble sizes equal to twice the number of degrees of freedom plus one.Few attempts have been made to retrieve soil water state profiles and soil hydraulic parameters simultaneously by assimilating near-surface observations with Kalman filters (e.g.Qin et al., 2009;Yang et al., 2009;Montzka et al., 2011).Tian et al. (2008) used a dual UKF for reproducing the temporal evolution of daily soil moisture under freezing conditions by assimilating satellite observations.Lü et al. (2011) developed a dual Kalman filter for estimating the root zone soil moisture using a model based on the Richards equation, by combining the EKF to update the state variables, with an optimization algorithm for retrieving parameters of soil hydraulic functions defined according to the van Genuchten-Mualem (VGM) relations (van Genuchten, 1980).Monztka et al. (2011) performed a joint approach retrieving soil moisture and VGM parameters, but using a particle filter algorithm.
Moving from the result of Chirico et al. (2014), we hypothesize that the combination of SKF applied to a linearized numerical representation of the Richards equation, and UKF applied to handle the intrinsic nonlinearities between hydraulic parameters and soil water states, could provide a suitable strategy for optimizing the prediction of the state dynamics.
The first objective of this study is to illustrate the feasibility of using a deterministic dual filter approach to perform simultaneous retrieval of soil moisture profiles and VGM parameters, with similar accuracy but reduced computational expense, as compared with ensemble Kalman filters, based on the assimilation of near-surface observations in a onedimensional Richards' equation.The analysis is based on a synthetic test assuming uncertain observations and a poor guess of the initial states.A small structural error is also involved by implementing a different numerical solver of the Richards equation in the assimilation algorithm from that employed for generating the reference synthetic data.
The dual Kalman filter (hereafter referred to as DSUKFdual standard-unscented Kalman filter) is designed by coupling the SKF approach for retrieving the states with the UKF for retrieving soil hydraulic parameters.For comparative purposes, the simultaneous retrieval of states and parameters is also performed using the dual ensemble Kalman filter (DEnKF), following the framework described by Moradkhani et al. (2005).Interested readers are referred to this work, widely cited by the hydrological data assimilation community.
A second objective is to compare the potential advantages and limitations of an h-based or a θ-based form of the Richards equation in the retrieval algorithm, also account-ing for different initial guesses of the parameters, observation depths, assimilation frequencies as well as the type of near-surface observations (h or θ).

Governing equation
As in the vast majority of applications in this realm, we describe the vertical movement of water under isothermal conditions in a rigid, homogeneous, variably saturated porous medium using the Richards equation (Jury et al., 1991).The following two equations represent the Richards equation in the h-based and in θ-based forms, respectively: where t is time and z is soil depth taken as positive downward with z = 0 at the top of the profile, C (h) = dθ/dh [1/L] is the specific water capacity of the soil at matric pressure head, h, obtained by differentiating the function θ (h), and For an efficient numerical solution of the model, it is convenient to describe the soil hydraulic properties using closedform analytical relationships.The following non-hysteretic VGM equations (van Genuchten, 1980) are widely used in soil hydrology: where θ s is the saturated soil water content, θ r is the residual soil water content, S e = (θ − θ r )/(θ s − θ r ) is the effective saturation, K s is the saturated hydraulic conductivity, and α [L −1 ], n (-), m(-) and λ(-) are empirical scale and shape parameters.A common assumption, also adopted in this work, is to set λ = 0.5 and pose m = 1 − 1/n.et al. (2014) showed that the implementation of the filtering approach upon a linearized Crank-Nicolson finite difference scheme (CN) can be an efficient algorithm for onedimensional problems.The differentiation of Eq. (1) for intermediate nodes according to the CN scheme, leads to the expression

Chirico
where superscript i is the node number (increasing downward), subscript k is the time level, and t k = t k+1 − t k .The soil column is divided into compartments of thickness z i .All nodes, including the top and bottom node, are in the centre of the soil compartments, with z u = z i − z i−1 and z l = z i+1 − z i .The spatial averages of K are calculated as arithmetic means.
Assuming flux boundary conditions, the differential equations at the top and bottom nodes respectively are where q top and q bot are the fluxes at the top and bottom of the soil profile, respectively.The analogous differential expressions of the Richards equation in the θ form (Eq. 2) can be obtained from Eqs. ( 5)-( 7) by simply removing the soil water capacity (C) and by substituting h with θ, the hydraulic conductivity (K) of the dependent terms with the diffusivity (D), while keeping the independent terms on the right-hand side unchanged.
In the numerical scheme, the explicit linearization of K and C (or D) is implemented by taking their values at the previous time step k − 1.A linear state-space representation of the dynamic system can then be easily derived by combining the set of Eqs. ( 5)-( 7) written for each node and accounting for the boundary conditions: where x represents the state vector (i.e.either soil water contents or matric heads in the soil profile), while A k−1 and B k−1 are tridiagonal matrices obtained by assembling the terms in the first parenthesis on the right-and left-hand side of Eqs. ( 5)-( 7), respectively.The term f k−1 is a vector obtained by assembling the terms on the right-hand side of the state variable at time step k − 1.More explicitly, Eq. ( 8) becomes where F =B −1 A and g = B −1 f .

The dual standard-unscented Kalman filter (DSUKF) formulation
The dual filter approach has been implemented with most of the variants of the Kalman filter, applied to both linear and nonlinear problems, i.e. the SKF (Todini et al., 1976), the EKF (Nelson, 2000;Wan and Nelson, 2001), the EnKF (Moradkhani et al., 2005) and the UKF (Wan and van der Merwe, 2001;van der Merwe, 2004).
In this section we illustrate the (DSUKF) formulation, where the SKF is implemented for retrieving the states of a linear system, while the UKF is applied to handle the marked nonlinearities between states and parameters.
At every time step k, the posterior parameter estimate at time k −1 is used in the state filter, while the current estimate of the states is used in the parameter filter.In the most general case, the set of system equations for the states can be written as follows: In a Bayesian framework, they represent a prior distribution over the states.Equation (10) allows inferring the transition probability density of the states, while Eq. ( 11) determines the probability density of the observations given the prior states.The set of system equations for the parameters can be written as representing a prior distribution over an artificial timedependent random variable that emulates model parameters.
In the equations above, u k is the exogenous input assumed to be known at instant t k ; ν k−1 accounts for a simplified representation of the model errors, assumed to be a zeromean Gaussian process noise with covariance Q k−1 , while η k is the zero mean and temporally uncorrelated observation or measurement noise with covariance R k , corrupting the observation of the states.The state transition density p (x k |x k−1 , u k , w k−1 ) is fully specified by F k−1,k and the process noise distribution p(ν k−1 ), whereas H k and the observation noise distribution p(η k ) fully specify the observation likelihood p (y k |x k , w k ).
F k−1,k and H k are parameterized via the parameter vector w k , whose evolution is artificially set up in a way similar to that employed for the state variables (Moradkhani et al., 2005) by means of a stationary process with identity state transition matrix.ξ k−1 ∈ N 0, Q w,k−1 is the noise driving parameter updating, and ς k = η k + ξ k is the noise corrupting the observation equation relative to the parameters, with zero mean and covariance R w,k .The upper symbol "ˆ" denotes the density mean of the variable.

UKF algorithm for parameter retrieval
The UKF, like the EnKF, is based on a strategy for the selection of the sample points, which aims to capture the posterior true mean and covariance of the retrieved variable, after the sample points are propagated through the true nonlinear system.States or parameters are still represented by a Gaussian random variable.However, in the UKF this is not specified by an ensemble of randomly chosen points, like in the EnKF, rather by using a minimal set of deterministically chosen sample points.
Considering ŵ and P w , respectively, as mean and covariance of the parameter vector w to be retrieved, having the dimension equal to N par , the UKF selects a set of sigma points S i = µ i , W i , i = 0. ..2N par , consisting of 2N par + 1 vectors W i and their associated weights µ i , completely capturing the actual mean and covariance of the random variable w.A selection of sigma points fulfilling this requirement is defined as follows: Weight values for calculating the mean and the covariance are distinguished by the upper indexes m and c, respectively.The other parameters are defined as follows: γ = ρ 2 N par + κ , where ρ is a factor employed to expand or to shrink the sample state distribution around the mean; κ is a scaling parameter; β affects the weights of the points when calculating the covariance.Details about the proper choice of ρ, β and κ can be found in the work of van der Merwe (2004).
The term √ γ P w i is the ith column (or row) of the root square matrix γ P w , calculated by Cholesky decomposition (Press et al., 1992).
The evolution of the parameter mean and covariance during each time step is computed as follows ŵ− k = ŵk−1 (16) The artificial noise covariance Q w is computed as follows (Wan and Nelson, 1997;Nelson, 2000;van der Merwe, 2004): The parameter λ RLS ∈ (0, 1] is considered a forgetting factor, as defined in the recursive least-squares (RLS) algorithm.Nelson (2000) showed that setting λ RLS < 1 (i.e. the prior covariance is larger than the posterior covariance) provides an approximate exponentially decaying weight on past data.By setting λ RLS = 1 (i.e.no process noise for the parameters is considered) all past data are equally weighted to obtain the current dynamics.Whenever measurements are available, new sample states are created by substituting the a priori parameter mean, ŵ− k , and covariance, P − w,k , in Eq. ( 14).In principle the weights, µ i , do not change during the simulation.
The set of 2N par + 1 parameter vectors W k is propagated across the model, and the observation equation, using as states the a posteriori mean at k − 1, xk−1 , is expressed as follows: Y k also represents a set of 2N par + 1 vectors, each having N obs elements.The Kalman gain employed for modifying the parameter trajectories is obtained as follows: P wy,k is computed by the following weighted outer product: P w yy,k is given by

H. Medina et al.: Part 2: A dual filter approach for simultaneous retrieval of states and parameters
The measurement noise covariance R w,k is assumed to be a constant diagonal matrix following the basic implementation of the dual UKF proposed by van der Merwe ( 2004), previously applied by Wan and Nelson (1997), Nelson (2000) and Wan and Nelson (2001) in the context of a dual EKF.This assumption leads to a recursive prediction error algorithm that minimizes a simplified cost function with respect to the parameters.This prediction error algorithm, while questionable from a theoretical perspective, has been shown to be quite useful (Wan and Nelson, 2001).As part of a dual UKF application, Gove and Hollinger (2006) showed that by setting the measurement noise covariance equal to the identity matrix, the overall trajectory of the retrieved parameter was very similar to that obtained considering the actual measurement errors, and attributed this behaviour to the robustness of the filter with respect to changes in parameter measurement variance components.The parameter mean is updated according to the standard Kalman filter equation: The parameter covariance is updated as follows: P w υυ,k (see also Eq. 19) represents the covariance of y k − ŷ− w,k .Here we opt for the expression also employed by Julier and Uhlmann (2004), van der Merwe (2004) and Tian et al. (2008), given the nonlinearity between parameters and observations.

Algorithm for parameter sampling
Given the marked differences in the range of variation of the VGM parameters, a variable transformation is required to guarantee operational stability.Bounding parameters by means of a function of reference values and a variable correction term ensures that the model behaves reliably.Moradkhani et al. (2005) and Montzka et al. ( 2011) also applied some strategies to limit the overdispersion of parameter sampling.
Given that w i is the true value of the ith parameter, the parameter estimation system makes use of the following variable transformation: where w min and w max represent user-defined nominal values, indicating the minimum and maximum values of the parameter, respectively, while the correction term, δw, which is the actual variable under estimation, is expressed as an independent term of a nonlinear sigmoidal function g(δw).This function g(δw), termed a "squashing function" by van der Merwe ( 2004), limits the absolute magnitude of iterative parameter adjustment, further preventing the divergence of the parameter estimations.Therefore, the parameters are not estimated directly, rather "correction terms" are estimated.
A preliminary analysis has shown that the approach is not very sensitive to the type of sigmoidal function and that the following relationship performed well for all of the circumstances examined: Note that lim which cases w i = w i min and w i = w i max , respectively.

Synthetic experimental framework
We explore the performance of the proposed dual Kalman filter with a synthetic study.The main advantage of testing the algorithm with a synthetic study is that, by knowing the true system, the results are not overshadowed by other sources of uncertainty: a fundamental aspect that should be addressed prior to evaluating algorithm performance with real data, as in the study presented by Medina et al. (2014).

Model implementation and synthetic data generation
We simulate the vertical movement of water in a homogeneous and variably saturated soil column of 100 cm.The hydraulic properties of the homogeneous soil column are identified by using the VGM parameters reported in the papers by Entekhabi et al. (1994) and Walker et al. (2001): θ s T = 0.54; θ r T = 0.2; K s T = 0.00029 cm s −1 , α T = 0.008 cm −1 and n T = 1.8,where the subscript "T" indicates the "true" values, i.e. those employed to produce the reference synthetic simulations.However, different boundary conditions have been set so as to make the synthetic study more representative from a practical perspective: -the top boundary condition is the result of a combination of a stochastically generated daily series of rainfall plus a constant evaporation rate of 2.35 mm d −1 ; -the bottom boundary condition is set by a zero gradient of the matric head, also known as "free drainage" condition, which also implies that this condition is affected by uncertainty in the identification of the unsaturated hydraulic conductivities.
The inclusion of a rainfall pattern allows evaluating of the dual filter performance during continuous wetting and drying processes taking place in the soil profile.In mathematical terms, the higher variability in the flux at the soil surface entails a higher temporal variability of the correlations between adjacent states, thus affecting the potential ability of the filter to adjust the soil profile.This makes the synthetic study a more representative stress test of the overall retrieval process than the case with a constant top boundary condition, which is the one applied by Entekhabi et al. (1994) and Walker et al. (2001).
Daily rainfall is obtained by stochastically sampling a Poisson probability distribution of the occurrence of daily events with an exponential distribution of the rainfall depth.The bar plot in Fig. 1 illustrates the synthetic daily rainfall time series for a period of 150 days.
The soil column is discretized by 27 nodes with variable node spacing; this according to van Dam's (2000) suggestion that accurate computation of soil water fluxes at the top boundary requires that the distance between the nodes close to the soil surface has to be in the order of a few centimetres.A similar criterion is followed for the bottom compartments, given the flux condition adopted for the bottom boundary.
Subsequently, time series of synthetic "true" matric head and soil moisture profiles are generated for 150 days, by setting the initial profile matric head uniformly equal to −50 cm, and by employing the nonlinear numerical scheme illustrated by Chirico et al. (2014).Figure 1 also shows the time series of the generated matric pressure head values at 5 cm depth.

Retrieval modes
The synthetic study involves the retrieval of states and parameters by assimilating near-surface observations into the Richards equation, according to three different retrieval modes: -the h-h retrieval mode, indicating that matric head is used as both observed and state variable, with the hbased form of the Richards equation; -the θ-θ retrieval mode, indicating that soil water content is used as both the observed and state variable, with the θ-based form of the Richards equation; -the θ-h retrieval mode, indicating that soil water content is used as the observed variable, while matric head is used as the state variable, with the h-based form of the Richards equation.
Preliminary analyses showed that the h-h mode is prone to volatility in the parameter solution, for relatively abrupt changes in the original state variable, generating either very large or very small values.When the solution is very close to the extreme values, the parameter estimation filter sometimes loses its tracking ability, and the algorithm becomes unstable.
A successful strategy is to work with log transformed matric heads only for the parameter filter, without the need to make any change in state relationships.This alternative can be implemented straightforwardly in a nonlinear KF as UKF or EnKF, where the covariance matrices are not propagated analytically.Hence, in the h-h mode the parameter equations are not directly applied with log transformed predicted measurements and observations.
As illustrated below, the retrieval algorithm using soil moisture as a state variable is permanently stable, albeit at the expense of a slightly slower convergence speed, as compared with the case of h as a state variable.As stated by Walker et al. (2001), the soil moisture transformation not only reduces the differences between model predictions and observations, but also the numerical values of the gradients along the soil profile.

Reference scenarios
The performance of the proposed dual Kalman filter approach is evaluated with respect to reference scenarios, given by implementing the three retrieval modes introduced in the previous section, with different assimilation depths, assimilation frequencies and initial guessed parameter sets.
We simulate the assimilation of observed variables at three alternative observation depths (OD): 2, 5 and 10 cm.Escorihuela et al. (2010) found 2 cm to be the most effective soil moisture sampling depth by L-band radiometry.Nevertheless, L-band sensors receive their signal from approximately the top 5 cm, on average (Kerr, 2007).A depth of 10 cm represents the maximum observation depth that can probably be explored with the current remote sensing technology (e.g.Nichols et al., 2011).
We also examine three alternative assimilation frequencies (AF): 1, 1/3 and 1/5 d −1 .Daily assimilation frequency accounts for future L-band missions or a combination of different remote sensors, whilst 3 days is the minimum time interval of SMOS spaceborne platforms (Kerr et al., 2010).One observation every 5 days represents a more common remote sensing time frequency.
The assimilation scenarios with the h-based form of the Richards equation are initialized with an initial matric pressure head profile assumed to be uniformly equal to −100 cm.The assimilation scenarios with the θ-based form of the Richards equation are initialized with an initial soil water content profile uniformly equal to −0.47 cm 3 cm −3 , which corresponds to the soil water content at h = −100 cm, according to the true water retention function.
The retrieved parameters are K s , α and n of the VGM analytical model.We assume parameters θ s and θ r to be known, as they can be easily measured or estimated by indirect H. Medina et al.: Part 2: A dual filter approach for simultaneous retrieval of states and parameters methods, i.e. with pedotransfer functions (e.g.Chirico et al., 2007).
We considered six very dissimilar sets of initial values for the parameters K s , α, and n, to evaluate the role exerted by different initial guesses on the performance of the retrieval process.These initial values were identified by employing the six possible permutations of the values −1, 0 and 1 as correction terms δw i in Eq. ( 27), and subsequently in Eq. ( 26).The initial matrix of the normalized correction terms associated with the soil hydraulic parameters is set to be diagonal, with non-zero entries equal to 0.01, following Nelson (2000).
Table 1 shows the resulting initial values of the parameters and the corresponding w min and w max -w min .Notice that the limit values of our parameters, w min and w max , cover practically the whole spectrum of values reported by Carsel and Parrish (1988) for the 12 major soil textural groups, except for some sandy soils.
On the whole, 162 reference scenarios are examined, made by three retrieval modes, three observation depths, three assimilation frequencies and six initial parameter sets.

Comparative performance analyses
The DSUKF performance in retrieving the state profiles is analysed by comparing it with the DEnKF, the SKF and the "open loop" solution.
The DEnKF is implemented following the framework described by Moradkhani et al. (2005), coupled with the fully implicit numerical representation of the Richards equation described by Chirico et al. (2014).The DEnKF is run with ensembles of 12 members (hereafter referred to as DEnKF( 12)) and 50 members (hereafter referred to as DEnKF(50)).Reichle and Koster (2003) found that the uncertainty in soil moisture retrieved with an EnKF applied to a one-dimensional problem, is consistently reduced with an ensemble of 12 members.Camporese et al. (2009) suggest a minimum of about 50 realizations for ensuring a suitable level of accuracy in analogous applications.
The SKF is implemented with a linearized Crank-Nicolson finite difference scheme of the Richards equation (Chirico et al., 2014), with time-independent initial guessed parameters.
The "open loop" solution is obtained without assimilating any near-surface observations, i.e. the system is simply propagated from the initial uniform conditions and the timeindependent initial guessed parameters, using the known boundary conditions.
For quantitatively evaluating the performance of the retrieval algorithms, the normalized root mean square error (RMSE) between predicted and synthetic data (SD) state profiles is calculated as follows: where x p i,j and x SD i,j represent the predicted and SD state value at node i and time j , respectively, and σ SD is the standard deviation of the SD state series, with N nod = 27.Normalization is carried out to enable the comparison between θ-based and h-based retrieval processes.
The average RMSE of the last 15 days of simulation (hereafter simply referred to as RMSE) is taken as accuracy index of the retrieved state profiles, since RMSE values in the last 15 days are not affected by the poor guess of the initial condition.Under the assumption of "free drainage" at the bottom boundary, the dynamic evolution of the system's states, conditional upon a specific set of parameters, rapidly loses its dependence on the initial state values, and the effect of the poor guess of the initial state disappears after a few weeks.
The performance in state retrieval is thus analysed by comparing the RMSE of 648 experiments, resulting from the application of four retrieval algorithms (DSUKF, DEnKF(50), DEnKF(12) and SKF) to the 162 reference scenarios outlined in the previous section.In addition, another 12 experiments are undertaken for the "open loop" solutions, resulting from the application of the h-based and θ-based forms of the Richards equation with the six sets of parameters.
The reference scenarios, but only in the h-h and θ-θ retrieval modes, are also employed for assessing the capability of DSUKF to identify the unknown parameters.The effect of dealing with a nonlinear observation operator in the θ-h mode is examined by comparing the parameters retrieved in the h-h and in the θ-h retrieval modes, but using an hourly assimilation frequency.
Further insights into parameter identifiability are gained by comparing the performances of the retrieval algorithms in estimating the states when only one or two parameters are uncertain.
The effect of parameter uncertainty in retrieving states and parameters is also assessed by applying the DSUKF and the DEnKF with a large number of initial parameter combinations in the h-h and the θ-θ modes.Similarly to Moradkhani et al. (2005), 500 random sets of initial parameters are chosen by sampling them from uniform distributions within the respective limit values listed in Table 1.These 500 sets are mapped into the space of the correction terms (Eq.25), prior to running the assimilation algorithms.The DEnKF is implemented with ensembles of 12, 25, 35 and 50 members, in order to obtain a comprehensive survey of the tradeoff between computational expense and accuracy of the two retrieval algorithms.This complementary bootstrapping analysis provides us with the posterior parameter uncertainty and the associated probability distribution of the errors of the estimated states.It involves 5000 additional experiments, resulting from the combination of five retrieval algorithms (DSUKF, DEnKF(50), DEnKF(35), DEnKF(25) and DEnKF( 12)), 500 random sets of initial parameters, and two retrieval modes (h-h and θ-θ ), while accounting for just one assimilation resolution (i.e.AF = 1d −1 and OD = 10 cm).Table 2 provides an overview of the numerical experiments undertaken for assessing the relative performance of DUSKF.

Setting system and noise covariances
The covariance matrices of the added process and measurement noises (Q, Q w , R and R w ) and the initial system covariance matrices (P 0 and P w,0 ) are set to be diagonal for all cases.The initial state covariance matrix accounts for a standard deviation of 32 % of the initial state value, denoting a sufficiently high, yet realistic, error, with no correlation between nodes.This corresponds to an initial covariance of 10 3 cm 2 using the h-based form of the Richards equation and 0.023 using the θ-based form.The response to a variation of the initial system covariance in a dual Kalman filter framework is less predictable than in a state Kalman filter application, where an increase in the error covariance regularly drives the system to converge faster to the true regime, provided that the variance of the observation error is lower.
Similarly to Camporese et al. (2009), a standard deviation of 1.4 % of the observed value was given to the observation noise, while a standard deviation of 2.24 % was given to the system noise.These values respectively correspond to a covariance of 2 and 5 % of the initial state value in the h-based form, as also adopted by Walker et al. (2001).
The system noise covariance was added every hour as a means of normalizing the incorporated error with respect to the time step, i.e. to make the incorporated error independent of the adopted time step.
We set b = 2.0 and k = 0 for the deterministic sampling within the UKF (see Eq. 14), as suggested by van der Merwe (2004).The ensemble size is equal to seven, i.e. twice the number of retrieved parameters (N par ) plus one.Through sensitivity experiments, we chose ρ = 0.3 using the h-h and θ-h modes, and ρ = 0.8 using the θ-θ mode.
The forgetting factor coefficient λ RLS , affecting the artificial parameter noise covariance Q w (Eq.17), was set equal to 0.995 in the h-based form and to 0.9999 in the θ-based form, while the diagonal entries of the artificial observation noise covariance R w were set equal to 0.5 and 10 −5 , respectively.Given that these coefficients are subjectively chosen and have a major effect on parameter updates, we also examined some scenarios accounting for different combinations of λ RLS and R w values, as listed in Table 3 for both retrieval modes.

State retrieval
The coloured grid depicted in Fig. 2   methods improve the open loop estimations independently on the adopted scenario.
The RMSE values of the DSUKF estimations are in all cases higher than those obtained using DEnKF(50), but generally lower than the ones of the DEnKF(12).The observation depth and the assimilation frequency affect the retrieval performance of the dual filters more in the θ-θ and in the θ-h modes than in the h-h mode.
Figure 2 clearly shows that the retrieval performance with the θ-h mode is much poorer than that one obtained with the h-h mode.This occurrence confirms that a nonlinear observation operator in the θ-h mode has a relevant detrimental effect on the state-retrieving performance.Therefore, later in this section we specifically focus on the comparison of the h-h and the θ-θ retrievals, by showing additional results.
Figure 3 depicts the absolute RMSE values, in the h-h mode and the θ-θ mode, for two extreme assimilation scenarios: one with OD = 10 cm and AF = 1 d −1 , the other with OD = 2 cm and AF = 1/5 d −1 .The RMSE of the open loop simulations obtained with θ-θ mode is lower than those relevant to the h-h mode, showing the lower impact that parameter uncertainty exerts on the soil water content uncertainty with respect to the matric head uncertainty.Although the initial conditions in terms of h or θ are consistent between them according to the "true" soil water retention function, they actually have a different impact on the corresponding open loop simulation errors because of the poor guess in the parameters.In addition, these results may also be biased somehow because the soil moisture space is constrained, whereas the matric head space is not and it is theoretically infinite.
As mentioned above, the SKF algorithm not only fails to considerably reduce the state uncertainty, but it could even provide worse results than the open loop solution.That is the case of sets S1 and S2 using matric heads and S1, S3, and S6 using soil moisture, all of them with n < 2.6.In general, larger RMSE values of the open loop simulation correspond to smaller RMSEs of the SKF approach.This behaviour is more pronounced using matric heads, partly because larger errors in h are predominantly associated with poor guessing of α, which produces a shift of the entire matric head profile that can be more easily corrected by state retrieval with SKF.
The DSUKF considerably reduces the state uncertainty in all cases.For OD = 10 cm and AF = 1 d −1 , the error statistics with the h-h and the θ-θ modes are about 40 and 13 times lower than those with the open loop simulations, respectively.
Hydrol.Earth Syst.Sci., 18, 2521-2541, 2014 www.hydrol-earth-syst-sci.net/18/2521/2014/For OD = 2 cm and AF = 1/5 d −1 , the RMSE is about 18 and 9 times lower than with the corresponding open loop solutions.The RMSE with the h-h mode is always lower than that with the θ-θ mode, except for the case with the initial parameter set S4.The RMSEs of DSUKF are about 2.5 and 1.65 larger than using DEnKF(50), respectively for the h-h mode and the θ-θ modes, almost irrespective of the time and space resolutions of the assimilated observation.DEnKF(50) also appears to be less vulnerable to the anomalies affecting the DSUKF algorithm implemented with the matric heads, but its computational time is almost seven times larger than that required for DSUKF.However, with OD = 10 cm and AF = 1 d −1 , the DSUKF errors are approximately 1.6 and 1.4 times lower, respectively, than those of DEnKF(12) .
For OD = 2 cm and AF = 1/5 d −1 , the DEnKF(12) performed similarly to the DSUKF in terms of RMSE, but it is subjected to some numerical artifacts, in particular for S4, due to the relatively large sampling errors associated with the small size of the ensemble.In addition, the computational time required for DSUKF is about 1.7 times smaller than that required for DEnKF(12).
Figure 4 shows the states retrieved using the h-h and θ-θ retrieval modes after 5, 10, 20, 50, 100 and 150 days, considering the minimum assimilation frequency of the nearsurface observations (AF = 1/5 d −1 ), together with the open loop profiles.
The two sets of open loop profiles simply reflect the same model predictions with two different representations of the system states, which are reciprocally related by means of the water retention function parameterized according to the corresponding guessed parameters.Compared with the corresponding true profiles, the open loop matric head profiles are biased toward larger matric heads, while the open loop water content profiles are mainly shifted toward lower content values.
The DSUKF, with both retrieval modes, considerably reduces the uncertainty of the states despite the initial wrong parameterization.At the 50th day, a good match is observed between the estimated and "true" profiles , almost irrespective of the initial guessed parameter set.
Figure 5 depicts the ratios of the mean RMSE within each group of parameter sets, computed during the last 15 days for different observation depths and assimilation frequencies.In general, the RMSE exhibits limited sensitivity to the observation depths under the adopted range (Fig. 5a, c).In the case of the θ-θ retrieval mode, the ratio between OD = 2 cm and OD = 5 cm is even smaller than one both for AF = 1/3 and AF = 1/5 d −1 , which should be due to the stochastic nature of the simulations.Increasing AF from 1/5 to 1/3 d −1 does not appreciably improve the error statistics.Only the transition from AF = 1/3 d −1 to daily assimilations consistently reduces the RMSE, particularly with matric heads (Fig. 5b,  d).With the DEnKF(50) (Fig. 3), the RMSE for OD = 2 cm and AF = 1/5 d −1 , compared with that assuming OD = 10 cm and AF = 1 d −1 , is about 2.1 times larger with the h-h mode and 1.3 times larger with the θ-θ mode, consistent with the results obtained with the DSUKF.
The ratio of the mean RMSE computed with the θ-θ retrieval mode to that obtained with the h-h mode is about 1.7 considering AF = 1 d −1 , while it is close to one for the other two frequencies.The values reported in Fig. 3 for DEnKF(50) show that the RMSE with the θ-θ mode is 2.4 times higher than with the h-h mode, for OD = 10 cm and AF = 1 d −1 , and 1.5 times higher for OD = 2 cm and AF = 1/5 d −1 .However, the average statistics are strongly affected by the high errors obtained for S4 and S5.
Given the stochastic nature of the problem, we examine the probabilistic distribution of the error of the estimated states by applying the DSUKF and the DEnKF with 500 random sets of initial parameters, as illustrated in Sect.3.4.Figure 6 shows cumulative probability distributions of the RMSE obtained with DSUKF, DEnKF(12), DEnKF(25), DEnKF(35), DEnKF(50), assuming OD = 10 cm and AF = 1 d −1 .
Using both retrieval modes, DEnKF(50) and DEnKF(35) outperform DSUKF in terms of accuracy.The error statistics using DEnKF( 35) are almost half those obtained with DSUKF when retrieving pressure heads (h-h mode).The accuracy of the DSUKF is found to be comparable to that of DEnKF( 25), whose implementation demands a computational time 3.3 times larger than that required by the proposed method.The 5th and 25th percentiles of DEnKF( 25) are lower than those of DSUKF with both retrieval modes.The median of the RMSE distribution with DEnKF(25) is also lower, but only for the h-h mode.However, the 75th and 95th percentiles almost redouble when the ensemble size is reduced from 35 to 25 members, leading to an increase in the skewness of the error distribution, and hence to an appreciable decrease in the accuracy of the DEnKF.The DEnKF(12) exhibits RMSE percentiles higher than those of the DSUKF, except for the 5th percentile, whose RMSE is slightly smaller in the θ-θ mode and it is equal in the h-h mode.However, it should be pointed out that the DEnKF, unlike the DSUKF, is not affected by any structural error.Indeed, DEnKF is implemented with the same nonlinear numerical solver of the Richards equation used for generating the synthetic "true" data, while DUSKF is implemented with a CN scheme, as illustrated in Sect.2.2.As shown by Luo and Moroz (2009), high sampling errors, resulting from a small ensemble size, produce high biases and spurious modes.These sampling errors make DEnKF(12) unfeasible, due to their detrimental impact on both precision and accuracy.For a considerable number of simulations, most of them with initial n values close to 3, DEnKF(12) fails to propagate correctly the first and second moments of the states and parameters.
A favourable aspect of DEnKF is that the computational time can be reduced by the use of parallel computing.Kumar et al. (2008), for example, found that the execution time of the EnKF with 12 members, decreased about threefold when using four processors instead of one.In contrast, DSUKF is not completely scalable with available computational resources.

Parameter identifiability
Figure 7 shows the retrieved parameters K s , α and n, using both the h-h and θ-θ retrieval modes.These graphs depict the evolving patterns with the two alternative resolutions (OD = 10 cm, AF = 1 d −1 , and OD = 2 cm, AF = 1/5 d −1 ).
The "true" value of α is rapidly identified during the retrieval process in both modes, but in particular using matric heads.Parameter α is also the least affected by the initial guess of the parameters under scrutiny.Indeed, since parameter α acts as a scaling factor of the state values in the soil hydraulic property functions, its retrieval is highly sensitive to the convergence rate of the first moment of the state vec-tor.Vrugt et al. (2001Vrugt et al. ( , 2002) ) found that most of the information on α is embedded in soil water content observations just beyond the air entry value of the soil.Accordingly, in the present study, the identifiability of α is probably favoured by the relatively wet states explored in the initial stage of the synthetic experiment.
The identifiability of parameter α is seemingly also related to the relative position of the observations in the soil profile, depending on the type of simulated process.Ritter et al. (2004) performed a sensitivity analysis of three state variables (soil moisture, matric head and bottom flux) to the VGM parameters, using a soil profile with four soil horizons, and found that the average sensitivity of parameter α was higher than that of parameter n by about a factor of 2, particularly for the uppermost horizon.For the deeper horizons, instead, the sensitivity to n was almost three times higher than that of parameter α.This is an interesting aspect, particularly for the issues related to near-surface observations.
Convergence toward the true n is more delayed as compared with α.A close inspection of the time series of the retrieved parameters reveals that the convergence of n for the h-based form is mainly driven by the relatively abrupt reductions in soil moisture on about the 50th, 65th, 80th, 100th and 110th days.These gradients generally induce pronounced shifts on the updated n values for sets S4 and S5, with an initial n = 2.6 (see Fig. 7a), while more moderate shifts for sets S1, S3, and S6, which provide systematic underestimations of n.When parameter retrieving does not account for the logarithmic transformation of the matric heads, the sharp decrease in the state variable, taking place on the 110th day, induces in some cases a failure in the retrieval algorithm.As shown by Vrugt et al. (2001Vrugt et al. ( , 2002)), most of the information on n is embedded in observations whose matric heads are located well beyond the inflection point of the soil water retention function.
Using the h-based form of the Richards equation, we generally observe relatively large differences between the evolving patterns stably adopting n < 2 or n > 2, after the "erratic" first few updates.This behaviour deserves further attention in future studies.We ascribe these differences to the change in the shape of the soil water capacity, C(h), and of the hydraulic conductivity, K(h), functions, both linked to the governing equation (Eq.1), when n changes from n < 2 to n > 2 near saturation, as addressed by Vogel et al. (2001).
The n values retrieved in the θ-θ mode exhibit low sensitivity to the cited sharp soil moisture gradients.The reduction of the posterior uncertainty in this mode is clearly lower than in the h-h mode for daily assimilations (Fig. 7c), and very small for assimilations every 5 days (Fig. 7d).The convergence is more greatly affected by decreasing AF, although the corresponding RMSE values of the retrieved state profiles appear to be rather insensitive to it.The overall performance is affected by the slower convergence of S5.Nevertheless, even for the low-resolution scenario, the method always provides convergent solutions.
However, the saturated hydraulic conductivity, K s , was not correctly identified in the course of the 150 days of simulation.According to Wöhling and Vrugt (2011), in situ measurements of soil water dynamics contain insufficient information to warrant a reliable estimation of the soil hydraulic properties.The poor performance in terms of retrieved K s seems to be a confirmation of their results.To solve this problem these authors suggest considering soil moisture and matric head data simultaneously as part of the statistical inference of the soil hydraulic parameters.
One question arising is whether parameter K s is necessarily more prone to problems of identifiability than parameters α and n, given their role in the VGM relationships, or if this is solely a result of the adopted experimental conditions.The limited variability of the observations being assimilated is definitely a factor that can affect a proper identification of K s .Several authors highlighted the limitations for a successful estimation of VGM parameters, as imposed by the narrow variability of naturally occurring boundary conditions (Vrugt et al., 2001(Vrugt et al., , 2002(Vrugt et al., , 2003;;Scharnagl et al., 2011).A wide range of soil moisture states is required to constrain the soil hydraulic functions reliably.Moreover, the use of a single metric also conspires against the desired identifiability, as pointed out by Vrugt et al. (2013).
Another possible reason is the fact that the soil water retention parameters also feature in the hydraulic conductivity function, thus enhancing the occurrence of high correlations among the model parameters.A strong correlation is found between retrieved parameters n and K s .It is known that this strong interdependence also affects the performance of the VGM model.Especially for certain soil types, Romano and Santini (1999) showed that more successful inverse modelling results can be achieved by decoupling the hydraulic conductivity function from the water retention function.As a strategy to reduce the relative uncertainty, Scharnagl et al. (2011) suggested that the parameter K s should be assessed soon after rainfall events, when soil moisture redistributes more rapidly in the entire soil profile, being essentially driven by gravity.
Table 4 provides further insights into parameter identifiability.We illustrate the performance of the adopted approach when the simulations involve only one or two uncertain parameters, as compared with the original method considering three uncertain parameters.We compare the average RMSE of the results obtained with the six sets of parameters within the last 15 days, respectively using open loop simulations, the state retrieval algorithm SKF, and the DSUKF, both with OD = 10 cm and AF = 1 d −1 , and with OD = 2 cm and AF = 1/5 d −1 .
Under the conditions adopted for this experiment, the open loop simulations highlight that the uncertainty in the individual parameters, in particular that of α and n, has a very different impact on the overall uncertainty, depending on the adopted retrieval mode.The retrieval of matric heads is mainly affected by the uncertainty in parameter α, with an RMSE (1.788) more than five times higher than that with n and K s .The soil moisture estimations are preponderantly influenced by the uncertainty in parameter n, whose RMSE (1.059) is more than three times higher than those computed considering the other two parameters uncertain.In both retrieval modes, K s uncertainty has only a limited impact.
When considering two uncertain parameters, the influence of those pairs involving uncertain values of α in the h-h mode and n in the θ-θ mode is also predominant.In both cases the assumption of uncertain α and n parameters gives rise to the highest RMSE.As can be seen in some cases, the uncertainty of one of these dominant parameters could cause a detrimental effect, similar to that provoked by the combined uncertainty of two or even all three parameters.This result is clear evidence of the marked correlation between them.Again, simple state retrieving always provides poorer results.
Careful inspection of the DSUKF behaviour provides further insights into the non-identifiability of K s .When we consider only the uncertainty of K s , the method correctly converges to the true value.This can be inferred from the considerable reduction of the RMSE when using the DSUKF for daily assimilations (0.035 for the h-h mode and 0.040 for θ-θ ), as compared with the analogous statistics for the open loop solution (0.3 and 0.334, respectively).However, when considering two unknown parameters, we observe (not shown here for the sake of brevity) that of the two pairs including an unknown K s , the one with the most explanatory parameter (θ for matric heads and n for soil moisture) still fails to reach the convergence of K s .For example, when using matric heads and we consider K s and α uncertain, the method still fails to find the correct K s , while when considering K s and n uncertain, all parameter sets tend to converge to the same solution.
We have verified that the states exhibit a cross-covariance with K s two or three orders of magnitude lower than that formed with the other two dominant parameters.Thus, the Kalman gain scarcely affects the prior K s values consistently.We have also verified that both the DEnKF(12) and the DEnKF(50) come across the same problem.However, the DEnKF with 500 ensemble members converges to the true value.Nevertheless, even in this case, minimal deviations of αwith respect to the true value cause some shifts of the evolving K s around the true value.The problem of the identifiability of K s appears principally related to the unavailability of an effective cross-covariance between states and K s , which can suitably reflect the effect of the uncertainty of K s (at least   within the temporal extent of this experiment), making the model poorly sensitive to the errors of this parameter.The box plots in Fig. 8 illustrate the posterior uncertainty of the parameters estimated with the DSUKF and the DEnKF, considering the 500 randomly chosen sets of initial parameters described in the previous section.The uncertainty ranges of the proposed method are roughly comparable to those obtained by Moradkhani et al. (2005) within the first 150 days of simulation.As emphasized, both methods fail to identify parameter K s correctly.The decrease in accuracy by assuming only 12 ensemble members, especially during matric head retrieving, is also considerable.Both accuracy and precision of the estimated n values are superior using DEnKF under the h-h mode, except for ensembles of 12 members (Fig. 8a).However, for the same scheme, the median of parameter α provided by the DSUKF is less biased than that provided by the nonlinear approach, which overestimates the true value.The performance of the DSUKF is comparable to that of the DEnKF(25), although the interquartile ranges of α and n are slightly smaller with the latter method.The uncertainty regions of the estimated parameters in the h-h mode are narrower than in the θ-θ mode (Fig. 8b).In this latter case, the DEnKF(25) provides wider uncertainty bounds and less accurate estimates of parameters α and n than the DSUKF.
It could be argued that, given the scope of the present study, a classical calibration method could also provide similar results.For a more reliable analysis of the pros and cons of the DSUKF, we also implemented gradient iterative algorithms based on the Levenberg-Marquardt algorithm (Kool and Parker, 1988), considering the six sets of parameters S1-S6 in the h-h mode.The algorithms were implemented by exploiting the solvers embedded in the Matlab ® optimization toolbox.
A major drawback of this technique is its computational time, which is about 30 times larger than that of the DSUKF.About 1 week was needed to generate the a posteriori distri-bution from the 500 sets of initial parameters.The performance was also poorer in terms of identifiability.The retrieved n values varied between a minimum of 1.64 for S2 and a maximum of 2.23 for S5, while α varied between 0.0055 for S1 and 0.012 cm −1 for S6.K s remains clearly not identified.The identifiability problems of traditional approaches like this have been documented in the literature (e.g.Kool et al., 1987;Romano and Santini, 1999;van Dam, 2000).Finally, a third difficulty is that these variational methods, although they can be easily implemented, demand some expertise for suitably tuning the parameters involved in the numerical solvers (e.g.tolerance threshold values applied in the numerical algorithm).

Influence of the type of observed variables with respect to the selected state variables
The analyses in the previous section focused on the performance of the h-h and θ-θ retrieval modes, i.e. when observed and retrieved variables are of the same type.This allows implementing a linear observation equation (Eq.2), with a standard Kalman filter for state retrievals.Nevertheless, part of the study also focused on the relation between the type of assimilated data and the h-based form or θ-based form of the state equation.In principle, the numerical algorithm can be structured to assimilate soil moisture observations (or some information linked to it) in the h-based form of the Richards equation by dealing with a nonlinear observation equation, above referred to as the θ-h retrieval mode.This issue can be frequent, given the structure of many widely used simulation models as well as the type of information provided by current remote sensing techniques and ground-based sensors.
At this point, it is important to note that the inversion of the observation variable, i.e. converting soil water contents to matric pressure heads by means of a water retention function with guessed (wrong) parameters, would be a serious mistake, because the observations would be significantly biased, incorporating an unpredictable error in the retrieval algorithm.By contrast, a nonlinear relationship for transforming an exogenous observation variable (such as soil surface temperature from thermal infrared remote sensing) in soil moisture can be directly employed prior to applying the observation operator H k in Eq. ( 11).
The effect of dealing with a nonlinear observation operator within the retrieval algorithm is illustrated in Fig. 9. Figure 9a shows the retrieved parameters using an hourly assimilation frequency, with the h-h retrieval mode.Figure 9b shows the analogous results, but with the θ-h retrieval mode, i.e. by assimilating soil moisture observations and using matric heads as state variables.This last case requires the VGM analytical model to be used as a (nonlinear) observation equation for mapping the predicted matric heads into the soil moisture space.The unscented algorithm is also employed for dealing with the nonlinearity of the observation equation, similarly to what is done for retrieving the parameters.During each time step we propagate the a priori mean and covariance of the matric heads using the SKF, as currently done in the h-h and θ-θ retrieval modes.When the observation vector becomes available for assimilation, we sample the predicted state variable around the mean, following the UKF precepts, using the estimated a priori covariance, P − k .The sample state vectors are propagated through the VGM expression using the a posteriori mean of the parameters at time k − 1.Then the crosscovariance between predicted states and predicted measurements, P xy,k , and the auto-covariance of these predicted measurements, P yy,k , are estimated analogously to what is done for the parameters (Eqs.20, 22, respectively).The Kalman gain is then estimated as K k = P xy,k P υυ,k −1 , from which the estimations of the a posteriori state mean and covariance are straightforward.
Such linearization clearly incorporates a certain amount of error, which affects the overall identifiability of the unknown parameters.Even the identifiability of parameter α is affected by this assimilation strategy.With low assimilation frequencies the algorithm is subjected to persistent failures.
According to such results, while applying the DSUKF, the state variable and observation variable should be preferably of the same type, either in the h-based form or in θ-based form, to avoid the need to linearize the observation equation (Eq.11) with respect to the states.
Finally, it is useful to see that in an extended Kalman filter framework, the non-zero entries of the linearized observation operator H k would correspond to the hydraulic capacities C(h), evaluated in the prior state values x− k , at the observation nodes.This provides an idea of the unpredictability of the uncertainty due to the linearization process, as this is strongly influenced by the soil properties.

Influence of the initial covariance matrices
The DSUKF algorithm, like its analogous approaches, requires initial values for the state covariance, P, and the parameter covariance, P w .The effects of the initial state covariance matrix, P, and of the noise covariance matrices Q and R on the assimilation scheme are clear and have been widely examined (see for example Walker, 1999;Nelson, 2000).The values that should be used for the initial parameter covariance P w and the artificial noise covariances Q w and R w , are less clear and depend on several factors (Nelson, 2000).
An initial value 10 −2 for the diagonal entries of P w,0 performed well for most of the cases, with both soil water contents and matric heads as state variables, as found by Nelson (2000), who also employed normalized parameterization.Once the normalized P w is fixed, the values of the noise covariances depend on the variance of the data, and hence of the state variable.
The effect of evaluating different scenarios accounting for the variability of the parameter noise covariances is illustrated in Fig. 10, showing the daily retrievals of parameter n, using OD = 10 cm. Figure 10a   from 2 to 4 and from 6 to 8, respectively, considering three values of the forgetting factor (λ RLS ) for the h-h mode and three values for the θ-θ mode.Assuming λ RLS = 1 entails no process noise Q w for the parameters, while a small λ RLS incorporates a significant noise in the retrieval process.
The prediction error covariance Q w is a key variable, having effects on parameter retrieving for longer time intervals, and it is decisive in convergence and tracking.For the adopted h-based form, where parameter retrieving considers the log transformation of the states, λ RLS = 0.975 is appreciated as a fair value.However, when assuming the original (i.e.non-transformed) state values, we observe that a similar value of λ RLS compromises the stability in some cases, due to the added volatility.Setting λ RLS = 0.95 definitively adds too much error to the estimated parameters.Using the θ-θ mode, a value of around λ RLS = 0.9999 produces good results (Fig. 10b), while λ RLS = 0.995 improves the convergence, although it affects the stability in a few cases.An intermediate value between 0.995 and 0.9999 could be a good choice.
Van der Merwe (2000) and Wan and van der Merwe (2001) suggested two other options on how to choose the matrix Q w .One is to set Q w equal to an arbitrary "fixed" diagonal value, which may then be annealed toward zero as training continues.Another choice is to apply a Robbins-Monro stochastic approximation scheme for estimating the innovations (see Wan andvan der Merwe, 2001 andvan der Merwe, 2004, for details about the applied expression).Our preliminary analysis suggests that the most efficient approach is to use the method currently adopted, i.e. the forgetting factor, although further research is required to address this issue.
The effect of the artificial observation covariance R w for each retrieval mode can be evaluated by comparing the results obtained with scenarios 1, 3 and 5 (Fig. 10c), and 8, 9 and 10 (Fig. 10d).As stated by Nelson (2000), R w acts as a scaling term, determining the relative influence of the initial covariance P w,0 on later covariance matrices P w,k .For a prefixed P w,0 , a large R w produces a more stable (i.e, lower variance) behaviour, but this produces significantly biased estimates of w k for small time steps.A very small R w exposes the algorithm to retrieve parameters toward the corresponding limiting values, undermining its stability and convergence.
Using the h-h mode (Fig. 10c), R w = 10 −3 has been found appropriate for many of the examined cases.Again, R w = 10 −4 improves the convergence but at the price of lower stability.When retrieving soil moisture (Fig. 10d), R w has been set equal to 10 −5 .R w = 10 −3 is too large, since with this value the Kalman gain K w adopts a relatively low value, thus allowing for less variability in the estimations.R w = 10 −4 is also a suitable value; instead R w =10 −6 persistently induces the collapse of the system.
In general, higher R w values are required for a larger variability of the retrieved variable involved.As a general guideline, we observe a roughly linear relationship between the log of the initial state value and the log of the adopted R w , fulfilling the relationship R w = 4.23 × 10 −5 x 2.04 0 , x 0 being the value of the initial states.That said, the selection of proper values for λ RLS and R w deserves more attention in further studies, because they act as important stability factors, particularly when using matric pressure heads.

Conclusions
This study presented a DSUKF formulation for the simultaneous retrieval of states and parameters controlling the soil water dynamics in a homogeneous soil column, by assimilating near-surface observations into the Richards equation.The proposed approach takes advantage of the standard Kalman filter applied to a linear numerical scheme of the Richards equation for straightforward retrieval of the states within a linear system of small dimension, and of the unscented Kalman filter for retrieving a small number of soil hydraulic parameters defined according to the nonlinear VGM relations.A transformation of variables is used to deal with the physical constraints and the marked differences in the range of variability of the VGM parameters, thus improving the operational stability.
The unscented approach deals with the nonlinearity of the model with respect to the parameters, without the need to perform any analytical differentiation, thus making the computational implementation simple and of general applicability, i.e. independent of the analytic equations employed.
By means of a synthetic experiment, we showed that the DSUKF, with an ensemble of seven sigma points of the parameter space, provides predictions with an accuracy similar to that provided by a dual ensemble Kalman filter with an ensemble size of 25 members (DEnKF(25)), but with a computational time three times smaller.The DEnKF guarantees more accurate states and parameter predictions than the DSUKF, with an ensemble of 35 or more members, but at the cost of a large increase in computational effort.The study also demonstrated that DEnKF(12), whose computational expense is slightly larger than that of the DSUKF, does not ensure a substantial reduction in the posterior uncertainty of states and parameters due to the relatively large sampling errors involved in this scheme.
Except for the case with 12 ensemble members, the DEnKF provided less biased estimates of parameter n.Instead, with the DSUKF, we achieved less biased estimates of parameter α.The performance in reducing the uncertainty of K s was poor, both for the DSUKF and the DEnKF, even with 50 ensemble members.
The dual Kalman filter approach is able to retrieve states close to true values, even for observations at very shallow depths, i.e. using observation depths of 2 cm and an assimilation frequency of one every 5 days.Comparison between parameter initialization, observation depth and assimilation frequency, evidenced that the latter has the most dominant effect on the evolving errors.
The problem associated with the choice of either the hbased form or θ-based form of the Richards equation in the dual Kalman filter algorithm was explored.The former scheme is generally preferred for practical applications, particularly when having to deal with both saturated and unsaturated flows.The matric head retrieval algorithm outperformed that using soil water content in terms of state convergence and final accuracy.However, to avoid some stability problems in this mode, a log transformation of the predicted model observations and measurements was needed prior to applying the parameter equations.
The assimilation of near-surface soil moisture observations recalls some considerations about the system sensitivity to the VGM parameters, at least when the system is initialized with wet conditions.The identifiability of parameter α is markedly higher than that of n, particularly when using matric pressure head as state variable.The impact of the uncertainty of parameters α and n depends on whether the state and observed variables coincide with the soil water content (θ-θ retrieval mode) or with the matric pressure head (h-h retrieval mode).The retrieved matric head profiles are markedly influenced by the uncertainty of parameter α, while the retrieved soil moisture profiles are preponderantly influenced by the uncertainty of parameter n.
The identifiability of the saturated hydraulic conductivity K s is very poor in all cases, unless K s is the only uncertain parameter.This limitation in the identifiability of K s can be ascribed to the strong correlation between the parameters (not shown for the sake of brevity), and particularly between n and K s , which in turn also causes the cross covariance between states and K s to become too poor to make the model sensitive enough to the errors of this parameter.The limited identifiability of K s in a dual framework was also experienced with the DEnKF.A large ensemble size (in the order of 500) is required to achieve the convergence of K s , but the solution remains vulnerable to slight oscillations of the other two parameters (for instance of α in the h-h retrieval mode).These results suggest the advisability of employing other analytical models for the soil hydraulic functions, representing the hydraulic conductivity decoupled from the retention function.
By examining different combinations of retrieved and assimilated variables, the study also highlights some other issues to be considered beyond the efficiency of the dual Kalman filter approach.In this synthetic experiment, the performance of the overall retrieval process is significantly dampened when adopting soil moisture as the observed variable and matric pressure head as the state variable, even when observations are assimilated hourly.
Finally, we point out that the implementation of a dual (or a joint) exercise for state and parameter estimation demands more caution, as compared with a standard KF approach, particularly with respect to the initialization of the covariances, not only of the parameters, but even of the states.This can be seen as the price to be paid for achieving more accurate results.
provides a comprehensive representation of the relative performances of the Kalman-based retrieval algorithms with respect to the reference scenarios.The colour scale indicates the logarithmic of the ratio between the RMSE of the examined assimilation algorithm and the RMSE of the open loop solution, both averaged among the six parameter sets.The average RMSE values retrieved in the h-h and in the θ-h modes are divided by the average RMSE open loop value obtained with the hbased form.The average RMSE values retrieved in the θθ mode are divided by the value obtained with the θ-based form.Thus, looking at the different retrieval modes, only the cells of the grid referring to the h-h and the θ-h modes can be directly compared.The SKF method, involving only state retrieval, gives estimations that can be considerably poorer than the open loop solutions.Instead, both the proposed and the established dual www.hydrol-earth-syst-sci.net/18

Figure 5 .
Figure 5. Ratios of the average RMSE, involving the last 15 days of simulations and the six initial guessed parameter sets, computed between contiguously sampled (a, c) observation depths (OD) and (b, d) assimilation frequencies (AF), with the DSUKF algorithm.

Figure 6 .
Figure 6.Cumulative probability distribution of the RMSE of DSUKF and DEnKF with 50, 35, 25 and 12 ensemble members for both the h-h and θ -θ modes.

Figure 8 .
Figure 8. Box plots representing the posterior uncertainty of the parameters by performing DSUKF and DEnKF for 500 randomly chosen sets of initial parameters considering (a) the h-h (b) and θ -θ modes.
and b account for scenarios

Table 1 .
Values of the true parameters K s , α and n, the minimum, w min , and the range, w max -w min , used to constrain their distribution, and the resulting six sets of input values considered during the assimilation process.

Table 2 .
Summary of numerical experiments involved in the performance analyses.

Table 3 .
Scenarios for assessing the effect of the artificial noise variance in parameter state space equations.