Variably saturated subsurface flow models require knowledge of the soil hydraulic parameters. However, determining these parameters in heterogeneous soils is difficult and subject to large uncertainties. As the modeled soil moisture is very sensitive to these parameters, especially the saturated hydraulic conductivity, porosity, and the parameters describing the retention and relative permeability functions, it is likewise highly uncertain. Data assimilation can be used to handle and reduce both the state and parameter uncertainty. In this work, we apply the ensemble Kalman filter (EnKF) to a three-dimensional heterogeneous hillslope model and investigate how updating the different soil hydraulic parameters influences the accuracy of the estimated soil moisture. We further examine the use of a simplified layered soil structure instead of the fully resolved heterogeneous soil structure in the ensemble. The best estimates are obtained with a joint update of porosity and the van Genuchten parameters and, optionally, the saturated hydraulic conductivity. In combination with parameter updates, the simplified soil structure gave decent estimates of spatially averaged soil moisture, but it led to a failure of the EnKF and very poor soil moisture estimates at non-observed locations.

Numerical models of the unsaturated zone are very sensitive to the soil hydraulic parameters

However, the quantification of these parameters is very difficult

Another possibility for the quantification of the soil hydraulic parameters is to solve the inverse problem (i.e., to find a set of parameters that best reproduces given observations of a quantity of interest, such as the soil moisture). This is an optimization problem that can be solved with one of the many existing optimization algorithms. The advantage of this method is that the optimization can be applied directly for the given initial and boundary conditions. The disadvantage is that the parameter estimation problem is often ill-posed; this is caused by either non-uniqueness or instability

It is thus clear that the parameter uncertainty cannot be eliminated and needs to be taken into account in the modeling process. According to

A very popular data assimilation method is the ensemble Kalman filter (EnKF).

Another data assimilation method used to account for parameter uncertainty is the particle filter. Although the particle filter is better suited for nonlinear models and does not require Gaussian distributions, its high computational demand largely limits its application to conceptual models

Many studies put their focus on the accurate estimation of the soil hydraulic parameters, like

From these studies it can be concluded that parameter updates in filters can hardly be used to estimate the soil hydraulic parameters under field conditions. Nevertheless, the parameter updates can have a positive impact on the state estimates when making predictions. This was shown by

While parameter updates have been applied to land surface models

To the best of our knowledge, there are no studies on how to treat the uncertain soil hydraulic parameters in three-dimensional heterogeneous subsurface models in data assimilation. The parameters are either excluded entirely from the update or a (sub)set of the parameters is included (whose choice is not further motivated). The effect of updating different combinations of parameters on the soil moisture estimates is not analyzed. In

The remainder of this paper is structured as follows: the next section presents the governing equations of the hydrological model and the ensemble Kalman filter. Additionally, details are given on the specific implementation of the EnKF required for our simulations. Section

The flow problem is solved with the software ParFlow

The van Genuchten–Mualem model

If the water level rises above the land surface, then the kinematic wave equation is solved to model overland flow as follows:

We use the ensemble Kalman filter (EnKF) introduced by

First, we need to define our model system, which consists of the model states

States, parameters, and observations are linked via the measurement operator

The model and the observations are merged in the so-called analysis step, which is performed at every observation time at which an observation is available. An analysis step is preceded by a forecast step, which integrates the augmented state ensemble from the previous to the current observation time, according to Eq. (

Equations (

The update of the forecasted ensemble in Eq. (

states and parameters that show strong correlations to the observations;

ensemble members whose simulated observations differ significantly from the measured value; and

the entire ensemble when the measurement error is small.
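The analysis step described above can be sketched in a few lines of NumPy. The following is a minimal illustration, not the PDAF implementation used in this work: it assumes a stochastic EnKF with perturbed observations, a linear measurement operator `H`, and uncorrelated measurement errors with a common standard deviation. The function name `enkf_analysis` and all variable names are hypothetical.

```python
import numpy as np

def enkf_analysis(X_f, y_obs, H, obs_std, rng):
    """Stochastic EnKF analysis step for an augmented state ensemble (sketch).

    X_f     : (n_aug, n_ens) forecast ensemble; each column stacks states and parameters
    y_obs   : (n_obs,) measured values
    H       : (n_obs, n_aug) measurement operator (assumed linear here)
    obs_std : measurement error standard deviation (assumed equal for all observations)
    rng     : numpy.random.Generator used to perturb the observations
    """
    y_obs = np.asarray(y_obs, dtype=float)
    n_obs, n_ens = y_obs.size, X_f.shape[1]

    A = X_f - X_f.mean(axis=1, keepdims=True)     # ensemble perturbations
    HX = H @ X_f                                  # simulated observations
    HA = HX - HX.mean(axis=1, keepdims=True)

    P_xy = A @ HA.T / (n_ens - 1)                 # cov(augmented state, simulated obs)
    P_yy = HA @ HA.T / (n_ens - 1)                # cov(simulated obs)
    R = obs_std**2 * np.eye(n_obs)                # measurement error covariance

    # Kalman gain K = P_xy (P_yy + R)^-1, computed via a linear solve
    # (P_yy + R is symmetric, so no explicit inverse is needed).
    K = np.linalg.solve(P_yy + R, P_xy.T).T

    # Each member is updated with its own perturbed copy of the observations.
    Y = y_obs[:, None] + obs_std * rng.standard_normal((n_obs, n_ens))
    return X_f + K @ (Y - HX)
```

In this sketch, the gain row belonging to a state or parameter is proportional to its ensemble covariance with the simulated observations, which is why strongly correlated entries receive the largest updates and why a small measurement error pulls the entire ensemble toward the measured value.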

Due to the usage of an ensemble of finite-sized

There exist different approaches to handle these issues. The problem of spurious correlations can be mitigated either by using a sufficiently large ensemble or by applying localization. Localization reduces or eliminates entries in the covariance matrix where only small or no correlations are to be expected (e.g., when the spatial distance to the observation location is too large). As we did not encounter spurious correlations or a positive effect of localization, we assume the ensemble size in our experiments to be large enough so that localization is not needed.
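For illustration only, distance-based localization can be sketched as an element-wise tapering of the Kalman gain. The sketch below uses the widely used Gaspari-Cohn fifth-order taper as one possible weighting function; this choice, the half-width `c`, and the function names are assumptions and are not taken from the experiments in this work, in which localization was not applied.

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Gaspari-Cohn fifth-order taper: 1 at distance 0, exactly 0 for dist >= 2c."""
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = (-0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3
                - (5.0 / 3.0) * ri**2 + 1.0)
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

def localize_gain(K, dist, c):
    """Damp gain entries by distance between each entry and each observation.

    K, dist : (n_aug, n_obs) Kalman gain and matching distance matrix
    c       : localization half-width (hypothetical tuning parameter)
    """
    return K * gaspari_cohn(dist, c)
```

Entries farther than `2c` from an observation receive no update at all, so `c` would have to be chosen in relation to the distances over which real correlations are expected.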

However, we noticed a strong reduction in the ensemble spread already during the first analysis steps, especially when performing parameter estimates. Applying inflation, where the ensemble perturbations are artificially increased by a small factor (usually 1.01) after every analysis step, led to instabilities in our ensemble; we achieved better results by applying a dampening factor
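A damped update can be sketched as scaling the analysis increment before it is added to the forecast. The snippet below is an illustration under stated assumptions: the factor `alpha` lies in (0, 1], it is applied only to the parameter part of the augmented state vector, and the states are stored ahead of the parameters. The name `damped_update` and this layout are hypothetical.

```python
import numpy as np

def damped_update(X_f, increment, n_state, alpha):
    """Add the full increment to the states and a damped increment to the parameters.

    X_f       : (n_aug, n_ens) forecast ensemble, states first, parameters after
    increment : (n_aug, n_ens) undamped analysis increment, e.g. K (Y - H X_f)
    n_state   : number of state entries at the top of the augmented vector
    alpha     : dampening factor in (0, 1]; alpha = 1 recovers the undamped update
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    factors = np.ones((X_f.shape[0], 1))
    factors[n_state:] = alpha          # damp only the parameter rows (assumption)
    return X_f + factors * increment
```

Because the parameter perturbations are scaled down together with the mean increment, the parameter spread shrinks more slowly per analysis step than in the undamped update.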

Note that the inverse of Eq. (

Nevertheless, we had to deal with non-converging ensemble members during our simulations (i.e., ensemble members for which the numerical flow simulation to the next time step did not converge after the update). This issue is specific to soil models and has often been observed in data assimilation with soils

The Parallel Data Assimilation Framework

We investigate the effect of parameter updates on soil moisture estimates in a three-dimensional hydrologic hillslope model, as shown in Fig.

Dimensions and topography of the hillslope model. The blue line denotes the outlet.

Net precipitation (negative) and evaporation (positive) time series.

It should be pointed out that imposing such a moisture-independent flux on the surface (especially in the case of evaporation) is not realistic and may cause numerical issues. Yet, as already mentioned in Sect.

The subsurface is divided into different layers. The lower 18.6 m of the subsurface consists of low-permeability homogeneous bedrock, while the upper 1.4 m comprises either one or two (

Values of the soil hydraulic parameters for the different models. The values in parentheses are the standard deviations of the parameter distributions.

The domain is discretized horizontally into a

The observations needed for the data assimilation are obtained from reference runs with the same numerical model and a deterministic set of parameters. Using synthetic observations instead of field data eliminates all unwanted sources of uncertainty, such as model error, structural error, and uncertain forcing, which would most probably have a strong impact on the state estimates. Furthermore, it offers full knowledge of all state and parameter values at each point in the domain and of the measurement error. To run the numerical reference model, we need to define a deterministic set of parameters which we deem the true parameters.
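Such a synthetic (twin) experiment can be sketched as sampling the reference run at the observed cells and, optionally, adding noise with a known standard deviation. Whether and how noise is added is an assumption of this sketch; the array layout and the name `synthetic_observations` are hypothetical.

```python
import numpy as np

def synthetic_observations(theta_ref, obs_idx, obs_std, rng):
    """Sample a reference run at the observed cells and add Gaussian noise.

    theta_ref : (n_times, n_cells) soil moisture of the reference run
    obs_idx   : indices of the observed grid cells
    obs_std   : prescribed measurement error standard deviation (0 gives exact values)
    rng       : numpy.random.Generator
    """
    truth = theta_ref[:, obs_idx]                        # true values at the sensors
    noise = obs_std * rng.standard_normal(truth.shape)   # known measurement error
    return truth + noise
```

Because the truth is known everywhere in the domain, the same reference array can also be sampled at unobserved cells to validate the estimates.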

We set up two reference models for our experiments. The first model considers a homogeneous soil layer above the bedrock. Its values are given in Table

Correlation coefficients of the soil hydraulic parameters.

The initial conditions for both reference models are generated by spin-up runs by repeatedly applying the same forcing (Fig.

An ensemble consisting of 100 members is used for the data assimilation. Each ensemble member is an identical copy of the reference run, with only the soil hydraulic parameters

The observations are obtained from the reference model runs described in the previous section. We use observations of soil moisture, which are taken hourly at four measurement locations, as shown in Fig.

Observation (1–4) and validation (5 and 6) locations. The creek is plotted in blue.

The analysis step is performed according to Eqs. (

We generate three ensembles that differ in terms of their soil setup. The first one considers one homogeneous soil layer that is similar to the homogeneous reference model. The ensemble mean and standard deviation (in parentheses) for each parameter are listed in Table

The second ensemble is based on the heterogeneous reference model with two heterogeneous soil layers. Each parameter field is generated with a random field generator constrained by the mean and standard deviation (in parentheses) given in Table

It should also be noted that we consider only well-behaved heterogeneity, which includes multi-Gaussian fields with the horizontal correlation length

In the third ensemble, two soil layers are again considered. While the depth of the layers is the same as in the heterogeneous reference model, the soil layers are homogeneous in this case. The parameter values are sampled from Gaussian distributions, with the mean and standard deviation (in parentheses) as given in Table

We assume the depth of the layers to be known. This is a reasonable assumption, since this information can be obtained from a borehole sample and is not expected to vary significantly within such a small domain. Nevertheless, we are aware that this can be a relevant source of uncertainty, especially for large-scale models. Yet, this is not part of this work, and we refer to existing studies on this subject

We perform three test series with the described reference models and ensembles. The first test series involves the homogeneous reference model and the homogeneous ensemble. It serves as a test bed for the implemented parameter updates and analyzes the impact of parameter updates under two-dimensional flow conditions in the absence of soil heterogeneities. Even though the model is three-dimensional, the topography and the homogeneity of the subsurface cause a quasi-two-dimensional flow field.

Real soils are heterogeneous rather than homogeneous. As outlined in the introduction of this work, this leads to model uncertainty. We want to investigate how to deal with this uncertainty when the goal is to obtain decent predictions of soil moisture. Hence, for the other test series, the heterogeneous model is used as a reference. Often, the heterogeneity is neglected. We compare this approach to resolving the heterogeneity in the ensemble. Resolving it will most probably lead to wrong estimates of the parameter fields, yet to fields with a similar heterogeneous structure. Therefore, in the second test series, the heterogeneous ensemble is used. With these experiments, we want to investigate, on the one hand, how different parameter updating strategies could improve soil moisture estimates in a three-dimensional heterogeneous model, which would be the case for any field application. On the other hand, we want to compare the estimates to those obtained when the heterogeneous structure is neglected. These are the results of the third test series, where we use a simplified (layered) soil structure in the ensemble and try to represent the soil moisture of the heterogeneous reference model. Using a layered structure with homogeneous layers would significantly reduce the size of the augmented state vector but may lead to less accurate state estimates.

As we want to identify the parameters which lead to the best soil moisture estimates when included in the update, we perform joint updates of soil moisture

The results of the data assimilation runs are compared with regard to the soil moisture estimates by means of the spatially and temporally averaged root mean square error RMSE (–) at the observation and validation locations, respectively, as follows:
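As a hedged illustration of such a metric, the sketch below computes one RMSE per time step and location from the deviations of all ensemble members from the reference and then averages over time and space. Evaluating the error over the ensemble members (rather than, e.g., for the ensemble mean only) is an assumption of this sketch, as are the array layout and the name `mean_rmse`.

```python
import numpy as np

def mean_rmse(theta_ens, theta_ref):
    """Spatially and temporally averaged RMSE of soil moisture (sketch).

    theta_ens : (n_times, n_locs, n_ens) estimated soil moisture of all members
    theta_ref : (n_times, n_locs) reference soil moisture at the same points
    """
    err = theta_ens - theta_ref[..., None]            # member-wise deviations
    rmse_tl = np.sqrt(np.mean(err**2, axis=-1))       # RMSE per time and location
    return float(rmse_tl.mean())                      # average over time and space
```

The function would be called separately for the observation and the validation locations, so that the two sets of points yield separate averages.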

We also look into the parameter estimates. These can help us understand the performance of the filter with regard to the soil moisture estimation, as they can be, e.g., a reason for filter divergence. The estimated parameter fields of the heterogeneous test case are compared at the point scale and at the field scale. At the point scale, the normalized mean deviation, NMD (–), which is a measure of the discrepancy between reference and estimated fields at each grid point, is used for comparison as follows:

At the field scale, the reference and estimated parameter fields are compared in terms of their statistics. For this purpose, the normalized mean value, NMV (–), is defined as follows:

Soil moisture over time for the homogeneous scenario. Values are taken at one observation and one validation location, each at

Figure

At the validation location (Fig.

To evaluate the overall performance of the different parameter updates, the mean RMSE values, as given by Eq. (

Spatially and temporally averaged RMSE values at the observation and validation locations for the homogeneous scenario and the different parameter update combinations. The first entry (–) denotes the data assimilation run, where only soil moisture is updated. The lower plot gives the number of converging ensemble members. The best values are highlighted by larger markers.

The lower plot of Fig.

The probability density functions (pdf's) of the soil hydraulic parameters are shown in Fig.

Probability density functions (pdf's) of the parameters for the homogeneous scenario and the joint update of all parameters.

From this simple test scenario, we can draw the following conclusions:

The implemented parameter updates and the data assimilation work properly.

Even in such a simple scenario with a few biased initial guesses, the parameters do not fully converge to their true values.

Small parameter errors already lead to visible deviations of the soil moisture estimates.

The last point can be seen in Fig.

The heterogeneous test case is discussed next. With this, we want to test whether the findings from the simple, homogeneous setting transfer to more realistic, heterogeneous ones. The soil moisture over time for this test case is plotted for one observation and one validation location in Figs.

Soil moisture over time for the heterogeneous scenario at one observation location. Values are taken at

At the validation locations (Fig.

Soil moisture over time for the heterogeneous scenario at one validation location. Values are taken at

Estimated probability density functions (pdf's) of the four parameters for the heterogeneous scenario at one observation and one validation location and

Comparing all parameter update combinations based on the averaged RMSE values (Fig.

Spatially and temporally averaged RMSE values at the observation and validation locations for the heterogeneous scenario and the different parameter update combinations. The first entry (–) denotes the data assimilation run where only soil moisture is updated. The lower plot gives the number of converging ensemble members. The best values are highlighted by larger markers.

The results at the validation locations in comparison to those of the homogeneous test case show that the soil heterogeneity has a large impact on the filter performance at locations far from the observations. The heterogeneity is characterized by the correlation lengths of the parameter fields. In this case, the distance of the validation locations to the observations (

Spatially and temporally averaged RMSE values at horizontal distance

Due to the grid resolution (

In the following, we look into the estimated parameter fields. Figure

Normalized mean deviation (NMD) of the estimated parameter fields compared to the reference field for the two soil layers of the heterogeneous scenario. The error bars denote the standard deviation.

Nevertheless, the updates are able to recover some features of the reference parameter field, which can be seen in Figs.

Horizontal cut of the saturated hydraulic conductivity field at

Vertical cut of the saturated hydraulic conductivity field at

Hence, we now compare the reference and the estimated parameter fields in terms of their statistics. Figure

Statistics (mean and standard deviation) of the parameter fields of the two heterogeneous soil layers. All values are normalized by the mean value of the reference field. The statistics of the ensemble runs refer to the estimated parameter field (ensemble mean).

Such a large uncertainty in the parameter estimates should be represented by a large ensemble spread. The ensemble spread can be illustrated by the cumulative density functions (cdf's) of all parameter values in the ensemble, which are shown in Fig.

Cumulative density functions (cdf's) of the parameter values for the two heterogeneous soil layers. The solid line denotes the cdf of the reference parameter field. For the initial (dashed) and estimated (dash-dotted) ensemble, two lines are plotted to indicate the minimum and the maximum values of the ensemble.

In addition to the conclusions of the test series with the homogeneous scenario, we can now further summarize as follows:

Small correlations of the observations to parameters at other locations cause only small updates of the latter, thus maintaining a high uncertainty in the soil moisture estimates there.

Consequently, in the presence of soil heterogeneities, soil moisture estimates are less accurate far from the observation locations and a surrounding area, depending on the correlation length of the parameter fields.

Generally, the parameter estimates are better in the lower soil layer that contains the observations. This does not refer to point values but rather to the field statistics.

Point values of the parameter fields differ clearly from the values of the reference field.

Updating the soil parameters in the data assimilation improves the representation of the variability in the parameter fields.

In the third test series, the soil moisture of the heterogeneous reference run is represented by an ensemble that consists of two homogeneous soil layers. The rationale is that, as shown before, the heterogeneous fields cannot be retrieved, so information that is not available should not be included in the model. Based on the results of the heterogeneous scenario, only two data assimilation runs were performed for this test series, i.e., one without parameter updates and one updating all four parameters. Figures

Soil moisture over time for the heterogeneous scenario with a simplified layered soil in the ensemble runs at one observation location. Values are taken at

Soil moisture over time for the heterogeneous scenario with a simplified layered soil in the ensemble runs at one validation location. Values are taken at

At the validation location shown in Fig.

As a consequence, the estimated soil moisture when updating the parameters is even less accurate than when updating only the states, which becomes evident in Fig.

Spatially and temporally averaged RMSE values at the observation and validation locations and for the averaged root zone soil moisture for the heterogeneous scenario using a heterogeneous and a simplified layered structure in the ensemble. The first entry (–) denotes the data assimilation run where only soil moisture is updated. The lower plot gives the number of converging ensemble members.

On the other hand, the layered ensemble is numerically more stable, with 97 and 100 converging ensemble members for the two data assimilation runs, respectively (see the lower plot of Fig.

The third plot from the top in Fig.

Root zone soil moisture over time for the heterogeneous scenario.

The (only temporally) averaged RMSE values for the root zone soil moisture in Fig.

The experiments demonstrate the strong influence of the soil hydraulic parameters on the soil moisture estimates. The ensemble spread of soil moisture depends mainly on the parameter spread and cannot be reduced by state updates alone. Thus, if there is a large uncertainty in the parameters, the estimated soil moisture will be uncertain as well. The aim of the data assimilation is to reduce the uncertainties in the model. However, the updates can reduce the ensemble spread too strongly, which means that the actual uncertainty is underestimated. If the estimates match the true states, then this is not a problem, since the model uncertainty is truly very small. This is, e.g., the case in the homogeneous test case when performing parameter updates (Fig.

Therefore, it is important to prevent an ensemble spread that is too small. This can be achieved by the thorough tuning of the filter properties, especially the dampening factor, the assimilation frequency, and inflation. For test series involving multiple data assimilation runs, this is not feasible, as these settings would have to be optimized for each individual run. Aside from that, different filter settings may decrease the comparability of the different assimilation runs. An optimized filter setup that would, e.g., impede a strong spread reduction would not change the conclusions of these experiments but would make it difficult to identify true and spurious correlations, which are relevant for the analysis of the different methods. Hence, we keep the same settings for all runs, even though they may not be optimal in some cases. Yet, we want to stress that this applies only to the testing of data assimilation, while a rigorous tuning for the assimilation in operational models is indispensable.

One concern when performing parameter updates, and in particular when including the parameter values of each grid cell in the augmented state vector to resolve the heterogeneous structure, is that this increases the required computational resources. This is only partly true. Of course, larger matrices need to be handled, which can either be accomplished by using more workspace (if available) or by a parallelization on multiple cores. As the large number of model realizations in the ensemble is often run in parallel mode in any case, the latter option of handling the matrices would not require any additional resources here. In terms of run times, the parameter updates have revealed a positive influence. For unsaturated flow problems, convergence issues in single members of the ensemble are the main cause for long run times. As we have seen throughout our experiments, the parameter updates increase the numerical stability of the ensemble, which actually reduces the run times.

Thus, there is no real drawback to applying parameter updates. On the contrary, updating the soil hydraulic parameters notably improves the soil moisture estimates not only in the homogeneous case but also in the heterogeneous case. In the homogeneous scenario, the update of the van Genuchten model parameters turned out to be crucial for the numerical stability. In the heterogeneous test case, this trend could not be confirmed. In

In the heterogeneous test case, the importance of using a stochastic model becomes apparent. Due to the small correlations of parameters and states at locations other than the observation locations, the former are hardly updated, and the initial ensemble spread is mostly maintained. The estimates at those locations are therefore rather poor, even though an idealized setup was used in which all other unwanted sources of uncertainty were eliminated. This, however, reflects reality: no information is available at these locations, and they differ from the locations where observations are available. The large ensemble spread indicates the high uncertainty at these locations, making it clear that the estimates there are not reliable. A deterministic model does not quantify the model uncertainty and would claim the wrong estimates to be correct. The brief analysis of the radius of positive influence of the data assimilation suggests that improved estimates can only be expected at distances smaller than the correlation lengths of the parameter fields. To improve the estimates, more information for the assimilation would be needed, e.g., in terms of more observations or remote sensing data.

Given the spatially very limited effect of the assimilation in heterogeneous soils, performing simplified parameter updates by, e.g., Miller scaling, as in

However, when estimating a spatially averaged quantity, in this case the spatially averaged root zone soil moisture, the accuracy when using the simplified soil structure is almost as good as when the fully heterogeneous structure is used. This applies only when the parameters are updated. When only the states are updated, the accuracy of the heterogeneous ensemble is clearly better.

In summary, decent estimates of point values cannot be obtained with a simplified soil structure, but decent estimates of spatially aggregated values, such as spatial means, are possible. In this case, again, the importance of parameter updates becomes evident. Yet, this is expected to work only if the observations are taken at locations where the parameter values are reasonably representative of the mean parameter value of the domain. If the parameter values at these locations are in the extreme ranges of the parameter distributions, then the estimation of such aggregated values may fail too. Here, we need to point out that the soil structure in the heterogeneous ensemble is also not exactly resolved. While the position of the interface between the layers is assumed to be known, the structure within the layers is not prescribed. The information is contained indirectly in the ensemble, as the parameter fields of the individual ensemble members are created using the true correlation lengths and correct correlations among the parameters. Yet, the correct spatial structure is contained neither in the individual parameter fields nor in the initial guesses. This can be interpreted as applying a finer resolution of the heterogeneous layers with more degrees of freedom than in the reference soil, whereas assigning a layered soil structure means the contrary. Prescribing the correct soil structure could possibly improve the estimates even more. As it is hard to obtain this information in the field, such a setting would be rather unrealistic. On the other hand, prescribing a wrong soil structure could lead to worse estimates, as

In this study, the ensemble Kalman filter was applied to a three-dimensional hillslope model to assimilate soil moisture. The augmented state vector approach was used to investigate the influence of parameter updates on the soil moisture estimates. To this end, two reference models were created, namely one with a homogeneous soil and the other one with two heterogeneous soil layers. These models provided synthetic observations for the assimilation and validation of the data assimilation runs.

A previous sensitivity analysis revealed the saturated hydraulic conductivity, porosity, and the van Genuchten model parameters

One issue that we encountered is that the improvement by the filter updates in heterogeneous soils is mostly limited to the observation locations and a small area around them that depends on the correlation lengths of the parameter fields. Estimates at more distant locations are still highly uncertain after the assimilation. More information is needed to overcome this problem. This can be achieved by using a denser measurement network. However, given the very small radius of influence of point-scale soil moisture observations, it is hardly feasible to install a monitoring network with the required density in a real field application. Instead, the additional assimilation of soil moisture observations from remote sensing and cosmic-ray neutron probes can be an option. Besides, the studies by, e.g.,

Generally, the present study has shown that, whenever the soil structure can be represented accurately in the ensemble (e.g., in homogeneous soils), parameter updates are able to improve state estimates, with optimally conditioned parameter estimates significantly reducing the model error caused by parameter uncertainty. Yet, soil heterogeneity produces additional uncertainty in the model, which needs to be accounted for. In this work, this was done by updating the fully heterogeneous parameter fields. Thus, the assimilation can reduce the model error caused by the soil heterogeneity as much as the observations allow. By applying a simplified soil structure, this error can only be reduced to a very limited extent due to the insufficient degrees of freedom in the ensemble. Moreover, beyond some point, assimilating more observations will most likely not reduce this error any further. Localization of the parameter updates, which in principle adds soil heterogeneity to the ensemble, may be beneficial in such cases.

Another open question concerns other error sources in the model. The key message of this work regarding parameter uncertainty is to include everything, i.e., all sensitive parameters and the full heterogeneous soil structure. When using real data, uncertainties may also arise from the boundary conditions and the model error. How to optimally handle these errors and uncertainties has not yet been thoroughly examined.

TerrSysMP, including its interface to PDAF, is freely available at

The data are available on request from the corresponding author (brandhorst@hydromech.uni-hannover.de).

Simulations and code enhancements were performed by NB. IN acquired the funding. Both authors contributed to the design of the experiments, the analysis of the results, and writing the paper.

At least one of the (co-)authors is a member of the editorial board of

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The authors acknowledge the computing time that has been provided by the Jülich Supercomputing Centre (

This research has been supported by the Deutsche Forschungsgemeinschaft (grant no. NE 824/12-1).

This paper was edited by Anke Hildebrandt and reviewed by two anonymous referees.