Bayesian inverse modelling of in situ soil water dynamics : using prior information about the soil hydraulic properties

Introduction Conclusions References

two constitutive relationships from laboratory or field experiments.An overview of these methods together with a discussion of their strengths and limitations can be found in Durner and Lipsius (2005), amongst others.With the ever increasing pace of computational power, availability of accurate and stable numerical solution schemes of the governing flow equations, and effective and efficient parameter optimization methods, the use of inverse modelling to determine soil hydraulic properties has become increasingly popular in the last few decades.A review of Vrugt et al. (2008a) discusses recent progress in inverse modelling of soil hydraulic properties.Laboratory methods such as the multistep outflow method (van Dam et al., 1994) have the advantage of being comparatively quick and precise.This allows the processing of a large number of samples, which opens up the possibility to study the spatial variability of the soil hydraulic properties.However, soil hydraulic properties derived from laboratory experiments on small soil cores are typically inadequate to simulate soil water dynamics at larger spatial scales (Ritter et al., 2003;Mertens et al., 2005;Guber et al., 2006;W öhling et al., 2008;Baroni et al., 2010).There are multiple reasons for this discrepancy, most notably that the sample volume analysed in the laboratory is not a representative elementary volume (e.g., Mallants et al., 1997) or that the experimental conditions dictated in the laboratory are not representative for field conditions (e.g., Basile et al., 2003).Arguably, field methods such as the internal drainage method (Zhang et al., 2003) provide estimates of the soil hydraulic properties that are more representative for in situ soil water dynamics.However, such methods place a high demand on equipment and time and are labour intensive.Moreover, considerable difficulties arise when these local scale soil hydraulic properties are used to infer the effective retention and hydraulic conductivity function that characterize soil water dynamics at larger spatial scales (e.g., Smith and Diekkr üger, 1996;Zhu and Mohanty, 2002).Effective properties are defined here as the soil hydraulic properties of an equivalent homogeneous domain that produces the same response as the actual heterogeneous domain under some upscaled boundary conditions (Vereecken et al., 2007) As alternative to laboratory or field experiments, soil hydraulic properties can also be derived from field measurements of soil water state variables under naturally occurring boundary conditions (Vereecken et al., 2008).One main advantage of this approach is that it allows for estimating effective soil hydraulic properties at larger spatial scales given observations at multiple locations within the considered soil domain.
An example of this can be found in Jacques et al. (2002) who estimated the soil hydraulic properties of a four-layer soil profile using pressure head and water content data collected at 12 different locations and 5 depths along a 5.5 m long trench.To reduce problems with parameter uncertainty and limit the dimensionality of the inverse problem, they used a stepwise parameter estimation procedure that sequentially estimates the soil hydraulic functions for each individual soil layer.Another study by Ritter et al. (2003) used soil water content measured at 6 locations and 3 depths within a 4800 m 2 plot to infer the soil hydraulic properties of a three-layer soil profile.They also reported problems in finding well-defined values of the soil hydraulic parameters.These findings inspired Ritter et al. to fix some of the soil hydraulic parameters at values derived from laboratory experiments.Mertens et al. (2004) estimated effective soil hydraulic properties for a two-layer profile using water content observations from 25 locations and 3 depths within a 80 × 20 m hillslope plot.They adopted an informal Bayesian approach using the Generalized Likelihood Uncertainty Estimation (GLUE) method (Beven and Binley, 1992;Beven, 2006) and investigated whether prior data and measurements of spatially distributed soil water content at 36 locations and 2 depths within a 6 × 6 m plot.They implemented a formal Bayesian approach using Markov chain Monte Carlo (MCMC) simulation with the DiffeRential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2008b(Vrugt et al., , 2009) ) and report considerable uncertainty in the estimated soil hydraulic parameters.Particularly, some of the parameters attained physically unrealistic values, which Steenpass et al. attributed to a lack of information in the wet range of the soil hydraulic functions.Finally, W öhling and Vrugt (2011) explored whether the use of two different measurement types of the soil water state could help constrain the soil hydraulic parameters.Indeed, the posterior distribution derived with DREAM demonstrated that joint use of in situ soil moisture content and pressure head data significantly improved the reliability of the soil hydraulic properties.
In summary, the current state of knowledge is that in situ measurements of soil water dynamics contain insufficient information to warrant a reliable estimation of the soil hydraulic properties.One possible solution to this problem is to simultaneously consider multiple soil water state variables.Indeed, the soil hydraulic functions become much better identified when moisture content and pressure head data are jointly used for statistical inference of the soil hydraulic parameters (W öhling and Vrugt, 2011).Unfortunately, this places a heavier demand on measurement resources, and this approach is therefore less appealing to implement in practice.In this paper, we therefore seek an alternative solution and explore whether prior information could help reduce the uncertainty of the soil hydraulic functions.Prior information obtained from pedotransfer functions (PTFs) that predict the soil hydraulic parameters from basic soil data is readily available in most field studies and could hence be used more explicitly than is currently done during inverse modelling of in situ soil water dynamics.This prior information can be used to formulate a informative prior distribution of the soil hydraulic parameters, which might prove useful in constraining parameter uncertainty.The current practice is to use a uniform prior distribution of the soil hydraulic parameters and let the observational data speak and determine parameter and predictive uncertainty.Introduction

Conclusions References
Tables Figures

Back Close
Full Another option to alleviate problems with parameter nonidentifiability is to reduce the dimensionality of the model calibration problem and fix some of the soil hydraulic parameters at some appropriate value (e.g., Jacques et al., 2002;Ritter et al., 2003).This approach is practical but may impair the ability of the soil hydraulic model to accurately describe the experimental data, particularly if parameters are fixed at nonoptimal values.Due to parameter correlation, this fixing may corrupt the estimates of the other soil hydraulic parameters.Moreover, the decision which parameter to fix is usually rather arbitrary, without consideration of the actual information content of the data.For instance, we could end up fixing a parameter whose value is actually well defined by the calibration data.An example of this is the L parameter in the van Genuchten-Mualem (VGM) model (van Genuchten, 1980).The literature indicates that it is most productive to fix this parameter to a value of L = 0.5 (Mualem, 1976) or L = −1 (Schaap and Leij, 2000), whereas it might prove more effective to estimate L directly from the data.If we still decide to fix some of the soil hydraulic parameters, it remains typically difficult to choose appropriate values.This is particularly true for θ r , θ s , and K s in the VGM model.These parameters are generally poorly defined by direct measurements and should therefore considered as fitting parameters (e.g., van Genuchten and Nielsen, 1985).The Bayesian inverse modelling approach presented in this study avoids many of the problems associated with fixing parameters and provides a general-purpose approach to estimating the probability distribution of the soil hydraulic properties by merging prior knowledge with experimental data.
In this study, we tested the usefulness and applicability of prior information for improving the identifiability of soil hydraulic properties at the field scale.We used spatially distributed observations of soil water content in a 50 × 50 m bare soil plot exposed to natural boundary conditions.The ROSETTA computer program (Schaap et al., 2001) was used to predict the soil hydraulic parameters of the VGM model from percentages of sand silt, and clay measured at the experimental site.To appropriately consider parameter correlation, we ran ROSETTA multiple times with a set of input data randomly sampled in the vicinity of the actual measured percentages of sand silt, and clay.This Introduction

Conclusions References
Tables Figures

Back Close
Full prior information was used to formulate a joint probability density function (pdf) of the VGM parameters.A Bayesian framework was then used to combine this prior pdf with the information contained in the observational data.The resulting posterior distribution was explored by MCMC simulation using DREAM.The sample of the posterior pdf obtained this way combines prior information from ROSETTA with information contained in the observational data.This distribution contains all information about parameter uncertainty and correlation, both of which affect parameter identifiability.Our results are grouped around three central questions: (i) What is the effect of prior information on the identifiability of the VGM parameters?(ii) How much prior information is needed to get well-defined soil hydraulic parameters?(iii) Is the Bayesian approach sufficiently robust in case of a biased prior distribution?

Soil water content measurements
The measurement site was located in the lower part of a gently sloping agricultural field at Selhausen near J ülich, Germany (50 • 52 9.4 N, 6 • 27 0.5 E, FLOWatch test site).The soil was classified as a Stagnic Luvisol and has a silt loam texture.The fine earth fraction of the topsoil (0 to 30 cm) is composed of 14% sand, 70% silt, and 16% clay (percent by weight).Due to clay accumulation, the subsoil has a slightly higher percentage of clay, with 14% sand, 66% silt, and 20% clay.The soil was kept bare during the measurement campaign.Accumulation of weeds was prevented by occasional application of herbicides and manual removal.
We measured soil water content at 61 different locations distributed evenly within a 50 × 50 m plot at the experimental site.Soil water content was measured using time domain reflectometry (TDR) with two-rod probes (25 cm rod length, 2.3 cm rod spacing).The TDR probes were installed horizontally at a depth of 6 cm below the soil surface.The waveforms were recorded manually using a TDR100 device (Campbell Introduction

Conclusions References
Tables Figures

Back Close
Full Scientific, Logan, UT, USA) and analysed using the algorithm described in Heimovaara and Bouten (1990).We used the empirical relationship of Topp et al. (1980) to convert the apparent dielectric permittivity to soil water content.Measurements were taken on 29 days between 19 March and 14 October 2009, comprising a measurement campaign of 210 days.The observed soil water content data did not show any significant spatial trend and could be well described with a normal distribution at any given measurement day.Because our interest is to estimate effective soil hydraulic properties, we arithmetically averaged the soil moisture data from the different measurement locations of the 50 × 50 m plot.We used this time series of soil water content data in our analysis.

Governing flow equation
Vertical flow of water in the one-dimensional, variably saturated soil profile was described using the Richards equation: where θ (cm 3 cm −3 ) denotes the soil water content, K (cm h −1 ) represents the soil hydraulic conductivity, h (cm) signifies the pressure head, t is time (hour), and z defines the vertical coordinate (cm, positive upward).We used the HYDRUS-1D model ( Šim ůnek et al., 2008) to solve the Richards equation for given initial and boundary conditions.

Soil hydraulic properties
The water retention function, θ(h), and hydraulic conductivity function, K (h), were parameterized using the VGM model (van Genuchten, 1980).The water retention Introduction

Conclusions References
Tables Figures

Back Close
Full function, expressed in terms of effective saturation S (dimensionless), is given by: where θ r and θ s (cm 3 cm −3 ) represent the residual and saturated water content, respectively, and α (cm −1 ), n and m (both dimensionless) are shape parameters.Using the capillary model of Mualem (1976) van Genuchten derived the following closed-form equation for the hydraulic conductivity function: where K s (cm h −1 ) is the saturated hydraulic conductivity, L (dimensionless) is a shape parameter, and m = 1 − 1/n.In total, the VGM model contains six adjustable parameters: θ r , θ s , α, n, K s , and L.

Model domain
Soil water dynamics was simulated in a 100 cm vertical profile.We assume a homogeneous soil and explicitly ignore spatial variability of the hydraulic properties.This simplifies the analysis and is commensurate with the information content of the data.
The single observational depth of the soil moisture content contain insufficient information to warrant investigation of spatially varying hydraulic properties.
The model domain was discretised into 81 nonequidistant nodes.Nodal distance was shortest adjacent to the soil surface and gradually increased with depth, with a distance of 0.05 cm at the upper and 3.5 cm at the lower boundary.This spatial discretization was deemed adequate to appropriately model the large gradients in pressure head typically found close to the surface in response to atmospheric forcing.In general, if nodal spacing is too large, the numerical solution of the Richards equation becomes inaccurate due to linearization errors of the pressure head and averaging errors of the 2027 Introduction

Conclusions References
Tables Figures

Back Close
Full hydraulic conductivity (van Dam and Feddes, 2000).In the present case, a further increase in the number of nodes did not significantly alter our simulation results, from which we concluded that the spatial discretization of the soil profile was adequate.

Initial and boundary conditions
The water flux at the soil surface is controlled by potential evaporation E pot (cm h −1 ), precipitation P (cm h −1 ), and surface runoff R (cm h −1 ).The numerical solution of the Richards equation was obtained by limiting the actual water flux as follows: where h min UB and h max UB denote the minimum and maximum values of the pressure head allowed at the upper boundary, respectively.If the simulated pressure head reaches either of these two limits, the HYDRUS-1D model switches to a pressure head boundary condition to calculate the actual water flux: The limits were set to h min UB = −100000 cm and h max UB = 2 cm.This last value allows for a small amount of ponded water at the soil surface, which we occasionally observed after heavy rainfall events.No surface runoff was simulated during any of our model runs.This is also consistent with our field expertise.
Precipitation and other standard meteorological variables were observed at a meteorological station next to the measurement site.Potential evaporation was estimated using the FAO method (Allen et al., 1998).The FAO method consists of two steps.
First, the potential evapotranspiration from a grass reference surface, ET ref , is calculated using a modified Penman-Monteith equation (Allen et al., 1998, p. 74).We used air temperature, relative humidity, wind speed, incoming shortwave radiation, and atmospheric pressure as input variables.In a second step, the reference evapotranspiration is scaled with an empirical coefficient, E pot = 1.15ET ref (Allen et al., 1998, p. 263).Introduction

Conclusions References
Tables Figures

Back Close
Full This value reflects the increased evaporation potential from bare soils (as compared to the reference grass surface) due to a lower albedo of wet soils.
In the absence of detailed information about the bottom boundary of the considered soil domain, we tested different lower boundary conditions in HYDRUS-1D.From an inverse modelling point of view, a zero gradient pressure head is most appealing because it does not require explicit information about soil water state variables or fluxes at the lower boundary.Readings from a nearby piezometer suggested that the ground water table was at about 300 cm below the soil surface, that is, about 200 cm below the lower boundary of the considered soil domain.Simulations with the zero gradient boundary condition indicated that a substantial amount of water was lost from the profile due to drainage, resulting in simulated water contents that considerably underestimated the actual observed soil water content data.Repeated model runs with different realizations of the VGM parameters demonstrated that this difference was persistent and could not be explained by a wrong selection of the soil hydraulic parameters.We therefore used a prescribed constant pressure head h LB as the lower boundary condition: Unfortunately, no measurements of pressure head or soil moisture content were made at the bottom of the simulation domain from which an appropriate value of h LB could be inferred.We therefore treated h LB as an unknown parameter whose value was estimated jointly with the VGM parameters.
All our numerical simulations with HYDRUS-1D started from a uniform initial pressure head throughout the considered soil domain equal to the pressure head at the lower boundary condition h LB .A 75 day spin-up period was used to allow for the relaxation from the initial pressure head profile and to reduce sensitivity of simulation results to soil water state initialization.Introduction

Conclusions References
Tables Figures

Back Close
Full

Inverse modelling
To explicate the inverse approach utilized herein, we conveniently stored the lower boundary condition in x 1 = [h LB ] and the VGM parameters in The difference between the HYDRUS-1D predicted soil water content values, y(x 1 ,x 2 ), and respective observations, ŷ, was computed using the following residual vector: It is particularly difficult to directly interpret this N-dimensional vector of model residuals and find the appropriate parameter values that provide the best fit to the experimental data.A common approach is therefore to aggregate ε(x 1 ,x 2 , ŷ) into a single measure of model performance and, depending on its definition, minimize or maximize this criterion during model calibration.
To formulate a probabilistic measure of ε(x 1 ,x 2 , ŷ) that can be used to draw statistically meaningful inference about the parameters, we need to make some assumptions about the statistical distribution of the model residuals.This probabilistic measure is called the likelihood function p( ŷ|x 1 ,x 2 ).It gives the probability of observing the data ŷ conditional on the parameter estimates x 1 and x 2 .Under the assumption of independent, identically and normally distributed residuals, ε∼N (0,σ 2 ε ), the likelihood function can be defined as (e.g., Box and Tiao, 1992): Bayesian inference provides a formal way of combining information from observations with prior knowledge of the system.This is achieved through Bayes' theorem: where p(x 1 ) and p(x 2 ) denote the prior probabilities of the lower boundary condition and VGM parameters, respectively, and p(x 1 ,x 2 | ŷ) represent the posterior probabilities 2030 Introduction

Conclusions References
Tables Figures

Back Close
Full In addition, we can infer predictive uncertainty by propagating each realization of the posterior distribution through the HYDRUS-1D model.Now that we have available an explicit mathematical formulation of the likelihood function, we are left with a definition of the prior distribution of the various parameters.In practice, the prior distribution is often assumed to be uniform (noninformative).This includes most of our own work.Yet, the main thrust of the current paper is to show that we can do better than this and formulate informative prior distributions that are specified from soft data.We now discuss the different prior distributions considered herein.
Unfortunately, it is not easy to specify informative prior distributions for each individual parameter.For instance, we have rather limited knowledge of the pressure head at the lower boundary condition.We therefore specify a uniform prior distribution for h LB , and mathematically define this distribution as p(x 1 )∼U(a x 1 ,b x 1 ), where the variables a x 1 and b x 1 denote the lower and upper bounds of h LB , respectively.In all our calculations reported herein, we assume h LB to lie between −250 and −50 cm.In practice, p(x 1 ) needs not to be evaluated because it is uniform and can therefore be omitted from Eq. ( 9).Fortunately, it is easier to specify a prior distribution for the remaining VGM parameters.PTFs can be used to derive estimates of these parameters from basic soil data.This will be discussed later.We consider two different statistical distributions.The first is a multivariate uniform distribution, p(x 2 )∼U(a x 2 ,b x 2 ).This is our reference scenario and represents the case where no explicit prior knowledge of the soil hydraulic parameters is used.Indeed, uniform prior distributions are most often used in Bayesian studies.The second prior distribution is multivariate normal, p(x 2 )∼N (µ x 2 ,Σ x 2 ), and Introduction

Conclusions References
Tables Figures

Back Close
Full can be written as: The various prior distributions p(x 2 ) used in this study are defined in Sect.2.4.Not only does Bayesian analysis allow for explicit use of prior information about the system of interest, it also provides access to the entire probability distribution of the estimated parameters, p(x 1 ,x 2 | ŷ).Unfortunately, for nonlinear models such as HYDRUS-1D, the posterior distribution cannot be obtained by analytical means or analytical approximations.We therefore resort to iterative methods that approximate the posterior pdf by generating a large sample from this distribution.The most general of such methods is MCMC simulation (e.g., Brooks, 1998).The basis of such methods is a Markov chain that generates a random walk through the parameter space with stable frequency stemming from a fixed probability distribution.The basic building block of most MCMC schemes is the Metropolis-Hastings algorithm (Hastings, 1970).To simplify notation, we merge the two parameter vectors x 1 and x 2 into a single vector x.The Metropolis-Hastings algorithm generates a Markov chain, which has the property that the next position of the chain x i +1 only depends on its current position x i .The Markov chain is generated by alternating between two basic steps: First, a proposal x is generated by sampling from a proposal distribution q(x |x i ).The candidate point is then accepted with probability: If the proposal is accepted, the Markov chain moves to the proposal position, Otherwise, the current position is retained, To generate samples from the posterior distribution, we used the DREAM framework of Vrugt et al. (2008bVrugt et al. ( , 2009)).This MCMC scheme runs multiple chains simultaneously for global exploration of the parameter space and automatically tunes the scale and Introduction

Conclusions References
Tables Figures

Back Close
Full orientation of the proposal distribution during the evolution to the posterior target distribution.This scheme is an adaptation of the Shuffled Complex Evolution Metropolis optimization algorithm (Vrugt et al., 2003b) and has the advantage of maintaining detailed balance and ergodicity while showing excellent efficiencies on complex, highly nonlinear, and multi-modal target distributions.Jumps in each chain are created using a fixed multiple of the difference of the states of different chain pairs.The use of multiple different chains protects against premature convergence and opens up a wide array of statistical tests to diagnose whether a limiting distribution has been found.We used the most recent variant of DREAM that uses sampling from an archive of past states and a mix of parallel direction and snooker updates to generate candidate points in each individual Markov chain.This algorithm, entitled DREAM (ZS) , has several desirable advantages.Sampling from the past circumvents the requirement of using a relatively large number of chains for posterior exploration Vrugt et al. (2011).Just a few chains will suffice.This will speed up convergence to a limiting distribution, especially for highdimensional problems.Second, outlier chains do not need explicit consideration.By sampling historical states, aberrant trajectories can jump directly to the modal region at any time during the simulation.The various chains simulated with DREAM (ZS) therefore maintain detailed balance at every single step in the chain.Finally, the transition kernel defining the jumps in each of the chains does not require information about the current states of the chains.This is of great advantage in a multi-processor environment where the various candidate points can be generated simultaneously so that each chain can evolve most efficiently on a different processor Vrugt et al. (2011).To increase the diversity of jumps beyond parallel direction sampling, DREAM (ZS) includes a snooker updater with adaptive step size.The snooker axis runs through the states of two different chains, and its algorithmic implementation within the context of DE-MC has been described in ter Braak and Vrugt (2008).A detailed description of DREAM (ZS) appears in Vrugt et al. (2011) and so will not be repeated herein.We used three parallel chains to explore the parameter space and approximate the posterior distribution of the VGM parameters and the unknown pressure head at the lower boundary.The diagnostic of Introduction

Conclusions References
Tables Figures

Back Close
Full Brooks and Gelman (1998) was used to check when convergence to a limiting distribution had been achieved.After convergence was officially satisfied, we continued to run DREAM (ZS) and produce an additional 50 000 samples, which were used to summarize the posterior distribution.

Predicting the soil hydraulic parameters
The ROSETTA program (Schaap et al., 2001) implements five hierarchical PTFs to estimate the VGM parameters from a varying degree of basic soil data, such as textural class, texture, bulk density, and soil water content at specific pressure head values.We estimate the soil hydraulic parameters from measured sand, silt, and clay percentages of the topsoil layer at the experimental site.This constitutes rather basic soil data, and is readily available in most case studies.The corresponding PTF is labelled with H2-C2 (Schaap et al., 2001).This PTF actually consists of an ensemble of artificial neural network models that were each calibrated to a different data set.These data sets were generated from the ROSETTA database using the bootstrap method (Efron, 1979).The use of multiple calibration data sets provides a simple way to explicitly address uncertainty in the predicted soil hydraulic parameters.Each artificial neural network model provides slightly different estimates of the VGM parameters, and the ensemble mean p = p 1 ... p 6 and standard deviation u = u 1 ... u 6 (both size 1 × 6) constitute the outputs of ROSETTA.These two measures of central tendency and dispersion provide detailed information about the statistical properties of the predicted VGM parameters and can readily be used to specify an informative prior distribution.ROSETTA uses log 10 -transformed values of α, n, and K s .This transformation works well in practice and induces an approximate normal distribution for each of the VGM parameters (Schaap et al., 2001).A normal distribution is computationally easy to implement, and we therefore use log 10 -transformed parameter values in our prior distribution.Introduction

Conclusions References
Tables Figures

Back Close
Full The information provided by ROSETTA can readily be used to formulate an informative multivariate normal prior distribution with mean p and covariance matrix, Σ f consisting of u 2 on the diagonal entries and off-diagonal terms of zero.This approach, however, is rather simplistic in that it ignores the correlation that is typically found between the soil hydraulic parameters.Parameter correlation is evident from first-order approximations of the parameter covariance matrix (e.g., Kool and Parker, 1988), objective function contour plots (e.g., Toormann et al., 1992), or scatter plots of the posterior sample (Vrugt et al., 2003a).Given these findings, it seems productive to include this correlation of the soil hydraulic parameters in the prior pdf.If we ignore parameter correlation, we assign too high prior probabilities to physically unrealistic values of the soil hydraulic parameters.
To formulate prior pdfs that explicitly consider parameter correlation, we need detailed information about the correlation structure of the predicted soil hydraulic parameters.This information is not readily provided by ROSETTA but can be derived using the following approach: Step 1: Draw a random sample of input variables.Let the measured percentages of sand, silt, and clay be denoted by f = [14 70 16] (size 1×3).In a first step, we generated a random sample F (size 1000 × 3) from a multivariate normal distribution, N (µ f ,Σ f ), centred around the sand, silt, and clay percentages: The diagonal entries (variances) were assigned an arbitrary small value of 0.25%, which was shown to work well in practice.The negative values of the off-diagonal terms (covariances) were chosen such that (i) they are consistent with the compositional nature of soil texture, and that (ii) Σ f is positive semidefinite.Using this covariance matrix, the sum of the sampled percentages F remained in close vicinity of 100%, with error deviation of 1%, which is the error tolerance of ROSETTA.Introduction

Conclusions References
Tables Figures

Back Close
Full To help implement our prior distributions, we fitted a multivariate normal distribution through the sample of 1000 VGM parameter combinations.This closed-form mathematical distribution is easy to implement and sample from.Quantile-quantile plots and pairwise scatter plots have shown that this normal distribution adequately describes the ROSETTA derived sample of soil hydraulic parameters.
Step 3: Derive the covariance matrix of the estimated parameters.We then calculated the covariance matrix C and the matrix of Pearson correlation coefficients R (both size 6 × 6) of the parameter sample P. The correlation matrix R is defined as: This correlation matrix contains all required information to estimate the covariance matrix of the soil hydraulic parameters, Σ p .We derive this matrix by scaling R with the respective ROSETTA predicted standard deviations u: This scaling ensures that the diagonal entries of Σ p correspond to the standard deviations of the predicted parameters while the off-diagonal terms reflect the covariance among the soil hydraulic parameters.

Defining the prior distributions of soil hydraulic parameters
In this case study, we tested three different formulations of the prior distribution for Bayesian inverse modelling of in situ soil water dynamics.These prior distributions incorporate to different extends the information derived from ROSETTA predicted soil hydraulic parameters.We now discuss the three different priors.2036 Introduction

Conclusions References
Tables Figures

Back Close
Full In the first case, we assumed a multivariate uniform distribution, p(x 2 )∼U(a x 2 ,b x 2 ) with lower and upper bounds a x 2 and b x 2 , respectively, that jointly define the feasible parameter space.This type of noninformative prior distribution is quite popular because it only requires knowledge of the minimum and maximum values of the soil hydraulic parameters.We calculated the bounds as p±4u (Table 1).
Prior 2: Multivariate normal distribution without correlation among the soil hydraulic parameters.In the second case, a multivariate normal distribution, p(x 2 )∼N (µ x 2 ,Σ x 2 ), was used.In this case, the vector of mean values equals the predicted parameters, µ x 2 = p (Table 1), and the covariance matrix is diagonal, Σ x 2 = diag(u 2 1 ,...,u 2 6 ) with off-diagonal terms (covariances) set to zero.This prior neglects parameter correlation.
Prior 3: Multivariate normal distribution with correlation among the soil hydraulic parameters.The third and last prior builds on the previous distribution but explicitly considers correlation among the soil hydraulic parameters, Σ x 2 = Σ p .This prior is most informative in that it conveys the mean, standard deviation, and covariance among the predicted soil hydraulic parameters.
Previous studies (e.g., Carsel and Parrish, 1988;Vrugt et al., 2003a) have shown that the correlation structure of the soil hydraulic parameters varies considerably between the various textural classes.We therefore do not attempt to interpret the correlation matrix or correlation coefficients listed in Table 1.Instead, what we look at is how the prior distributions affect the prior uncertainty of the retention and hydraulic conductivity functions.The results of this analysis are presented in Fig. 1, which plots the 95% confidence intervals of the soil hydraulic functions for each of the three prior distributions.The largest spread of the soil hydraulic functions is observed when using a uniform prior distribution (Prior 1, red line).The hydraulic functions within this confidence interval are deemed equally likely a priori before any soil moisture data is processed.This uncertainty is unequivocally large and places a premium on the soil moisture data to constrain the VGM parameters.When using an informative prior (Prior 2, blue line), the uncertainty of the soil hydraulic functions is substantially reduced.Even though parameter correlation is ignored the VGM parameters are much better constrained.Even better results are obtained when parameter correlation is explicitly considered Introduction

Conclusions References
Tables Figures

Back Close
Full in the prior distribution (Prior 3, grey area).The confidence interval around the inflection point of the water retention function symmetrically shrinks, but the uncertainty in the wet and dry ends is hardly affected.Qualitatively similar results are found for the hydraulic conductivity function, but the uncertainty intervals are somewhat larger, particularly in the dry range.

Generation of synthetic data
To test the effectiveness and robustness of the Bayesian approach, we start with two different synthetic data sets.In this case the true values of the soil hydraulic parameter values are known and we avoid uncertainty originating from boundary condition errors and model structural inadequacies.We created a time series of observed soil water contents using the HYDRUS-1D model with h LB = −150 cm and values of the soil hydraulic parameters derived from ROSETTA.The upper boundary conditions as well as the measurement dates of the synthetic data set were the same as for the real data set.A normally distributed error was added to the simulated soil moisture contents to represent the combined effect of model structural, boundary condition, and observational data error.The magnitude of this random error was similar to the root mean square error of the best HYDRUS-1D fit to the observed in situ water content data (Sect.3.2).This artificial data set was then used with the three different prior distributions to inversely estimate the VGM parameters and h LB .
To test the robustness of our approach, we considered an alternative case in which we have a biased prior distribution of the VGM parameters.This constitutes a difficult test to establish whether our inverse modelling approach can correct the corrupted initial information and infer the appropriate values of the respective soil hydraulic parameters.There are essentially two ways in which we can generate a biased prior distribution.We can maintain the synthetic time series of soil water content observations and corrupt the three different prior pdfs.A simpler approach followed herein is to leave the prior pdfs untouched, but to create a new synthetic time series of soil moisture observations with soil hydraulic parameters that differ substantially from their 2038 Introduction

Conclusions References
Tables Figures

Back Close
Full values derived with ROSETTA and used to create the prior pdfs.The VGM parameters of this run were selected by drawing a large sample from Prior 3 and then purposely picking a realization with very low prior probability.The synthetic data generated in this way is inconsistent with the information contained in the three prior distributions.
We then estimated the VGM parameters using these three biased prior distributions.
Upper boundary conditions and h LB were the same as before.

Synthetic data
The results for the synthetic soil water content data with an unbiased prior are depicted in Fig. 2.This plot shows the cumulative probability function of the estimated parameters corresponding to each of the three prior distributions.When using a uniform prior (Prior 1, Fig. 2a), the water retention parameters θ r , θ s , α, and n were not identifiable, with posterior distributions that extend over the entire prior defined ranges.The posterior distribution of the two additional soil hydraulic parameters K s and L on the contrary differed markedly from their prior distribution.Seemingly, the observational data contained sufficient information to constrain these two parameters.Unfortunately, the pressure head at the lower boundary (h LB ) was not warranted by calibration against the synthetic soil moisture data.Very similar findings were observed with an informative prior that does not explicitly consider correlation among the VGM parameters (Prior 2, Fig. 2b).Note that h LB was somewhat better constrained but still demonstrates considerable uncertainty.This is a nice illustration of parameter interaction.Even though the prior of h LB was the same for the three different prior distributions tested, this parameter became better identifiable if the VGM parameters were more constrained in their prior pdf.An informative prior distribution with explicit consideration of the parameter correlation (Prior 3, Fig. 2c) substantially improved the results.All the soil hydraulic parameters suddenly became well identified by calibration against the in situ soil water Introduction

Conclusions References
Tables Figures

Back Close
Full content measurements.Note that the modes of the posterior distribution did not necessarily coincide with the actual parameter values used to generate the synthetic data.This is a direct consequence of the random error used to corrupt the synthetic data.Nevertheless, the true values of the VGM parameters and h LB remained within the 95% confidence interval of the posterior distribution.Altogether these results advocate the use of prior information in the fitting of field scale soil hydraulic functions.Indeed, whereas field scale measurements of moisture dynamics contained insufficient information to warrant a reliable identification of the soil hydraulic functions, prior information derived from PTFs considerably reduced the uncertainty of the retention and hydraulic conductivity functions.This finding is rather exciting, but it remains to be investigated how robust these conclusions are.For instance, would we obtain similar results if our prior distribution is corrupt and inconsistent with the actual information content of the soil moisture data?We therefore repeated the analysis but now with a biased prior distribution.
The results for the biased priors are shown in Fig. 3. Perhaps rather surprisingly, but the results were qualitatively similar to those observed previously for the unbiased prior.Prior distributions that did not explicitly consider correlation among the soil hydraulic parameters performed rather poorly.Indeed, Prior 1 and Prior 2 did a rather poor job in constraining the VGM parameters and did not yield reasonable estimates of the water retention parameters.On the contrary, Prior 3 significantly improved the results and illustrates a reliable and robust identification of all the soil hydraulic parameters.This is illustrated in more detail in of soil moisture dynamics.This finding inspires confidence in the effectiveness and robustness of the Bayesian methodology.The likelihood function considered herein was powerful enough to correct the biased prior information and converged upon the appropriate values of the VGM parameters.Note that this required explicit knowledge of the correlation structure of the soil hydraulic parameters.

Real data
We now present the results for the actual soil moisture data set collected in the field.
In this case, we do not know the true values of the soil hydraulic parameters, and we resort to our Bayesian inference scheme to provide posterior pdfs of the VGM parameters and h LB .The results of this analysis are illustrated in Fig. 5.We draw very similar conclusions.Again, the use of a uniform prior (Prior 1, Fig. 5a) and multivariate normal prior without correlation among the VGM parameters (Prior 2, Fig. 5b) did not warrant a reliable identification of the soil hydraulic functions.Explicit consideration of parameter correlation in the prior pdf (Prior 3, Fig. 5c) substantially improved the results.To verify whether the prior distribution of the parameters is consist with the information from the soil moisture data, please consider Fig. 6 which compares the 95% confidence intervals of the prior distribution with the respective ellipses derived from the posterior samples.The posterior samples are plotted separately with grey dots.Note that h LB is drawn from a uniform distribution, and we therefore do not plot prior confidence intervals for this particular parameter.This figure highlights several important observations.In the first place, notice that the prior uncertainty was generally larger than the posterior uncertainty.The 95% confidence ellipses of the prior distribution are much larger than their respective counterparts of the posterior distribution.In the second place, notice that the prior distribution was unbiased and consistent with the posterior samples.The VGM parameters predicted with ROSETTA matched closely with the soil hydraulic properties observed in the field.Finally, notice that the prior and posterior distributions exhibited a very similar correlation structure, particularly for the highly correlated parameters.

HESSD Introduction Conclusions References
Tables Figures

Back Close
Full Summary statistics of the prior and posterior distributions are listed in Table 2.We also included the best values of the parameters by locating the point in the MCMC sample for which the posterior density was maximized.This solution is in close agreement with the mean values of the VGM parameters separately derived with ROSETTA and used to create the three different prior distributions.This again illustrates that our prior distribution was relatively unbiased, a testament to the ability of ROSETTA to generate accurate predictions of the soil hydraulic functions from basic soil data.Yet, an unbiased prior is not a requirement for the statistical inference to work well in practice.As demonstrated earlier, the inference is sufficiently robust against biased prior information as long as the correlation structure of the VGM is appropriately represented.Note also that the posterior uncertainty of h LB is quite large.In part, this may also reflect the uncertainty in the underlying assumption of a constant pressure head at the lower boundary.
Our analysis thus far has focused on analysing the prior and posterior distribution without recourse to considering the actual HYDRUS-1D model predictions and soil moisture data.Figure 7 compares the observed and simulated soil moisture dynamics.The dark grey region represents the 95% prediction uncertainty intervals associated with the posterior parameter uncertainty, whereas the light grey region depicts the total predictive uncertainty.Details on how to compute these uncertainty intervals can be found in Schoups and Vrugt (2010) and so will not be repeated herein.The in situ observations are denoted with circles.For completeness, the top panel plots the observed rainfall hyetograph (blue) and potential evaporation (red).The HYDRUS-1D model matched the data quite well, with a corresponding root mean square error of 0.009 and model efficiency (Nash and Sutcliffe, 1970)  from these data points.Figure 8 plots the 95% prediction uncertainty ranges of the soil hydraulic functions corresponding to the prior and posterior distributions.We also plotted the observed soil water content data.Note that the in situ data exhibit relatively small variability and cover only a limited range of soil water states.Seemingly, the naturally occurring boundary conditions display insufficient variability to make the soil water content at this observational depth cover the entire water retention function and warrant a reliable identification of the soil hydraulic functions.This explains, at least in part, why in situ observations of soil water dynamics contain insufficient information to reliably estimate all VGM parameters.Indeed, prior information was needed to help constrain the posterior distribution of the soil hydraulic properties.
The likelihood function used herein was based on assumptions of uncorrelated, homoscedastic, and normally distributed residuals.If any of these assumptions has been violated then our parameter distributions and corresponding HYDRUS-1D model predictions are subject to considerable error and should be revisited.Good statistical practice therefore constitutes checking whether the underlying assumptions of the likelihood function have been met (e.g., Schoups and Vrugt, 2010).Three different diagnostic tests have been conducted, and the results of this are plotted in Fig. 9.Each panel considers a different statistical test.The top panel measures the correlation among the residuals by plotting the autocorrelation function.The blue lines denote theoretical upper and lower 95% significance intervals of a time series of white residuals (no correlation).The circles (partial autocorrelation at given lag) remain within the interval of the blue lines, and we therefore conclude that the residuals are uncorrelated.The first assumption was met.The middle panel tests whether the size of the residual depends on the magnitude of the soil moisture observation.There is no apparent statistical trend.The residuals appear homoscedastic and independent on the magnitude of the calibration data.Assumption two was also met.Finally, the bottom panel presents a quantile-quantile plot and explores whether the residuals follow a normal distribution (blue line).The circles (percentiles) nicely coincide with the blue line.We Introduction

Conclusions References
Tables Figures

Back Close
Full therefore conclude that all assumptions of the likelihood function were met and that the parameter distributions and predictive uncertainty estimates are adequate.

Discussion
The results presented in this paper indicate that in situ observations of soil water content in bare soil under natural boundary conditions do not contain sufficient information to warrant a reliable estimation of the soil hydraulic parameters.This finding is not new but has been reported in many other studies.Naturally occurring boundary conditions display insufficient variability to results in a wide range of soil moisture states, a prerequisite to successfully estimating the VGM parameters (Vrugt et al., 2001(Vrugt et al., , 2002)).
The information for some of the soil hydraulic parameters simply appears beyond the range of the actual soil moisture observations, and these parameters are therefore insensitive and very difficult to constrain.There is different ways in which the information content of our data could have been increased.The simplest approach is to increase measurement frequency and measure directly after rainfall events (Fig. 7).This would have resulted in a larger range of observed soil water states and likely reduced the uncertainty of θ s and K s (Fig. 8).The presence of vegetation, and hence uptake of water by plant roots, would have extended the range of observed soil water states to the dry end.This gain of information, however, comes at a cost.The simulation of root water uptake as a function of time and depth requires specification of additional root water uptake parameters, which would need to be estimated simultaneously with all the other parameters, introducing additional parameter and model uncertainty (e.g., Ines and Mohanty, 2008;Wollschl äger et al., 2009).As a consequence of the limited amount of information contained in the observational data, estimates of the soil hydraulic parameters were subject to substantial uncertainty (Figs. 2 and 5).Our results indicate that this seems to be especially true for the retention parameters.To better understand these findings, please refer to the results presented in Vrugt et al. (2001Vrugt et al. ( , 2002)).They used numerical experiments and Introduction

Conclusions References
Tables Figures

Back Close
Full recursive MCMC simulation to explain why most studies demonstrate problems with soil hydraulic parameter inference.Vrugt et al. conclusively demonstrate that a wide range of soil moisture states is required to reliably constrain the soil hydraulic functions.For instance, they found that most information on θ s and θ r is contained in the wet and dry range of the water retention curve, respectively.In addition, Vrugt et al. found that most information on α is contained in soil water content observations just beyond the air entry value of the soil, and that most information on n is contained in the observation well beyond the inflection point of the water retention function.Obviously, the data set considered herein covered a rather limited range of soil moisture values (Fig. 8), which explains why a uniform prior distribution (Prior 1) or multivariate normal prior that neglects correlation (Prior 2) are not strong enough to reduce parameter uncertainty.The reason why Prior 3 works well in practice is because parameter correlation among the VGM parameters exerts a strong influence on the possible soil hydraulic functions.The parameters L and K s that define the hydraulic conductivity function were somewhat better identified than the retention parameters (Figs. 2 and 5).Seemingly, the observed soil water content data, albeit covering a small range of soil water states, still contained valuable information about L and K s .This reinforces our argument that the information content of the data should dictate which parameters are being fixed, rather than common practice.Indeed, L is often fixed at some nominal value because it is deemed unimportant or insensitive (e.g., Abbaspour et al., 2000;Wollschl äger et al., 2009).
The usefulness of the Bayesian approach considered herein essentially relies on the availability of information about the correlation among the soil hydraulic parameters.This correlation structure differs between soils, and we therefore need general approaches that can help build such informative prior distributions.In this study, we used the ROSETTA PTF to obtain prior information about the VGM parameters and their correlation structure.PTFs are easy to use in practice and require only limited soil data that is readily available in most studies.Mertens et al. (2004) used a more elaborated approach and created prior distributions by combing information from laboratory and Introduction

Conclusions References
Tables Figures

Back Close
Full field experiments.This approach works well in practice but requires more experimentation, resources, and time.PFTs therefore exhibit desirable advantages.A recent review by Vereecken et al. (2010) discusses the strengths and limitations of existing PTFs to make accurate and reliable predictions of the parameters in the VGM model.They make also suggestions on how to improve the predictive capabilities of future PTFs.We like to add to their suggestions to include information on parameter correlation in the output of future PTFs.This information is useful in Bayesian inverse modelling, as shown in this study, but also in other applications such as stochastic modelling of soil hydraulic properties and soil water flow (e.g., Mishra and Parker, 1989;Mallants et al., 1996).

Summary and conclusions
Many contributions to the soil hydrological literature have demonstrated the limited information content of in situ measurements of soil water state variables to estimate the soil hydraulic properties.One approach that helps to improve the inverse identifiability of the soil hydraulic parameters and reduce the uncertainty in the soil hydraulic functions is to use observations of different soil water state variables.This approach works in practice but places a heavier demand on measurement resources.Another approach that has yet received very little attention is to formulate an informative prior distribution of the soil hydraulic parameters and combine this distribution with the in situ observations using Bayes' theorem.The prior distribution reflects knowledge of the soil hydraulic parameters before any measurements of the soil water state have been taken and processed.
In this paper, we tested whether prior information on the soil hydraulic parameters can help improve the identifiability of the soil hydraulic parameters in inverse modelling of in situ soil water dynamics.Basic soil data (percentages of sand, silt, and clay) was used to formulate three different prior distributions using ROSETTA.This computer program uses an ensemble of PTFs to predict the VGM parameters from different Introduction

Conclusions References
Tables Figures

Back Close
Full hierarchical levels of input data.We purposely used soil textural information because this information is readily available in most vadose zone studies.We illustrated our approach using synthetic and real-world observations of soil water content.The results demonstrated that prior information on the VGM parameters significantly reduced the uncertainty in the soil hydraulic functions during Bayesian inverse modelling of in situ soil water dynamics.Analysis with synthetic data elucidated that this approach was effective and robust, even in the case of a biased prior distribution.To be effective and robust, however, we needed an informative prior distribution that explicitly incorporated knowledge of the correlation structure of the soil hydraulic parameters.The Bayesian framework presented in this study can be applied to a wide range of vadose zone studies and provides a blueprint for the use of prior information in inverse modelling of in situ soil water dynamics at various spatial scales.Introduction

Conclusions References
Tables Figures

Back Close
Full  Full  Full soil water dynamics under transient conditions typically requires knowledge of the soil hydraulic properties, that is, the water retention function and the hydraulic conductivity function.A broad array of methods exists to determine these 2020 Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | information derived from laboratory and field experiments improves the identifiability of the soil hydraulic properties.Indeed, explicit use of prior knowledge of the retention and hydraulic conductivity functions improved the compliance of model predictions and data.W öhling et al. (2008) derived soil hydraulic properties of a three-layer soil profile using pressure head observations from 9 tensiometers installed at 3 different depths.They compared the efficiency of three multiobjective search algorithms in finding Pareto solutions of soil hydraulic parameters that characterize the trade-off in the fitting of pressure head data at different depths.Steenpass et al. (2011) estimated soil hydraulic properties of a two-layer profile from observed soil surface temperature Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | after assimilating the observational data.This posterior distribution summarizes what we know about the parameters.It can be used to derive descriptive statistics such as measures of central tendency, dispersion, or association of the estimated parameters.
Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Screen / Esc Printer-friendly Version Interactive Discussion Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Step 2: Predict the soil hydraulic parameters.In a second step, we predicted the soil hydraulic parameters P (size 1000 × 6) for each realization of the input variables: ROSETTA : F → P (13) Discussion Paper | Discussion Paper | Discussion Paper | Prior 1: Multivariate uniform distribution.
Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Fig. 4, which presents pairwise scatter plots of the sampled posterior parameter values and associated 95% confidence ellipses of the prior (blue line) and posterior (red line) distribution.These ellipses were calculated based on the assumption of a bivariate normal distribution of the respective parameters.The true parameter values are indicated in each individual plot using the square symbol.Although the prior distribution of the VGM parameters was corrupt and assigns very low probability to the true soil hydraulic properties, the 95% posterior confidence ellipses encompassed the actual parameter values used to generate the synthetic time series Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | of 0.87.The soil moisture observations at day of year 190 and 238 fall outside the prediction uncertainty bounds and cannot be fitted with the HYDRUS-1D model.Unfortunately, it is not particularly clear what exactly caused this discrepancy.This mismatch can be attributed to potential errors in the boundary condition, model structural inadequacies, or measurement errors.Additional research is needed to explain the deviation of the model prediction Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper | Discussion Paper |

Fig. 1 .Fig. 2 .Fig. 5 .Fig. 9 .
Fig. 1. 95% prior uncertainty bounds of (a) the water retention function and (b) the hydraulic conductivity function corresponding to the three prior probability distributions of the soil hydraulic parameters: multivariate uniform distribution (Prior 1), multivariate normal distribution without correlation among the soil hydraulic parameters (Prior 2), and multivariate normal distribution with correlation among the soil hydraulic parameters (Prior 3).

Table 1 .
(Schaap et al., 2001) deviations, and Pearson correlation coefficients of the ROSETTA(Schaap et al., 2001)predicted soil hydraulic parameters used to define the three different prior distributions.In addition, the lower and upper bounds of the soil hydraulic parameters are listed.

Table 2 .
Summary statistics of the prior and posterior distributions using real soil water content data and the prior distribution that accounts for correlation among the soil hydraulic parameters (Prior 3).Instead of the mean values, the parameters with maximum posterior density (MPD) are given for the posterior distribution.In addition, the lower and upper bounds of the parameters are listed.