Inverse modeling of hydrologic parameters using surface flux and runoff observations in the Community Land Model

This study demonstrates the possibility of inverting hydrologic parameters using surface flux and runoff observations in version 4 of the Community Land Model (CLM4). Previous studies showed that surface flux and runoff calculations are sensitive to major hydrologic parameters in CLM4 over different watersheds, and illustrated the necessity and possibility of parameter calibration. Both deterministic least-square fitting and stochastic Markov-chain Monte Carlo (MCMC)-Bayesian inversion approaches are evaluated by applying them to CLM4 at selected sites with different climate and soil conditions. The unknowns to be estimated include surface and subsurface runoff generation parameters and vadose zone soil water parameters. We find that using model parameters calibrated by the samplingbased stochastic inversion approaches provides significant improvements in the model simulations compared to using default CLM4 parameter values, and that as more information comes in, the predictive intervals (ranges of posterior distributions) of the calibrated parameters become narrower. In general, parameters that are identified to be significant through sensitivity analyses and statistical tests are better calibrated than those with weak or nonlinear impacts on flux or runoff observations. Temporal resolution of observations has larger impacts on the results of inverse modeling using heat flux data than runoff data. Soil and vegetation cover have important impacts on parameter sensitivities, leading to different patterns of posterior distributions of parameters at different sites. Overall, the MCMC-Bayesian inversion approach effectively and reliably improves the simulation of CLM under different climates and environmental conditions. Bayesian model averaging of the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.


Introduction
Inverse problems (or parameter calibrations/optimizations) involve a general framework to derive information about a physical object or system from measurements directly or indirectly related to the physical object/system (Tarantola, 2005).During the past decades, numerous inversion strategies, deterministic or stochastic, have been developed and applied in earth systems science including atmospheric science, hydrology, geology, and geophysics.Three conditions (existence, uniqueness, stability of the solutions) are necessary for a well-posed inverse problem (Hadamard, 1902).However, as solution uniqueness and stability are usually violated in practice, some regularization is generally needed to introduce mild assumptions on the solution and prevent parametric over-fitting.It is also important for an inverse approach to be capable of quantifying and evaluating the prediction uncertainty, particularly for complex systems, where unknown parameters and observable variables have nonlinear and nonunique relationships, observations are limited, or the forward models are not perfect.
For a given inverse problem, one can choose different approaches depending on the requirements of parameter estimation accuracy, computational demand, and importance of prediction uncertainty (e.g., Hou et al., 2006;Chen et al., 2004;Hoversten et al., 2006), which requires understanding of the advantages, disadvantages, and applicability of each method.Deterministic approaches have been used to obtain "optimal" parameter sets by evaluating the goodness of fit between observed and model simulated response variables (e.g., Sorooshian, 1981;Sorooshian and Dracup, 1980;Duan et al., 1992Duan et al., , 1993;;Sorooshian et al., 1993;Hoversten et al., 2006).These approaches generally assume that an optimal parameter set exists and implicitly ignore the estimation of predictive uncertainties.However, a single optimal parameter set may not exist and the uncertainties associated with the optimal parameter sets could be large (e.g., Gupta et al., 1998;Klepper et al., 1991;Van Straten and Keesman, 1991;Beven and Binley, 1992;Yapo et al., 1996).Moreover, a model with the optimal parameter set may provide the best fit over the calibration period, but multiple parameter sets may result in comparable misfits and are likely to be the "true" values with certain probability, and therefore are considered to be acceptable or equally probable parameter sets (Van Straten and Keesman, 1991;Klepper et al., 1991).Stochastic approaches can address these limitations by describing the input parameter and output uncertainties in a statistical manner.Generally, the input parameter space is represented in a form of multivariate probability distributions of input parameters and must be sampled to generate multiple realizations of the model simulations so that the prediction range can be estimated based on the ensemble model simulations (Beven and Binley, 1992;Freer et al., 1996;Kuczera and Parent, 1998;Vrugt et al., 2003b).When multiple types of data are available, multi-objective calibration can be used to deal with parameter estimation uncertainty by combining several measures of performance for data fusion (Boyle et al., 2000;Kollat et al., 2012;Gupta et al., 1998;Vrugt et al., 2003a).In practice, different optimization methods can also be combined to improve the treatment of uncertainty in hydrologic modeling (Vrugt and Robinson, 2007;Feyen et al., 2007;Vrugt et al., 2005).
Uncertainty in model simulations can stem from model parameters, as well as from model inputs (e.g., atmospheric forcing and land cover data) and model structure (e.g., the physical parameterizations used to describe certain processes or the numerical solvers).Uncertainty in model input data or model structure can be reduced by improving the observation precision and taking advantage of multiple observational platforms with different error characteristics in space and time, or better understanding of the physical processes.Much progress has been made by the land surface modeling community in the last two decades to deal with uncertainty in model parameters, data, and model structure.The focus of this study is uncertainty in model parameters.
In previous studies (Hou et al., 2012;Huang et al., 2013), we investigated the sensitivity of surface fluxes and runoff simulations to major hydrologic parameters in version 4 of the Community Land Model (CLM4) by integrating CLM4 with a stochastic exploratory sensitivity analysis framework at 13 flux towers from the Ameriflux net-work and 20 watersheds from the Model Parameter Estimation Experiment (MOPEX) (Huang et al., 2013) spanning a wide range of climate, landscape, and soil conditions.We found that the CLM4-simulated latent heat flux (LH), sensible heat flux (SH), and runoff show the largest sensitivity to subsurface runoff generation parameters.These studies demonstrated the necessity and possibility of parameter inversion/calibration using available measurements of surface fluxes and streamflow to invert the optimal parameter set, and provided guidance on reduction of parameter set dimensionality and parameter calibration framework design for CLM4.
This study aims to demonstrate the inversion methodology at selected sites based on the global sensitivity analyses detailed in Hou et al. (2012) and Huang et al. (2013).Among various inversion approaches, we adopt a stochastic Bayesian inversion approach integrated with Markov-chain Monte Carlo (MCMC) sampling.The unknowns to be estimated include model parameters for surface and subsurface runoff generation and vadose zone soil water movement.Different options for the inversion framework are evaluated at the selected sites.For example, it is important to evaluate the prior incompatibility issues in an inversion design (Hou and Rubin, 2005).As detailed in the remaining sections of this paper, we evaluated the impacts of prior information (e.g., initial guesses) on the inversion results.We also compared the consistency and reliability of inversion using both monthly and daily flux observations, and compared the performance of Bayesian model averaging to the individual inversion approaches for parameter estimation.We also discuss the issues related to data quality, data worth, and redundancy.

Site and data description
We perform parameter estimation at two flux tower sites and one MOPEX basin.The MOPEX basin is in close proximity to one of the flux tower sites to investigate model inversion using heat flux versus runoff data.These sites/basins are chosen based on previous sensitivity analyses over a larger set of flux tower sites and MOPEX basins (Hou et al., 2012;Huang et al., 2013) that demonstrated the feasibility and necessity of parameter calibration at those locations and to provide representative and contrasting climate and environmental conditions within the United States for more robust conclusions.US-ARM is located in Oklahoma and is covered by croplands (Allison et al., 2005;Baer et al., 2002;Fischer et al., 2007;Riley et al., 2009;Suyker and Verma, 2009).US-MOz is located in Missouri and is covered by deciduous broadleaf (Gu et al., 2012(Gu et al., , 2006)).Meteorological forcing, site information such as soil texture, vegetation cover, and satellitederived phenology, as well as validation data sets, such as water and energy fluxes, are provided by the North American Carbon Program (NACP) site synthesis team.The site information is provided in Table 2 of Hou et al. (2012).
MOPEX basin (07147800) is the Walnut River basin in Kansas with a drainage area of 4869 m 2 , which is dominated by silty clay loam soil and covered by about 56 % C3 grass, 22 % C4 grass, and 20 % croplands according to the MODIS based land cover map in Ke et al. (2012).The meteorological forcing for the MOPEX basin was extracted from the phase two North America Land Data Assimilation System (NLDAS2) forcing at an hourly time step from 1979 to 2007 (Xia et al., 2012), including precipitation, shortwave and longwave radiation, air temperature, humidity and wind speed at a 1/8th degree resolution derived from the 32 km resolution 3 h North American Regional Reanalysis (NARR) following the algorithms detailed in Cosgrove et al. (2003).An area-average algorithm was then applied to the NLDAS2 forcing as inputs to CLM by treating the entire basin as a single computational unit.CLM was spun up by cycling the forcing at each site for at least five times until all the state variables reached equilibrium before any statistics were calibrated for inversion, based on the methodology described below.

Parameterization
Observational data used in parameter estimation are observed latent heat fluxes and runoff measurements, which are processed and gap-filled to obtain daily and monthly averaged data.For unknowns, we focus on the parameters that are the most identifiable from the response variables (i.e., they have significant, straightforward, and distinguishable influences on hydrologic processes, including soil hydrology and runoff generation processes).The 10 hydrologic parameters that we found to have impacts on the simulations of surface and subsurface runoff, latent and sensible heat fluxes, and soil moisture in CLM4 are f max , C s , f over , f drai , Q dm , S y , b, s , K s , and θ s .Explanations of the 10 parameters and their prior information are shown in Table 1.Some vegetation parameters, such as leaf maximum carboxylation rate and the slope of the stomatal conductance, are important parameters that govern plant transpiration and modulate latent heat flux and soil moisture.In this study, we focus on the hydrological parameters that are more directly related to runoff and soil moisture, and the latter has important control over latent heat flux, and set the vegetation parameters using default values.Soil texture also affects soil evaporation and fluxes.In this study, the prior distributions for porosity, permeability, specific yield, Clapp and Hornberger exponent, and saturated soil matrix potential parameters are determined from soil texture information based on Cosby et al. (1984).

Parameter calibration using least-square fitting approaches
Different approaches can be used to calibrate the selected hydrologic parameters using available observations such as runoff and surface fluxes data.The method of least squares is a standard approach to approximate the solution of overdetermined systems, i.e., systems specified by more equations than unknowns.One well-known approach utilizing the least-square fitting concept is Parameter ESTimation (PEST) (Doherty, 2008), which is a general-purpose, modelindependent, parameter estimation and model predictive uncertainty analysis package.Here we adopt PEST to perform single-objective least-square fitting of the observational data, with the loss function defined as the sum square of fitted residuals between calculated and observed runoff or latent heat fluxes.
There is a set of tuning parameters in the PEST inverse setup.We used the widely adopted default settings as given in the user guide.However, simulations of heat flux and runoff using the calibrated parameters show only small improvements compared to simulations using the default parameter values.PEST, using the least-square fitting approach, works well for more linear systems with monotonic relationships between unknown parameters and observable variables and weak interactions between the unknowns.For a complicated system, it is very likely that a single optimum set of parameter values does not exist and the solutions may converge to local minima.In this case, a probabilistic description of all possible solutions is more reasonable by assigning proper weights (i.e., likelihoods) to all possible solutions of parameter sets.In order to further explore the usefulness of the leastsquare fitting approach, the PEST-calibrated parameter estimates are used as one of the choices for initial guesses of the parameters for stochastic Bayesian updating as explained below to determine if such estimates may improve the convergence rate or robustness of the inversion results.

Bayesian updating
Stochastic inversion/calibration approaches (e.g., Bayesian inference) can be used to describe the input/output uncertainties in a probabilistic manner.Bayesian inference derives the posterior probability as a consequence of two antecedents: a prior probability given prior information, and a "likelihood function" derived from a probability model for the data to be observed.Bayesian inference computes the posterior probability according to Bayes' rule: where f (m|d, I ) represents the posterior pdf (probability density function) of parameter m, f (d|m, I ) denotes the likelihood of observing d given parameter m, and f (m|I ) is the prior pdf of m given prior available information I .* Reproduced with permission from American Geophysical Union.
We assume that the forward models (e.g., CLM4) are being characterized by d * ij = g ij (m) + ε ij , where m represents the vector of model parameters, d ij = g ij ( ) is the forward model, ε ij is the difference (i.e., residual) between the model d ij and observation d * ij , i = 1,. . .,K; j = 1,. . .,N i , K being the number of data types and N i being the number of observations for the ith data type.The residuals include both measurement errors in observations (e.g., heat fluxes) and modeling errors due to model parameters.As discussed above, model errors due to forcing data and model structural uncertainty are not addressed in this study.However, we used forcing data developed and adopted by the community (NACP and NLDAS2) that reduce some uncertainty by using multiple observational data sets with adjustments to account for local factors such as topography.
With the underlying assumptions that ε ij is normally distributed with variance σ 2 ij , and the distributions are independent, the likelihood function can be represented as (Hou et al., 2006) The posterior distributions of the input parameters are the products of the priors and the likelihood functions.As more information comes in, Bayes' rule can be applied iteratively, that is, after observing some evidence, the resulting posterior probability can be treated as a prior probability, and a new posterior probability is computed from the new evidence.

Sampling methods
An efficient sampling approach is important for the success of Bayesian inversion, especially when the forward modeling computational demand is high, the parameter dimensionality is high, or the parameters are weakly identifiable.In this study, the Metropolis-Hasting sampling method is used to draw samples from the joint posterior distribution functions (Hastings, 1970;Hou, 2008;Metropolis et al., 1953).The procedure is as follows.
1. Initialize a random vector m from the prior distributions m (0) i , i = 1, . .., p, where p is the number of parameters.

Generate a random variable m *
i , i = 1, . .., p from the proposal distributions, and calculate the following ratio (note the probabilities in the formula are calculated using Eq.2): p ) prob(m where p ra is reference acceptance probability. 3. Generate a random value u uniformly from interval (0,1).
Repeating steps (b) to (d) by replacing index (k) with index (k + 1), we can obtain many samples as follows: , where n is the number of sample sets.From the procedure, we can see that the value m (k) i only depends on the current state of m, but not the previous states; therefore, these samples form a Markov Chain.The summary statistics of the posterior samples during the burn-in period (e.g., the last 400 sample sets) can be used to check the convergence with a tolerance relative error (e.g., 0.001), and posterior distributions can be derived from the samples.

Choices of initial values (default, mean, PEST estimates)
Initial values have little impact on precision and accuracy of a robust optimization algorithm, but affect the convergence speed.In order to increase the efficiency of inverse modeling, we compared choices of initial values including the default parameter values in CLM4, the mean values between the prior bounds, as well as the PEST estimates calibrated using observational data at each study site.The default and mean values are based on prior information in Table 1.Tests show that the convergence speed of the MCMC-Bayesian calibration is not affected by the choices of initial values, although the PEST estimates slightly reduce the discrepancies between calculated and observed responses compared to using the default parameter values for CLM4 model calculations.

Choices of proposal distribution widths (automatic versus multi-chain comparison)
The convergence speed (i.e., the number of steps needed to obtain consistent posterior statistics of the unknown parameters) and accuracy of parameter estimation is associated with several tuning parameters of the inversion setup.Proposal distribution width, an important tuning parameter of the MCMC algorithm, controls the searching precision and speed.We compared different widths, i.e., 1/2, 1/4, 1/8, and 1/16, relative to the prior bounds of the unknown parameters.The widths correspond to the 99 % confidence interval of Gaussian proposal distributions.Optimized convergence speed and stable results can be achieved with a proposal width of 1/8 relative to the prior bounds.To compare the performances of different tuning parameters, and to reduce the time of convergence, we take advantage of high performance computing resources to conduct multi-chain calibrations simultaneously.

Reference acceptance probability
In this study, we also evaluated the effects of using different acceptance probabilities for newly generated parameter samples in the MCMC process.We named the criterion as reference acceptance probability, a trade-off between convergence speed and accuracy of parameter estimates.If the acceptance probability of new sample sets is greater than the product of reference acceptance probability and the acceptance probability of prior sample, we accept the new sample.Specifically, four reference acceptance probabilities, i.e., 1.0, 0.95, 0.9, 0.5, are adopted to set the multi-model for inverse modeling of CLM.Probabilistic model averaging can then be used to integrate the different sets of parameter predictions, where the weights are not determined arbitrarily, but rather based on the posterior probabilities of all the models sharing the same unknowns and observational data in the inverse setup.The integrated results might have larger spread but the estimates are generally more unbiased.

Sensitivity to data frequency, observation type, and site conditions
Daily and monthly data (runoff and heat flux) are both used in parameter calibration, and the performances are compared to study the impact of data temporal resolution on parameter inference.Compared to monthly data, daily data include higher-frequency characteristics of temporal variations in the observations; however, adjacent observations tend to have more information redundancy and data quality is an issue.
A stochastic inversion approach usually provides parameter calibrations with lower uncertainty (more "precise") as more and more data are used, but measurement errors could lead to over-fitting of errors and therefore biased estimates.Through comparison, we can identify the most appropriate observing timescale for calibration to improve CLM4 simulations.Different model responses associated with different components of CLM have different sensitivities to the unknown parameters.We conduct MCMC-Bayesian inversion at the US-ARM site and at the nearby MOPEX basin (basin ID: 07147800).Although a basin is usually not homogeneous in climate and land cover, the two sites are the closest in geographical proximity among all the Ameriflux first priority and MOPEX sites, and they have similar climate and land surface conditions.By comparing the inversion results using observed heat flux and runoff, we can evaluate the impacts of using different types of observations on parameter inference in CLM.
Calibrations are also done at flux tower sites (US-ARM vs. US-MOz) with different soil and vegetation conditions to study the impacts of land surface conditions on parameter

798
Figure 1.Posterior distribution of the parameters with four different reference acceptance 799 probabilities (p ra ) using monthly heat flux data at the US-ARM site.800 parameters are wide and non-informative as expected.This demonstrates the necessity of performing parameter screening such that the inverse problems can be less ill-posed.As shown in Eq. (3), a low reference acceptance probability p ra means that the rejection standard and searching ranges are relaxed.As more potential estimates are identified and accepted, the bounds of posterior distributions increase, and multi-modal behaviors occur, especially for θ s .The posterior means/modes of the estimated parameters shifted farther from or closer to the prior means, particularly for f over and θ s .
Figure 2 shows the simulated monthly mean heat fluxes using posterior estimates of parameters with four reference acceptance probabilities using monthly observations at the  US-ARM site.The black line shows the monthly mean heat fluxes obtained from observations of the 2003-2006 period, and the red line shows the calculated monthly mean heat fluxes using default parameters based on prior information.Simulations using default parameter values overestimate the heat fluxes in summer (from May to September), and underestimate in winter at the US-ARM site.When the reference probabilities are 1.0, 0.95, and 0.9, the posterior estimates of parameters all significantly improve the heat flux simulation in summer, and the posterior model probabilities (i.e., the product of Gaussian probabilities of misfits between calculated and observed responses) are 0.730, 0.730, and 0.728 respectively, which are greater than 0.636 for the default parameter values.Note that the Gaussian probability of the default or a posterior parameter set is calculated using Eq. ( 2).The estimates with reference acceptance probability of 0.5 noticeably deviate from other inversion estimates, and tend to result in underestimates of simulated heat fluxes in summer.None of the parameter estimates is able to yield much better fits during winter, which might be due to errors in the observed heat fluxes, errors in the CLM forcing data, and/or under-representation of the complicated physical processes using the current parameterization schemes.
Similarly, inversion was also performed using monthly heat flux data from the 2004-2007 period at the US-MOz site with different soil and vegetation cover from US-ARM site.Figure 3 shows the posterior distributions of ten parameters with four reference acceptance probabilities.They show similar patterns for different reference acceptance probabilities, except for the parameter b.Even when the reference acceptance probability is 0.5, the inversion yields reasonable Figure 4 shows the calculated monthly mean heat fluxes using the posterior parameter estimates from monthly observation at the US-MOz site.It is obvious that the default simulation underestimates the heat flux over all seasons.From January to June, all posterior estimates with different reference acceptance probabilities can significantly improve the simulation of heat flux, except for some small underestimations in April, May, and June.From July to  December, the posterior estimates with reference acceptance probabilities of 1.0 and 0.5 are similar and close to observations, while the other two overestimate the heat flux a little in July and August.As a whole, all estimates of the inverse modeling can improve the simulation of heat flux over all seasons, and the posterior model probabilities are 0.893, 0.889, 0.886, and 0.892 respectively, which are greater than 0.876 of the default simulation.Differences in the posterior distributions with different reference acceptance probabilities are small.

Posterior distributions of input parameters and simulated heat flux from use of daily data
Figure 5 shows the posterior distribution of model parameters with four reference acceptance probabilities using daily heat flux data from the 2003-2006 period at the US-ARM site.The posterior distributions disperse over the prior bounds for most parameters.Among the four sets of posterior distributions, reference acceptance probability of 1.0 and 0.95 identify similar bounds, while the other two sets yield different results, particularly for f max , f over , Q dm , and K s .Moreover, multi-modal distributions occur for most parameters when the rejection standard is relaxed.
Figure 6 shows the calculated monthly mean heat fluxes using the posterior estimates of parameters from daily observations at the US-ARM site.The posterior estimates of parameters also improve the heat flux in summer.The acceptance probabilities of the simulations with four parameter sets are 0.636, 0.618, 0.584, and 0.593 respectively, which are all greater than 0.579 of the default simulation.The posterior distributions of the parameters with the four reference acceptance probabilities using daily heat flux data from the 2004-2007 period at the US-MOz site show that the posterior distributions are more consistent for C s , f over , Q dm , S y , and s , but dispersed for b, K s and θ s .When the rejection standard is relaxed, the posterior bounds can become much wider, especially for f drai , Q dm , b, K s and θ s .In winter, all simulations of heat flux using the posterior estimates are close to observations.In summer, posterior estimates with reference acceptance probabilities of 1.0 and 0.95 significantly improve the heat flux simulation.

815
Figure 7. Posterior distribution of the parameters with four reference acceptance probabilities 816 using monthly runoff data at the MOPEX basin.817

Parameter inversion at MOPEX sites using runoff observations
Runoff observations are also used in the inverse modeling of CLM. Figure 7 shows the posterior distribution of the parameters with four reference acceptance probabilities using monthly runoff data from the 2002-2005 period at the MOPEX basin close to US-ARM.Posterior distributions with strict reference acceptance probabilities (e.g., 1.0, 0.95 and 0.9) have consistent patterns for most parameters, except for b, K , and θ s .It is interesting to see that f max is identically estimated by inversions with different reference acceptance probabilities.When the rejection standards are relaxed, the bounds of posterior distributions of most parameters become wider, and multi-modal patterns occur.
Figure 8 shows the calculated monthly mean runoff using the posterior estimates of parameters from monthly observations at the MOPEX basin.The default simulation barely shows any variability of runoff.The posterior estimates significantly improve the runoff simulations in all seasons, albeit larger variability than observations is noted from July to October.Further analysis (Fig. 15) shows large peaks in the observed runoff in June and July, which the CLM4 model simulations cannot match well by adjusting model parameter values.This suggests some systematic biases in the model parameterizations that cannot be fully addressed by parameter calibration.However, we cannot exclude the possibility of errors in either the external forcing or observational heat fluxes.For example, the atmospheric forcing data are calculated over the 1/8th degree grid, which may underestimate rainfall intensity for heavy precipitation events, leading to underestimation of runoff peaks in the summer and accumulation of soil moisture for runoff generation in subsequent periods.
The acceptance probabilities are 0.846, 0.767, 0.758, and 0.737, all greater than 0.707 of the default runoff simulation.Among the four sets of simulations based on inversion, more stringent sample rejection criterion results in a better match between the simulated responses with observations.Because hydrological parameters are more sensitive to hydrological processes, this conclusion is not surprising for inversion with runoff.In cases using heat fluxes for calibration, we also found that the RMSEs of flux simulations using the posterior estimates with the reference acceptance probability of 1.0 are better than others.It is worth mentioning that for Y. Sun et al.: Inverse modeling of hydrologic parameters observational data that are more directly related to the processes influenced by the model parameters, a more stringent rejection criterion can help to improve calibration accuracy and precision.For indirect information, a relaxed criterion is helpful to avoid biased estimates.
For comparison with inversion using monthly runoff data, we also performed inversion using daily runoff data from the 2002-2005 period at the MOPEX basin.The posterior distributions with different reference acceptance probabilities disperse over the prior bounds, except for S y .The calculated monthly mean runoff values using the posterior estimates of parameters from daily observations are reasonable.Compared to using monthly runoff data, the big fluctuations disappear when using daily runoff.Posterior estimates with reference acceptance probabilities of 1.0 and 0.95 can significantly improve the runoff simulation over all seasons, but underestimate the runoff in summer, and overestimate the runoff in winter.

Results of subset parameter inversion
Our global sensitivity analyses across 13 flux towers and 20 MOPEX basins (Hou et al., 2012;Huang et al., 2013) suggest that simulated LH and runoff are most sensitive to three subsurface parameters: f drai , Q dm , and S y .Our full-set parameter inversion shows that the posteriors for the most sensitive parameters are more consistent than the unidentifiable parameters, which is expected.Because the other parameters are less identifiable, the inverse problem will be less ill-posed by fixing the trivial parameters.In this section, we test the feasibility of only inverting a subset of identifiable parameters to determine if similar or improved model skill may be achieved compared to using the full-set parameter inversion results.We first conducted the inversion with a reduced set of parameters using monthly observation at the two flux tower sites and the MOPEX basin.Only the most identifiable parameters are calibrated, the other parameters are fixed as the default values in CLM4.
Figure 9 shows the posterior distribution of the reduced set of parameters at the US-MOz site.Compared with the results of ten parameters (see Fig. 3), the medians of the posterior distribution of f drai and Q dm are smaller, the width of the posterior bounds of f drai and S y is unchanged, while that of Q dm expands.
Figure 10 shows the calculated monthly mean heat flux using posterior estimates of parameters at the US-MOz site.The simulations using posterior estimates of the reduced parameter set significantly improve the heat flux simulation over all seasons, and are similar to the results of inversion with ten parameters.
Inversion at the US-ARM site also shows that in general, the posterior bounds start to narrow, and the multi-modal patterns disappear, compared to the inversion results with the full-set of parameters.Using posterior estimates of the re- duced parameter set can significantly improve the latent heat flux simulations compared to the results using the full-set of parameters, especially from October to December, and from January to May.Results from US-ARM and US-MOz are consistent, so only one set of figures is shown.
Figure 11 shows the posterior distribution of the reduced parameters at the MOPEX basin.Compared with the results of ten parameters (see Fig. 7), the medians of posterior distribution of f drai and Q dm are smaller, while that of S y does not change, and the widths of posterior distribution of all parameters stay the same.
Figure 12 shows the calculated monthly mean runoff using posterior estimates of parameters at the MOPEX basin.Inversion with reduced parameters improves the runoff simulation, which is similar to the posterior simulation with ten parameters, while the simulation with the default parameter is far away from observation.Overall, inverse modeling with a reduced set of parameters identified from previous sensitivity analysis yield comparable predictions obtained from the full-set parameter inversion, and since the inverse problems become less ill-posed with fewer unknowns, the convergence of inverse modeling is faster and the resulting posteriors are more consistent without multi-modal patterns.However, the simulations of heat flux at the US-MOz and runoff at the MOPEX basin are comparable between the inversion for the reduced and full set of parameters.Theoretically, one may expect improvement using the reduced parameter set because the inversion becomes less ill-posed, but in practice, getting a faster convergence of the solution may be the main advantage, which is important especially when calibrating parameters for computationally intensive models.

Impacts of temporal resolution of heat flux observation on inverse modeling
Both monthly and daily heat flux data have been used in the inverse modeling at selected flux tower sites.Using monthly observation data, the inversion with reference acceptance probabilities (except for p ra of 0.5 at US-ARM) is able to identify proper parameter estimates to improve heat flux simulation.Using daily observation data, inversion improves the heat flux simulation only with reference acceptance probabilities of 1.0 and 0.95, indicating that using data of higher temporal resolution might need a relatively more stringent acceptance criterion (i.e., higher p ra ).Comparing Figs. 1 and 5, inversion using daily instead of monthly observations favors values of f over , Q dm , s towards the lower bounds at the US-ARM site.At the US-MOz site, inversion using daily observation also favors values of f over and Q dm towards the lower bounds.Hence, finer temporal resolution of observation data favors smaller f over and Q dm .For specific sites, it may also lead to changes of other parameters.
Regarding inversion using runoff data, we also found that finer temporal resolution of runoff observations leads to more dispersion of the posterior distributions of most parameters, except for f over , S y and θ s .Using monthly observation data, inversions with all reference acceptance probabilities are able to improve monthly mean runoff simulations over all seasons.However using daily observation data, inversion improves runoff simulation only with reference acceptance probabilities of 1.0 and 0.95.
Overall, finer temporal resolution of observation data leads to more dispersion of the posterior distributions and increases the risk of using a relaxed rejection standard.These are likely related to increased measurement errors, data redundancy, and over-fitting with higher temporal frequency observations.
It is worth pointing out that temporal patterns in observational data may affect the performance of inversion.For example, the smaller seasonal variations of LH at US-ARM present unfavorable conditions for calibrations compared to US-MOz.At US-ARM, precipitation has a larger seasonal cycle with relatively dry winters and wet summers.Because annual rainfall is lower, LH is more limited by soil moisture availability.In contrast, US-MOz receives more rainfall throughout the year so LH is more controlled by solar radiation and hence peaks in July.Because US-ARM is more moisture limited, LH is more sensitive to model parameterization of soil hydrology.The inability of the model to capture the correct timing of the peak LH despite parameter calibration suggests that there may be structural limitations of the parameterizations used in CLM4 that cannot be adequately addressed by inverse modeling of uncertain parameters.

Impacts of soil and vegetation cover on inverse modeling
Compared to US-MOz, inverse modeling at the US-ARM site identifies smaller Q dm , and greater f over and θ s .In addition, the bounds of posterior distribution identified by the inversion show more consistency across different reference acceptance probabilities for f over , f drai , Q dm , b, and s at US-ARM than US-MOz, especially with monthly heat flux data.At US-MOz, the bounds of posterior distribution are mainly consistent for f max , f over , Q dm , S y , and θ s .These inversion results are consistent with the sensitivity analysis performed by Hou et al. (2012), which shows larger sensitivity to the respective parameters at the two sites related to soil and vegetation properties.The best estimated parameters are different at sites with different climate, land use, and soil conditions; hence soil and vegetation cover may inform the selection of sensitive parameters that can be used in reduced parameter sets for inverse modeling.It is therefore necessary to analyze parameter sensitivity and identifiability across the flux tower sites and MOPEX basins and classify them into different groups/classes with similar climate and soil conditions, and then evaluate parameter transferability within each class or between classes through inverse modeling studies.

Impacts of reference acceptance probability
In this study, we set the reference acceptance probability in the inverse modeling to relax rejection standard to allow more freedom in searching for optimal parameter estimates.However, relaxing the rejection standard leads to broadening of the bounds of posterior distribution and multi-modal be-haviors.That is, the posterior estimates tend to be more "accurate" but less "precise", and the corresponding inversion process usually takes longer to converge.

Impacts of different types of observations on inverse modeling
Inverse modeling using heat flux at US-ARM and runoff at the MOPEX basin, which is located close to US-ARM, provides an opportunity to assess the impacts of data type on inverse modeling.Note that a basin is not homogeneous in climate and land cover, and there are differences in spatial scales and heterogeneity between processes that influence the flux tower site and the MOPEX basin.Nevertheless, they are geographically close and have comparable climate and soil conditions, so it is possible to resolve the impacts of using different data types of observations for calibration.Comparing Figs. 1 and 7, the posterior distributions that optimize the simulations of heat flux can differ from those that optimize the simulations of runoff.Since the calibrated model parameters are directly related to the soil's hydrological processes including surface and subsurface runoff, it is not surprising that model inversion leads to more significant improvements in runoff (Fig. 8) than heat flux (Fig. 2) compared to simulations that use the default parameter values.It is also possible that the default parameters that control hydrological processes may be more poorly defined in CLM in general or for the particular sites being evaluated so that there is more room for improvements.The simulations of heat flux can nevertheless be improved by inverting hydrologic parameters because surface heat flux is influenced by soil moisture, which is closely related to runoff processes.The improvement in simulations of heat flux is particularly more noticeable at US-MOz than at US-ARM, where structural uncertainty may have larger effects due to its stronger dependence of heat fluxes on soil moisture availability.In addition, being a cropland site, LH at US-ARM, may be influenced by other factors such as crop management that are not well represented in the model.
Although inverse modeling leads to larger improvements in the runoff simulations compared to the simulations using the default parameter values, the runoff simulations with the posterior estimates still deviate quite significantly from the observed runoff in late summer and fall.This indicates possible model biases in runoff due to model structural errors in the hydrologic parameterizations or the quality and spatial resolution of the forcing data discussed earlier.
The LH flux can be measured by flux tower, which is representative of more local conditions, while runoff is a composite response of a drainage basin of a large area.These provide complementary information for model calibration.We did not perform an integrated inversion because energy flux observations are available only at the flux tower sites (US-ARM and US-MOz), and runoff data are only available at the MOPEX basins, each representing the best available observations at each scale, respectively.Therefore, in this study, we focused on evaluating the potential of improving CLM simulations using the best available observations at appropriate scales.However, our study clearly demonstrated the potential of multi-objective calibration, which will be attempted in future studies.

Improvements through Bayesian model averaging
For each reference acceptance probability for the MCMC-Bayesian inversion, one can obtain a set of posterior distributions of the unknowns.Bayesian model averaging is used to integrate the different sets of predictions by weighting the posteriors according to their posterior model probability.
By integrating inversion results of different reference acceptance probabilities, Bayesian model averaging produces smoother posterior distributions.Figure 13 shows the posterior distributions of the parameters through Bayesian model averaging at the US-ARM site.The black lines represent the prior distributions based on prior information.The red and blue curves represent the posterior distributions of the parameters using monthly and daily heat flux observations, respectively.These two sets of posterior distributions are similar to each other for most parameters, except for f over .Daily heat flux favors smaller C s , f over , Q dm , s , and greater f max , K s , θ s .The posterior distributions using monthly and daily observations at the US-MOz site are also similar, but daily heat flux favors smaller C s , f over , f drai , Q dm , S y , s and θ s , and greater f max and K s .
Figure 14 shows the posterior distribution of the parameters through Bayesian model averaging at the MOPEX basin.The posterior distributions using monthly and daily runoff observations are also similar.Daily observation favors smaller θ s and greater S y , b and K s .It is noted that the differences between the posterior distributions from monthly and daily data are even smaller from inversions using runoff compared to inversions using heat flux, especially for f drai and Q dm .This may be related to the characteristic timescales of the physical processes.Surface heat flux may have less dayto-day variability (hence larger data redundancy) compared to runoff, which responds more directly to precipitation that has larger temporal variability than temperature during the wet season.These differences could be site and season dependent so analyses over a larger number of sites can provide further insights on the sensitivity of model inversion to data temporal frequency.
Model integration represents a compromise of all possibilities of the inversion setup.In general, it results in "safer" (i.e., more likely to be unbiased) estimates but lower resolution (i.e., wider posterior distributions).

Model validation
In the above analyses, we compared observed and model calibrated responses to check whether smaller misfits can be achieved through the calibration process, and to evaluate the different calibration power of runoff versus heat flux observations, monthly versus daily data, and different tuning parameters.In an inverse study, it is important to validate the inversion approach.It is straightforward to validate the results when true values of the unknown input parameters are available.Otherwise, people may design "synthetic" models by assuming the "true" parameter values are available, then "generate" the corresponding "true" responses, which are then used for testing the inversion approach.An alternative way of validation is to separate the data set to training (for calibrating the parameters) and testing periods, assuming the parameters are intrinsic to the system and not time-varying.Fig. 15a and b show the observations as well as model simulated monthly and daily runoff calculated using default and optimal parameter values.The inversion (training) time period is 2002-2005, and validation periods are 2000-2001 and 2006-2008.The RMSEs are calculated for the validation periods only.We found that RMSEs are reduced more for monthly data than for daily data.Sub-monthly variations involve more internal/external factors and processes that are more complicated.It is harder to capture the higher frequency components of the output variability.In general, runoff calculations using optimal parameters from the training period can significantly improve the model misfits during the testing periods, and the major patterns of inter-annual and seasonal variability are well captured.

Conclusions
In this study, we demonstrated the possibility of inverting hydrologic parameters using surface flux and runoff observations in CLM4.Calibrating model parameters using the deterministic least-square fitting method provides little improvement in simulating heat flux and runoff, but using the calibrated values as initial guesses in the MCMC-Bayesian calibration reduces the discrepancies between simulated and observed responses, however the convergence rate is unaffected by the choice of initial guesses.
Focusing on the MCMC-Bayesian inversion method, we conducted inverse modeling at two flux tower sites and one MOPEX basin.We also discussed the impacts of relaxing the rejection standard, data temporal resolution, data types, and soil and vegetation on parameter inference.Informed by our previous sensitivity analysis, we also performed inversion with reduced parameter dimensionality.Moreover, Bayesian model averaging is adopted to integrate the posterior estimates with different reference acceptance probabilities.The major conclusions are as follows.
1. Inversion results at the flux tower and MOPEX sites using monthly and daily surface flux and runoff observations show that the MCMC-Bayesian inversion approach in general can improve the simulation of CLM under different climates and environmental conditions.
2. Temporal resolution of observations has clear impacts on the results of inverse modeling using heat flux data, but the impacts are smaller using runoff data.Due to data redundancy and quality, finer temporal resolution of observations may yield biased estimates and multimodal posterior distributions.
3. Significant improvements can be achieved to better match the observed heat flux and runoff by using the estimated parameters compared to using the default parameter values.The improvement is more obvious for runoff because the calibrated parameters are directly related to runoff processes, and also because default parameter values yield runoff simulations that deviate far from observations.However, improvements in heat flux can also be quite significant especially in areas (e.g., forest) with strong energy and water constraints.Soil and vegetation cover have important impacts on parameter sensitivities, leading to different posterior distributions of parameters at different sites.
4. Reducing the parameter set can make the inverse problem less ill-posed.Numerically, it also speeds up the convergence.In this study, inverse modeling with the reduced parameter set favors parameter estimates closer to the lower bounds than using the full set of parameters.
5. Bayesian model averaging that integrates the posterior estimates with different reference acceptance probabilities can smooth the posterior distribution and provide more reliable parameter estimates, but at the expense of wider uncertainty bounds.
Overall, the MCMC-Bayesian inversion approach is found to provide effective and reliable estimates of model parameters at the site and watershed level to improve CLM simulations of surface flux and runoff.
To apply the method for inversion over a region or globally, there are a number of challenges, including computational requirements and availability and quality of observation data.The analyses presented in this study should be extended to a larger number of sites with a wider range of climate, hydrologic, and vegetation/soil conditions to determine if and how model parameters may be transferrable based on site conditions to larger areas or river basins.Exploring model inversion at the river basin level rather than site level using combinations of local flux measurements, area averaged flux data (e.g., derived from satellite), and basin total runoff, each with their own uncertainty estimates, may provide an alternative strategy for calibrating model parameter values for each river basin.
To reduce the computational demand, we will also test the performance of the MCMC-Bayesian inversion approach using surrogates (i.e., approximated relationships between inputs and output responses) as alternatives to the CLM4 numerical simulator.

Fig. 1 .Fig. 2 .
Fig. 1.Posterior distribution of the parameters with four different reference acceptance probabilities (p ra ) using monthly heat flux data at the US-ARM site.
Posterior distribution of the parameters with four reference acceptance probabilities 805 using monthly heat flux data at the US-MOz site.806

Fig. 3 .
Fig. 3. Posterior distribution of the parameters with four reference acceptance probabilities using monthly heat flux data at the US-MOz site.

807Figure 4 .Fig. 4 .
Fig. 4. Simulated heat fluxes using the posterior estimates of parameters at the US-MOz site.
Posterior distribution of the parameters with four reference acceptance probabilities 810 using daily heat flux data at the US-ARM site.811

Fig. 5 .
Fig. 5. Posterior distribution of the parameters with four reference acceptance probabilities using daily heat flux data at the US-ARM site.

Fig. 6 .
Fig. 6.Simulated heat fluxes using the posterior estimates of parameters with daily observations at the US-ARM site.

Fig. 7 .
Fig. 7. Posterior distribution of the parameters with four reference acceptance probabilities using monthly runoff data at the MOPEX basin.

Fig. 8 .
Fig. 8. Simulated runoff using the posterior estimates of parameters at the MOPEX basin.

820Figure 9 .Fig. 9 .Fig. 10 .
Fig. 9. Posterior distribution of the reduced parameter set from previous sensitivity analysis at the US-MOz site.

Fig. 11 .
Fig. 11.Posterior distribution of the reduced parameter set from previous sensitivity analysis at the MOPEX basin.

Fig. 12 .
Fig. 12. Simulated runoff using the posterior estimates of parameters at the MOPEX basin.

830Figure 13 .
Figure 13.Posterior distribution of the parameters through Bayesian model averaging at the US-831

Fig. 13 .
Fig. 13.Posterior distribution of the parameters through Bayesian model averaging at the US-ARM site.

Fig. 14 .
Fig. 14.Posterior distribution of the parameters through Bayesian model averaging at the MOPEX basin.

Fig. 15 .
Fig. 15.Comparison among the observations, default and optimal simulations of (A) monthly and (B) daily runoff, during the inversion (2002-2005) and validation(2000-2001 and 2006-2008)  periods at the MOPEX basin.The RMSEs are calculated for the validation periods.
s Saturated soil matrix potential (mm) Soil water Cosby et al. (1984).Mean values and SDs are from Table 5 in Cosby 9 K s Hydraulic conductivity (mm s −1 ) Soil water et al. (1984), except for SD of s, which is from Table 4 in Cosby 10 θ s Porosity Soil water et al. (1984).