This work is distributed under the Creative Commons Attribution 4.0 License.
Comparing the Impacts of Single- and Multi-Objective Optimization on the Parameter Estimation and Performance of a Land Surface Model
Abstract. In land surface models (LSMs), precise parameter specification is crucial to reduce the inherent uncertainties and enhance the accuracy of the simulation. However, due to the multi-output nature of LSMs, the impact of different optimization strategies (e.g., single- and multi-objective optimization) on the optimization outcome and efficiency remains ambiguous. In this study, we applied a revised particle evolution Metropolis sequential Monte Carlo (PEM-SMC) algorithm for both single- and multi-objective optimization of the Common Land Model (CoLM), constrained by latent heat flux (LE) and net ecosystem exchange (NEE) measurements from a typical evergreen needleleaf forest observation site. The results reveal that the revised PEM-SMC algorithm demonstrates a robust ability to tackle the multidimensional parameter optimization challenge for LSMs. The sensitive parameters for different target outputs can exhibit conflicting optimal values, so single-objective optimization improves the simulation performance for a specific objective at the expense of the accuracy for other objectives. For instance, solely optimizing for LE reduced the root-mean-square error (RMSE) between the simulated and observed LE by 20 % but increased the RMSE of the NEE by 97 %. Conversely, multi-objective optimization not only ensures that the optimized parameter values are physically sound but also balances the simulation performance for both LE and NEE, as evidenced by the decrease in RMSE for LE and NEE of 7.2 W m−2 and 0.19 μmol m−2 s−1, respectively. In conclusion, these findings reveal that comprehensively integrating the various available observational data for multi-objective optimization is preferable for parameter calibration in complex models.
Status: final response (author comments only)

RC1: 'Comment on hess-2024-121', Anonymous Referee #1, 26 Jun 2024
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2024-121/hess-2024-121-RC1-supplement.pdf
 AC1: 'Reply on RC1', Cong Xu, 11 Sep 2024

RC2: 'Comment on hess-2024-121', Jasper Vrugt, 24 Aug 2024
Summary: In this study, the authors applied a revised particle evolution Metropolis sequential Monte Carlo algorithm, called PEM-SMC, to single- and multi-objective optimization of the Common Land Model (CoLM) using measurements of the latent heat flux (LE) and net ecosystem exchange (NEE) from a typical evergreen needleleaf forest observation site. The authors conclude that the revised PEM-SMC algorithm is a robust method for multidimensional parameter estimation of LSMs. The optimal parameters for LE and NEE exhibit a trade-off, necessitating the estimation of these parameters against LE and NEE data simultaneously.
Evaluation: This paper is about an important topic in hydrologic modeling, namely the training of a land surface model so that its simulated output matches as closely and consistently as possible multiple different measurement types. Overall, the paper is reasonably well written, albeit at times using incorrect methodological terminology and language. Although I have worked quite a bit on Bayesian methods and am always interested in the application of these methods to Earth systems, I am not particularly excited and convinced by the work presented in this paper. I will summarize my comments next. Based on these comments I recommend a major revision. This gives the authors an opportunity to revise their work. This may involve quite a bit of additional work, as my comments will point out.
General comments:
While reading this manuscript, many thoughts went through my mind. In the first place, it was a trip down memory lane. The presented material reminded me of work done by Luis Bastidas and others about 3 decades ago, and my own work of about 15 years ago. Then, I was surprised to see that the authors do not reference much prior work and present the algorithms as if they invented all this themselves. Furthermore, the authors implemented a Bayesian approach, but they do not verify at all whether the assumptions they made were actually met in practice, nor do they take advantage of their posterior distribution in analyzing model performance. Last but not least, I am not convinced yet that the mutation operator of their PEM-SMC algorithm (which is a crossover step!) leaves the target distribution invariant. I will now address each of these points in more detail. Before I do so, I must apologize for the fact that I heavily advertise my own work in my review below. In general, I do not like doing this, but this proved difficult in the present case. The authors’ paper demonstrates a keen interest in methodology. This methodology has various problems and shortcomings. To point this out, I must unfortunately refer to our own published works. The authors work with MATLAB, so it should be easy to evaluate/test my suggestions and test their method against the “state-of-the-art” methods in MATLAB I am pointing at. I hope my comments below clarify my concerns.
1. The conclusion that land surface models exhibit a trade-off in describing different data types has been well known to the community. In a series of papers in the 1990s, Bastidas and co-authors have convincingly shown that LSM training results in different “optimal” parameter values if trained against different data types. They used a multi-objective optimization framework for this, along with a multi-objective sensitivity analysis method to rank the relative importance of individual parameters in describing different data types. This problem did not resolve with the use of a more complex LSM but is a result of (among others) epistemic uncertainty (model structural errors) and measurement errors of the controlling variables (exogenous variables). This paper uses a Bayesian procedure and arrives at the same conclusions as Bastidas et al.
Thus, fundamentally, the conclusion of the authors is not new. Then, let me look at the methodology used. Does this warrant publication in HESS? Before I move on to the methodology, I want to clarify that the wording “multiple objective” is erroneous. In essence, the authors are lumping together in their likelihood function two different data streams. They do this by multiplying the likelihoods of LE and NEE. This is not multi-objective, as the authors only have 1 likelihood (= combined likelihood of NEE and LE). With the choice of a Gaussian likelihood, the authors’ method is equivalent to weighted least squares. With a diagonal measurement error covariance matrix, where the first n entries are the measurement error variances for the latent heat flux data and the next m entries are the measurement error variances for the NEE data, you will arrive at exactly the same maximum a posteriori (MAP) density solution [a uniform prior is used]. What the authors did is maximum likelihood estimation with a weighted likelihood function.
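The equivalence with weighted least squares can be verified numerically. The following Python sketch (with hypothetical residuals and error standard deviations, not the authors' data or code) shows that the product of two Gaussian likelihoods is, in log space, a weighted sum of squared residuals plus a constant:

```python
import numpy as np

def gaussian_loglik(res, sigma):
    """Log-likelihood of i.i.d. Gaussian residuals with standard deviation sigma."""
    n = res.size
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum(res**2) / (2 * sigma**2)

rng = np.random.default_rng(0)
res_le, sig_le = rng.normal(0, 5.0, 100), 5.0    # latent heat flux residuals (hypothetical)
res_nee, sig_nee = rng.normal(0, 0.5, 80), 0.5   # NEE residuals (hypothetical)

# Multiplying the two likelihoods = adding the log-likelihoods ...
combined = gaussian_loglik(res_le, sig_le) + gaussian_loglik(res_nee, sig_nee)

# ... which, up to an additive constant, is a weighted least-squares criterion
wls = np.sum(res_le**2) / sig_le**2 + np.sum(res_nee**2) / sig_nee**2
const = -0.5 * (res_le.size * np.log(2 * np.pi * sig_le**2)
                + res_nee.size * np.log(2 * np.pi * sig_nee**2))
assert np.isclose(combined, const - 0.5 * wls)
```

Since the constant does not depend on the parameters, maximizing the combined likelihood and minimizing the weighted sum of squares yield the same MAP solution under a uniform prior.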
2. The PEM-SMC algorithm the authors present & modify reminded me of my own work on particle-DREAM (Vrugt et al.: Hydrologic data assimilation using particle Markov chain Monte Carlo simulation: Theory, concepts and applications), http://dx.doi.org/10.1016/j.advwatres.2012.04.002. Given the large overlap between the authors’ PEM-SMC method and particle-DREAM, I was surprised to see that there is not a single reference in their methodological description to the particle-DREAM paper. It is possible that the authors missed this work, but then their so-called mutation step in Equation 18 is taken directly from the particle-DREAM paper, and/or related papers on DE-MC and the DREAM algorithm. The authors use the same symbols in their paper for the jump rate, gamma, parents, r_{1} and r_{2}, etc. as we used in our publications. The authors should properly cite and acknowledge past work.
3. Fundamentally, I am not confident that the Metropolis acceptance probability defined after Equation (18) leaves the target distribution invariant. We struggled with this in the particle-DREAM paper and presented in Appendix B (Page 476) a correct formulation of the acceptance probability so as to ensure ‘detailed balance’. I suggest the authors study this Appendix and determine whether their particle transitions guarantee an exact inference of the target distribution. This requires careful demonstration. As a side note, the PEM-SMC sampler has many elements in common with the ABC-Population Monte Carlo sampler used in Sadegh and Vrugt (2013: see Appendix A, Page 4845). We also implemented a DREAM resampling step in that algorithm.
4. In addition to my previous comment, even if the method is theoretically correct, one cannot simply make a change to an algorithmic recipe and claim this revised method is robust (as the authors do in their abstract). We cannot conclude anything about robustness using only the posterior distributions of the LE and NEE data. To inspire confidence in the revised algorithm, one needs to demonstrate that the method works well on a large range of problems, meaning it converges to the known posterior distribution in a series of benchmark experiments. Otherwise, claims of robustness are meaningless, and we must simply trust that the algorithm correctly infers the target distribution of the LSM parameters.
5. Then, one more time about Equation (18): how is the resampling implemented? Are N candidate points generated simultaneously, after which the acceptance probability is computed pairwise to determine whether to accept the proposals or not? Or is the implementation sequential, that is, a proposal is created using Equation (18) and then accepted (or not) with Metropolis probability α; if accepted, the proposal replaces its parent, and then the next candidate point is created? The algorithmic recipe suggests this latter approach. The reason I ask is that the first, parallel, implementation does not guarantee reversibility. This proof only holds for the latter, sequential, approach. See the Appendix (Pages 444-445) of Vrugt and Ter Braak, https://link.springer.com/article/10.1007/s11222-008-9104-9
6. How does the algorithm handle the parameter boundaries? What is done to a particle if it is sampled outside the prior parameter ranges (by the authors’ mutation step)? Detailed balance requires that the proposals are assigned a zero likelihood, or better, that the candidate points are folded [e.g. see Vrugt and Ter Braak, 2011: https://doi.org/10.5194/hess-15-3701-2011]. Which mechanism did the authors implement?
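The folding treatment can be sketched in a few lines of Python (an illustration with assumed bounds, not the authors' implementation): an out-of-bounds candidate is wrapped back into the prior hypercube, which keeps the proposal distribution symmetric on the bounded space.

```python
import numpy as np

def fold(x, lower, upper):
    """Fold a candidate point back into [lower, upper] by periodic wrapping,
    so proposals that leave the prior hypercube remain valid and detailed
    balance is preserved."""
    span = upper - lower
    return lower + np.mod(x - lower, span)

lo, hi = np.array([0.0, -1.0]), np.array([1.0, 1.0])
x = np.array([1.3, -1.5])       # candidate outside the prior bounds
print(fold(x, lo, hi))          # folds back to approximately [0.3, 0.5]
```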
7. Then, on a related note, the authors do not address the question of why the machinery of a particle SMC method is required in the first place to infer the posterior parameter distribution. I am most familiar with the DREAM algorithm, as I developed this myself with Cajo Ter Braak (2008). I seriously question whether, based on the listed algorithmic variables of S = 100 and N = 200, the SMC algorithm will succeed in generating samples of the target distribution. I hazard to predict that the DREAM algorithm (MATLAB toolbox: DREAM Package) will be computationally more efficient, in large part because the burn-in will be substantially smaller, as one only needs to run N = 3 Markov chains. This does not require a tempering schedule to move particles from a prior to a posterior distribution. This type of bridge sampling only complicates the methodology. Most people are familiar with the DREAM algorithm, so why do we need new machinery if “old methods” can do the job, and, as I bet, more efficiently than what is presented in the present paper? Also, the multi-try variant of DREAM allows a further speed-up of the convergence to the target (posterior) distribution, as the chains are evaluated in parallel and multiple proposals are created in each chain in parallel as well. The best proposal is then accepted with a modified Metropolis acceptance probability. This methodology is described in https://doi.org/10.1029/2011WR010608
8. The authors assume a normal likelihood for the residuals of the LSM. Why did they choose a normal likelihood? This is the default choice but must be supported by evidence. This requires a posteriori checking of the residuals of the time series of measured and simulated latent heat fluxes and NEE. I bet the residuals will exhibit autocorrelation and a non-constant variance, and deviate from normality. This would disqualify the use of the normal likelihood function. Good statistical practice requires the authors to verify that the assumptions about the likelihood function are met in practice. For example, consider Schoups and Vrugt, 2010: https://doi.org/10.1029/2009WR008933
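These residual checks are cheap to run. A minimal Python sketch (using synthetic AR(1) residuals as a stand-in for the authors' MAP residuals) illustrates how the three assumptions of the normal likelihood can be screened:

```python
import numpy as np

def residual_checks(res):
    """Simple diagnostics on a residual series: lag-1 autocorrelation,
    variance homogeneity (first vs. second half), and excess kurtosis."""
    r = res - res.mean()
    rho1 = np.sum(r[:-1] * r[1:]) / np.sum(r**2)     # ~0 if serially independent
    half = r.size // 2
    var_ratio = np.var(r[:half]) / np.var(r[half:])  # ~1 if homoscedastic
    kurt = np.mean(r**4) / np.var(r)**2 - 3.0        # ~0 if Gaussian
    return rho1, var_ratio, kurt

rng = np.random.default_rng(1)
e = rng.normal(0, 1, 5000)
ar1 = np.empty_like(e)               # synthetic AR(1) residuals with phi = 0.8
ar1[0] = e[0]
for t in range(1, e.size):
    ar1[t] = 0.8 * ar1[t - 1] + e[t]

rho1, vr, kurt = residual_checks(ar1)
# strong lag-1 autocorrelation would disqualify an i.i.d. normal likelihood
assert rho1 > 0.7
```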
9. The authors multiply the likelihoods of the latent heat flux and NEE as if these two data types are independent. Is this a valid assumption? As an alternative, one could consider a composite likelihood, where the two likelihoods are additive; one can find applications of this in the statistical literature, specifically in the application to spatial data. This would constitute a novelty and justify publication in a highly rated journal such as HESS.
10. Then, the toolbox of DREAM in MATLAB (called the DREAM Package) has built-in distribution-adaptive likelihood functions, such as the generalized likelihood and universal likelihood functions: https://doi.org/10.1016/j.jhydrol.2022.128542. These likelihood functions let the data speak for themselves and let the residuals determine the most appropriate form of the likelihood function. This includes treatment of heteroscedasticity, correlation and non-normality. This guarantees that the residual properties will match the likelihood assumptions and inspires much more confidence in the posterior parameter distributions derived from the latent heat flux and NEE data. Certainly, if the authors use a Bayesian approach, as they do in this paper, they must demonstrate that their residual assumptions hold. Otherwise, the parameter estimates are not particularly meaningful.
11. I do not understand why the authors use a complicated SMC-based training procedure, whereas there is relatively little they do with the knowledge of the posterior parameter distributions. In the first place, the authors should present the Bayesian predictive distributions (time series of confidence and prediction intervals of the simulated output) in their time series figures (e.g. Figure 4). What is the coverage of the prediction limits? As it stands right now, the authors focus their attention on the MAP solutions and ignore in large part parameter uncertainty. A maximum likelihood method would have done the job. Then, stochastic gradient descent would have found the MAP solution, and a first-order approximation around this optimum would have given a linear estimate of parameter uncertainty.
12. The authors refer to Equation 15 as the “analytical method”. I am not familiar with this terminology. Equation 15 simply states that the measurement error standard deviation is equal to the square root of the sample variance of the residuals. They essentially use this as a sufficient statistic and embed it in their likelihood function. This is a common approach, but in the authors’ implementation it raises two important questions: A) how was the sample variance of the residuals determined? This requires knowledge of the MAP (maximum a posteriori density) LSM parameter values. And B) the estimate of the sample variance of the residuals will be biased, as the residuals are likely to exhibit serial correlation. The “true” measurement error variance can only be determined after decorrelating this time series of MAP residuals.
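The bias from serial correlation can be illustrated with a short Python sketch (synthetic AR(1) residuals, illustrative only): the raw sample standard deviation overstates the innovation standard deviation until the series is prewhitened with a lag-1 estimate of the autoregressive coefficient.

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_true = 1.0
e = rng.normal(0, sigma_true, 20000)
res = np.empty_like(e)                # serially correlated "MAP residuals", phi = 0.8
res[0] = e[0]
for t in range(1, e.size):
    res[t] = 0.8 * res[t - 1] + e[t]

phi = np.sum(res[:-1] * res[1:]) / np.sum(res[:-1]**2)  # lag-1 estimate of phi
innov = res[1:] - phi * res[:-1]                        # decorrelated innovations

print(np.std(res))    # inflated: roughly sigma_true / sqrt(1 - phi**2), about 1.67
print(np.std(innov))  # close to sigma_true after prewhitening
```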
13. The authors refer to their particle resampling step as a mutation step. This is wrong. A mutation is a random alteration to the DNA of a particle. This can happen to any parameter at any time and is fully random. The differential evolution (DE) step the authors use (from DE-MC, DREAM or particle-DREAM) is a crossover step. The proposed changes to the DNA of the particles are based on an underlying mechanism [= Equation 18], augmented with an inconsequential random perturbation, zeta, to claim ergodicity. This is a crossover step wherein the DNA of two parents is combined to generate offspring (candidate particles). Note that in the original DREAM algorithm there is no mutation step, only a crossover step. Then, why did the authors not consider the use of more than 1 pair (r1-r2, r3-r4, r5-r6, etc.) of differences to generate candidate points? We have shown that this enhances considerably the variability in the candidate points. Indeed, the pair r1/r2 has a selection probability of 1/N × 1/(N-1), where N signifies the population size. r1, r2, ..., r6, on the contrary, has a selection probability of 1/N × 1/(N-1) × 1/(N-2), etc. Thus, one can generate a much larger variation in the proposals with multiple pairs. The side effect of this is that you may be able to reduce the population size.
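A crossover proposal with multiple difference pairs can be sketched as follows. This is a generic illustration of a DE-MC/DREAM-style step, not the authors' Equation 18 verbatim; the default jump rate 2.38/sqrt(2*n_pairs*d) is the standard choice from that literature.

```python
import numpy as np

def de_proposal(pop, i, n_pairs=3, gamma=None, zeta_sd=1e-6, rng=None):
    """Differential evolution crossover: perturb particle i with the summed
    differences of n_pairs randomly chosen parent pairs (all distinct from i),
    plus a small random perturbation zeta for ergodicity."""
    if rng is None:
        rng = np.random.default_rng()
    N, d = pop.shape
    if gamma is None:
        gamma = 2.38 / np.sqrt(2 * n_pairs * d)   # standard jump rate
    idx = rng.choice([j for j in range(N) if j != i],
                     size=2 * n_pairs, replace=False)
    r1, r2 = idx[:n_pairs], idx[n_pairs:]
    jump = gamma * np.sum(pop[r1] - pop[r2], axis=0)
    return pop[i] + jump + rng.normal(0, zeta_sd, d)

pop = np.random.default_rng(3).normal(size=(50, 17))   # N = 50 particles, d = 17
cand = de_proposal(pop, i=0, rng=np.random.default_rng(4))
assert cand.shape == (17,)
```

Because all 2*n_pairs parents are drawn without replacement and exclude particle i, increasing n_pairs multiplies the number of distinct proposals that can be generated from the same population.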
14. Do I understand correctly that after convergence you have N = 200 samples from the target distribution? This is very small for a 17-dimensional parameter estimation problem and does not support an accurate depiction of the posterior parameter distribution. Again, why not just use an adaptive multi-chain Monte Carlo method such as DREAM_ZS or MT-DREAM_ZS? This machinery will do the job for you, while providing as by-products a) automatic convergence monitoring (univariate and multivariate scale reduction factors, etc.), b) diagnostic checks of the residuals, c) confidence and prediction limits of the Bayesian forecast (simulation) PDF, and d) scoring rules (CRPS, LS, QS, etc.) of the predictive distribution and access to distribution-adaptive likelihoods.
15. In principle, the authors have access to the predictive distribution of the model, but in their model evaluation they resort only to common deterministic measures of model performance. This includes the NSE and RMSE in Equations 19 and 20. Their use entails a large loss of information about model performance. This is just a side note: the authors should consider evaluating the full predictive distribution using scoring rules. I hesitate to advertise my own work, but here it is, Distribution-Based Model Evaluation and Diagnostics: Elicitability, Propriety, and Scoring Rules for Hydrograph Functionals: https://doi.org/10.1029/2023WR036710. This merely serves as a note to alert the authors against the use of deterministic measures of goodness of fit in case one has access to the full Bayesian predictive distribution. This allows for the use of information-theoretic, principled metrics, which offer more protection against misinformation, disinformation, etc.
16. Then, the authors use different sensitivity analysis methods to decide which parameters are sensitive and which ones are not. A critical assumption in this analysis is that the parameters are independent. This assumption is convenient but may not be realistic in practice. As an idea, the authors could determine sensitivity using their posterior parameter samples. This would equate to probabilistic variance-based sensitivity analysis and relies on high-dimensional model representation (HDMR), an extension of Sobol to correlated variables. HDMR and HDMR with extended bases are computationally quite demanding, so this may pose problems with their LSM. Nevertheless, given their strong interest in computational methods, I thought I would point at the HDMR/HDMR_EXT toolboxes, which are available in MATLAB. Alternatively, they should consider the multi-criteria sensitivity analysis method of Bastidas for evaluating sensitivity in the presence of more than one data type. But, realistically, why not do inference on all the parameters and then assess parameter sensitivity from the posterior LSM parameter distribution(s)? One could use measures such as the Kullback-Leibler divergence (= divergence of the logarithmic score, see Vrugt, 2024) to determine the distance between the marginal prior and marginal posterior distribution of each parameter. This is cheap to compute and will convey which parameters are most sensitive and which are not. Indeed, for a parameter that is sensitive one would expect its posterior marginal distribution to be narrow relative to its marginal prior. On the contrary, for a parameter that is insensitive it is common to see that its prior and posterior marginals are nearly equivalent. Maybe this approach simplifies the paper.
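The prior-to-posterior divergence idea can be implemented in a few lines. The sketch below (hypothetical posterior samples, histogram-based plug-in estimate) computes the Kullback-Leibler divergence of a posterior marginal from a uniform prior; a larger value flags a more sensitive parameter.

```python
import numpy as np

def kl_from_samples(post, lo, hi, bins=30):
    """Histogram estimate of D(posterior || uniform prior on [lo, hi]) from
    posterior samples; larger values indicate a more sensitive parameter."""
    p, edges = np.histogram(post, bins=bins, range=(lo, hi), density=True)
    w = edges[1] - edges[0]                   # bin width
    q = 1.0 / (hi - lo)                       # uniform prior density
    mask = p > 0                              # 0 * log 0 := 0
    return np.sum(p[mask] * np.log(p[mask] / q) * w)

rng = np.random.default_rng(5)
sensitive = rng.normal(0.5, 0.02, 10000).clip(0, 1)  # posterior much narrower than prior
insensitive = rng.uniform(0, 1, 10000)               # posterior ~ prior
assert kl_from_samples(sensitive, 0, 1) > kl_from_samples(insensitive, 0, 1)
```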
17. Figure 2: The x-labels are not readable and strangely chosen. If the authors want to demonstrate that the different optimization scenarios lead to different marginal distributions of the parameters, then why not use a single, linearly scaled x-axis and just plot the 3 distributions on top of it? Thus, the same as now but with a linear scale on the x-axis. This would show immediately the differences between the methods. Then, I have my doubts whether the marginal distributions of the parameters shown are truly Gaussian. This may be an artefact of an insufficient sample of the target distribution, as commented on earlier in 7.
18. Figure 4: Maybe this is explained in the text and I missed it, but why are the observations indicated with a pink interval? Noisy observations? Personally, I would prefer plotting the data as discrete points; alternatively, one can think of averaging the data (measurements) so there are fewer data points, and then accompany these time-averaged estimates with error bars. As it stands right now, I see three deterministic simulations and then a wide range of possible data values. If the authors are interested in quantifying measurement uncertainty, then the nonparametric estimator of de Oliveira and Vrugt comes to mind: https://doi.org/10.1029/2022WR032263. Again, this is a side note, but it may help the authors present their data better. Maybe the authors can use this to their advantage.
19. Section 4.2: This is not a difference between single- and multi-objective methods. All the results correspond to a single objective, albeit with one or more weighted data streams. I commented on this before.
20. Then, the authors use the terminology of optimization scenarios. Personally, I do not like the terminology of optimization in a Bayesian context. Optimization focuses on finding the single best solution, whereas Bayesian inference is fundamentally different in that it wants to find a distribution of statistically acceptable solutions; this should include the MAP parameter values (= ML with a uniform prior) but cannot be considered optimization. Some authors call it Bayesian calibration; I prefer Bayesian training or estimation. I leave this to the authors.
In summary, this paper needs a lot of work before it can be judged to make a significant contribution to the literature. I am sorry for highlighting my own work, but I felt this was relevant and necessary, as the authors’ methodology has important flaws and shortcomings and needs to be reconsidered. Amongst others, I believe that a) the authors should properly demonstrate that their method is indeed robust and that the changes made to the PEM-SMC sampler leave the target distribution invariant, b) they must properly recognize past literature contributions, consider how their method differs from past methods, and articulate why we would need the SMC machinery at all to obtain the posterior LSM parameter distribution if current state-of-the-art methods can do this job (and the maximum likelihood method/weighted least squares will do in the current implementation of the authors), c) test the residual properties (diagnostic checks of autocorrelation, distribution and variance homogeneity) so as to demonstrate that the assumptions of the normal likelihood function are indeed met, d) present the confidence and prediction limits of the Bayesian predictive distributions, e) clean up their language of single and multiple objective, mutation and crossover step, etc., and f) possibly consider more powerful model evaluation metrics rather than the deterministic measures used by the authors. No doubt, these comments and those listed above will involve substantial work. But this should significantly enhance the quality of the work presented in this paper.
I hope my comments are useful to improve your manuscript,
Jasper A. Vrugt
jasper@uci.edu
Citation: https://doi.org/10.5194/hess-2024-121-RC2
RC3: 'Comment on hess-2024-121', Anonymous Referee #3, 27 Aug 2024
Dear Authors,
Thank you very much for doing this work. I think the paper could make a contribution to HESS after some work. I leave here my main comments; I hope they help to improve the manuscript:
1. Literature review: I miss references to the work of Jasper Vrugt, mainly to Vrugt et al., 2012 (the DREAM paper); you might have a look at the work of Carlo Albert about ABC (during 2015, and maybe the reply from Vrugt later on), Kavetski et al., 2018 and Fenizia et al., 2018 (look through the introductions and references of these 2 papers), and the work of Prieto et al., 2021; 2022 about hydrological mechanism identification and the diagnostic metrics therein. Otherwise, the introduction is a bit repetitive and convoluted, but it misses information, e.g. about the choice or development of different likelihoods (later on you assume a normal Gaussian likelihood, but I do not see the justification), model diagnostic metrics and why.
2. Benchmark: I am missing a benchmark to compare against. One good idea might be to use the DREAM package from Vrugt as a benchmark; I am aware the authors have everything ready for it to be applied.
3. For the equations, I suggest using proper mathematical notation. At least for me it is helpful, and I am sure for readers too. E.g., why is everything italic? I suggest you differentiate vectors, matrices, random variables, etc. (i.e. bold, capital letters, and so on). In Prieto et al., 2019; 2021; 2022 you can find examples of this, and I am sure in any paper from Vrugt too.
4. Posterior pdf: For the posterior of the parameters, please use the full Bayes equation and then say that the left-hand side is proportional to the right-hand side, so that non-Bayesians can follow it. Indeed, this is met because you use only one model (e.g. see Prieto et al., 2021; 2022).
5. Concern: Why is the likelihood a normal likelihood? Could you please justify this and then analyze the residuals of the posteriors? Are you also assuming that the variables are independent so that the likelihoods can be multiplied? I suggest you have a look at ABC here, just in case it can help.
6. Diagnostic metrics: Also, could you please take advantage of doing a probabilistic analysis to evaluate the posterior pdf using probabilistic metrics that look at reliability, precision and bias, i.e. not only deterministic(-related) metrics, which inspect only one side of the story. Based on this, the advantages highlighted in the discussion section could be better defended.
7. For me the text is a bit confusing when talking about multiple objectives; I guess most readers tend to think about multiple objective functions, which is not the case because there is 1 likelihood; another thing is that there are 2 target variables.
8. Maybe a naive question, but do you need all the SA methods in the main manuscript?
Thank you once again for your work.
Hopefully my comments help in improving the paper.
In case the Editor asks for a revised version of the paper, I am very happy to serve as reviewer of the revised version.
Best Wishes,
Reviewer
Citation: https://doi.org/10.5194/hess-2024-121-RC3