Multi-objective Parameter Optimization of Common Land Model Using Adaptive Surrogate Modeling

Parameter specification usually has significant influence on the performance of land surface models (LSMs). However, estimating the parameters properly is a challenging task due to the following reasons: (1) LSMs usually have too many adjustable parameters (20 to 100 or even more), leading to the curse of dimensionality in the parameter input space; (2) LSMs usually have many output variables involving water/energy/carbon cycles, so that calibrating LSMs is actually a multi-objective optimization problem; (3) Regional LSMs are expensive to run, while conventional multi-objective optimization methods need a large number of model runs (typically ∼ 10 5 –10 6). It makes parameter optimization computationally prohibitive. An uncertainty quan-tification framework was developed to meet the aforemen-tioned challenges, which include the following steps: (1) using parameter screening to reduce the number of adjustable parameters, (2) using surrogate models to emulate the responses of dynamic models to the variation of adjustable parameters , (3) using an adaptive strategy to improve the efficiency of surrogate modeling-based optimization; (4) using a weighting function to transfer multi-objective optimization to single-objective optimization. In this study, we demonstrate the uncertainty quantification framework on a single column application of a LSM – the Common Land Model (CoLM), and evaluate the effectiveness and efficiency of the proposed framework. The result indicate that this framework can efficiently achieve optimal parameters in a more effective way. Moreover, this result implies the possibility of calibrating other large complex dynamic models, such as regional-scale LSMs, atmospheric models and climate models.


Introduction
Land surface models (LSMs), which offer land surface boundary condition for atmospheric models and climate models, are widely used in weather and climate forecasting.They are also a tool for studying the impacts of climate change and human activities on our environment.Parameters of LSMs usually have significant influence on their simulation and forecasting capability.It has been shown that tuning even one or two parameters may significantly enhance the simulation ability of a LSM (e.g., Henderson-Sellers et al., 1996;Liang et al., 1998;Lohmann et al., 1998;Wood et al., 1998).How to specify the parameters in a LSM properly, however, remains a very challenging task because the LSM parameters are usually not directly measurable at the scale of model applications.
Compared to traditional hydrological models, the parameter calibration approach has not been practiced as much in LSM community, especially for large spatial-scale applications.The major obstacles to calibrating LSMs over a large spatial scale are (1) there are too many parameters to calibrate, namely, the curse of dimensionality in parameters; (2) dimensionality of the output space is too high (i.e., many processes such as water/energy/carbon cycles are simulated simultaneously); (3) conventional optimization methods, especially the multi-objective approach, need a large number (∼ 10 5 -10 6 ) of model runs; and the large complex dynamic system models such LSMs are usually expensive to run (i.e., costing many CPU hours).There have been numerous attempts to use multi-objective optimization to calibrate the parameters of LSMs and significant improvement in LSM performance measures as a result of optimization have been reported (e.g., Bastidas et al., 1999;Gupta et al., 1999;Leplastrier et al., 2002;Xia et al., 2002).However, the optimization efforts in the past were usually limited to cases involving only point or limited spatial domain-scale applications of LSMs (Liu et al., 2003(Liu et al., , 2004(Liu et al., , 2005)).To take a multiobjective optimization approach to the calibration of LSM parameters for large-scale applications, the key is to reduce the number of model runs to an appropriate level that we can afford.
Surrogate-based optimization is one of the most commonly used approaches to optimizing large complex dynamic models.Several books and literature reviews have described the advances of surrogate-based optimization in recent years (e.g.,Jones, 2001;Ong et al., 2005;Jin, 2011;Koziel and Leifsson, 2013;Wang et al., 2014).Surrogate-based optimization has been applied to economics, robotics, chemistry, physics, civil and environmental engineering, computational fluid dynamics, aerospace designs, etc. (Gorissen, 2010).On the development of surrogate-based optimization, Jones et al. (1998) proposed EGO (effective global optimizer) for expensive models using design and analysis of computer experiments (DACE) stochastic process model, namely, kriging interpolation method, as surrogate model.Castelletti et al. (2010) developed a multi-objective optimization method for water quality management using radial basis function (RBF), inverse distance weighted and n-dimensional linear interpolator as surrogates.Loshchilov et al. (2010) investigated the use of the ranked-based support vector machine (SVM) and demonstrated that for surrogate-based optimization capturing the relative value of the objective functions is more important than reducing the absolute fitting error.Pilát and Neruda (2013) developed a surrogate model selector for multi-objective surrogate-assisted optimization.In hydrology and water resources, Razavi et al. (2012) has summarized recent applications, advantages and existing problems.Wang et al. (2014) evaluated the influence of initial sampling and adaptive sampling methods for surrogateassisted optimization of a simple hydrological model, the Sacramento Soil Moisture Accounting (SAC-SMA) model.Song et al. (2012) optimized the parameter of a distributed hydrological model -distributed time-variant gain model's (DTVGM) -with an SCE-UA algorithm using the Multivariate Adaptive Regression Splines (MARS) method (Friedman, 1991) as surrogate.
In our recent works, we proposed a framework that can potentially reduce the number of model runs needed for parameter calibration of large complex system models (Wang et al., 2014).This framework involves the following steps: (1) a parameter screening step using global sensitivity analysis to identify the most sensitive parameters to be included in the optimization; (2) surrogate modeling that can emulate the response surface of the dynamic system model to the change in parameter values; (3) an adaptive sampling strategy to improve the efficiency of the surrogate model construction; and (4) a multi-objective optimization step to optimize the most sensitive parameters of the dynamic system model.In this paper, we will illustrate this parametric uncertainty quantification framework with the Common Land Model (CoLM), a widely used, physically based LSM developed by Dai et al. (2003Dai et al. ( , 2004) ) and Ji and Dai (2010).The work related to parameter screening and surrogate modeling-based optimization (adaptive surrogate model-based optimization strategy -ASMO) method for single objective has already been published (Li et al., 2013;Wang et al., 2014).This paper will emphasize the analysis of different surrogate model construction methods and multi-objective optimization methods and results.
This paper contains the following parts: Sect. 2 introduces the basic information of CoLM, the study area and data set, the adjustable parameters and the output variables to be analyzed; Sect. 3 presents an inter-comparison of five surrogate modeling methods, and discusses how many model runs would be sufficient to build a surrogate model for optimization; Sect. 4 carries out single and multiple objective optimization using an ASMO strategy; Sect. 5 provides the discussion and conclusions.

Model and parameters
The Common Land Model proposed by Dai et al. (2003Dai et al. ( , 2004) ) and Ji and Dai (2010) is one of the most widely used LSM in the world.It combines the advantages of the LSM (Bonan, 1996), the biosphere-atmosphere transfer scheme (BATS) (Dickinson et al., 1993) and the Institute of Atmospheric Physics LSM (IAP94) (Dai and Zeng, 1997;Dai et al., 1998).CoLM considers physical processes of energy and water transmission in soil vegetation, snow cover and atmosphere.It also implements glacier, lake, wetland and dynamic vegetation processes.Similar to previous research presented in Li et al. (2013), we select 40 adjustable parameters from CoLM.The parameter names, physical meanings and value ranges are shown in Table 1.This study considers six output variables simulated by CoLM: sensible heat, latent heat, upward long-wave radiation, net radiation, soil temperature and soil moisture.The normalized root mean squared error (NRMSE) is used as the objective function in our analysis: where i is the index of output variables, j is the index of time step, N is the total number of observations, y sim i,j and y obs i,j are the simulated and observed values, respectively.Objec-tive functions represent the performance of model simulation and a smaller objective function means better performance.

Study area and data sets
The study area and associated data sets are from the Heihe river basin, the same as in Li et al. (2013).The Heihe river basin, which is located between 96 The forcing data used include downward shortwave and long-wave radiation, precipitation, air temperature, relative humidity, air pressure and wind speed (Hu et al., 2003); the observation data used to validate the simulation of CoLM include: sensible heat, latent heat, upward long-wave radiation, net radiation, soil temperature and soil moisture.The soil temperature and moisture were measured at depth 10, 20, 40 and 80 cm.In CoLM, the soil is divided into 10 layers and the simulated soil temperature and soil moisture are linearly interpolated to the measured depth.Currently, we have 2 years of observation data.The data from year 2008 was used for spin-up and that of 2009 was used for parameter screening, surrogate modeling and optimization.The simulation time step is set to 30 min and the simulation outputs are averaged to 3 h in order to compare with the observation data.

Comparison of surrogate models
After the sensitive parameters are identified using global sensitivity methods (see Li et al., 2013), the next step is to calibrate the sensitive parameters through multi-objective optimization.Since the calibration of CoLM in real-world applications can be very expensive, we aim to establish a surrogate model to represent the response surface of the dynamic CoLM.The surrogate model, also called response surface, meta-model, statistical emulator, is a statistical model that describes the response of output variable to the variation of input variables.Because the surrogate model only considers the statistical relationship between input and output, it is usually much cheaper to run than the original large complex dynamic model ("original model" for short).Parameter optimization usually needs thousands, or even up to millions of model runs.It will be impossible to calibrate large complex dynamic models if running the original dynamic model is too time-consuming.If we can do parameter optimization with a surrogate model instead of an original model, the time of running the original model will be dramatically reduced, making it possible to calibrate the large complex dynamic models, such as LSMs, atmospheric models, or even global climate models.However, optimization based on surrogate models can be a challenging task because the response surface might be very bumpy and has many local optima.Razavi et al. (2012) gave a comprehensive review of the surrogate modeling methods and applications in water resources, and discussed the pitfalls of surrogate modeling as well.
In this research, we first compared five different surrogate models: multivariate adaptive regression spline (MARS), Gaussian process regression (GPR), Random Forest (RF), support vector machine (SVM), and artificial neural network  2).
As shown in Fig. 1, for some cases, such as upward longwave radiation, the fitting ability of the training set does not change significantly with sample size, but for soil moisture, larger sample size leads to better fitted surrogate models.Such phenomenon indicated that the specific features of the response surfaces have significant influence on the fitting ability, and good surrogate models must have the ability to adapt to those features.As shown in Fig. 1, GPR has the best fitting ability for almost every case except soil temperature.As described in Appendix 2, the hyper-parameters used by GPR can be adaptively determined using the maximum marginal likelihood method.
Figure 2 shows the NRMSE of the testing sets, indicating the risk of over fitting.In Fig. 2 we can note more remarkable findings: (1) the error of a surrogate model decreases as the sample size increases.The marginal benefits of using additional samples become less or even negligible if the sample size is larger than 400; (2) among the five different surrogate models, GPR has the best performance, while ANN ranks the second.Random Forest and MARS have lower accuracy.For some output variables (e.g., sensible and latent heat), the performance of SVM seems acceptable, while for other variables (e.g., soil temperature), SVM's performance is not satisfactory; (3) the convergence speeds for the six output vari- ables are different.For net radiation, soil temperature and soil moisture, the fitting error decreases to nearly 0 if the sampling points are more than 200; while for sensible heat, latent heat and upward long-wave radiation, the marginal benefit of adding more points is still significant beyond more than 200 sample points.Since the GPR method can consistently give the best performance for all six output variables, we choose GPR in the multi-objective optimization analysis presented later.

Single-objective optimization
Before we conduct multi-objective optimization, we first carried out single-objective optimization for each output variable using the GPR surrogate model.The SCE method (Duan et al., 1992(Duan et al., , 1993(Duan et al., , 1994) ) is used to find the optima of the surrogate models.In order to figure out how many sample points are sufficient to construct a surrogate model for optimization, different sample sizes (i.e., 50, 100, 200, 400, 800, 1200, and 2000) are experimented.To evaluate the optimization results based on the surrogate model, we also set up two control cases: (1) no optimization using the default parameters as specified in CoLM, and (2) optimization using the original CoLM (i.e., no surrogate model is used).The second case is referred to as "direct optimization".The control cases are used to confirm the following hypotheses: (1) parameter optimization can indeed enhance the performance of CoLM; (2) optimization using the surrogate model can achieve similar optimization results as using the original model, but with fewer model runs.
The optimal parameters given by single-objective optimization are shown in Fig. 3.In each subfigure the optimal parameter values are normalized to (0, 1).The bold black line is the optimal parameter set obtained by direct optimization using the original CoLM, and other lines are optimal parameters given by surrogate models created with different sample sizes.The optimization results reveal that (1) parameter optimization can significantly improve the simulation ability of CoLM for all output variables.(2) For sensible heat, upward long-wave radiation, net radiation, soil moisture, the optimal parameters obtained by surrogate model optimization runs are very similar to those obtained by direct optimization.The optimal parameters obtained for different sample sizes are also close to each other.For latent heat and soil temperature, however, the optimal parameters given by surrogate model optimization and direct optimization are significantly different.The discrepancy between the results with different sample sizes is also significant, comparing to the previous four outputs.(3) Surprisingly, for four of the outputs, namely, some variables (e.g., sensible heat, upward long-wave radiation, net radiation and soil moisture), sample size does not have significant influence on the optimization results.As shown in Table 3, even a surrogate model constructed with 50 samples is similar to the one constructed with 2000 samples and with the direct optimization.For soil temperature, 200 samples are sufficient, and for latent heat, more than 400 samples are enough.Interestingly, the LH50's optimization result for sensible heat is even smaller than that of LH2000.This is because LH sampling is random and the LH50 sampling may have produced a sample point very close to the global optimum, while the best sample point of the LH2000 sampling may be further away from the global optimum.Consequently, the number of samples required for surrogatebased optimization varies for different outputs because of the randomness of sampling designs, and the complexity of response surfaces.A more complex surface needs more sample points to build an effective surrogate model, compared to a simple surface.Even so, this result is very encouraging in that with the help of surrogate models we can possibly reduce the number of model runs required by optimization down to hundreds of times.(4) The number of original model runs that SCE takes before convergence is also listed in Table 3.The result indicated that SCE can get better, or similar optimal NRMSE, but the number of model runs is larger than that using the surrogate model.If the original dynamic model costs too much CPU time to run, surrogate-based optimization can be more efficient than the SCE.(5) Different output variables require different optimal parameters, indicating the necessity of multi-objective optimization.For example, P6, the Clapp and Hornberger "b" parameter, is sensitive to many outputs.For sensible heat, latent heat and soil moisture, the optimal value of P6 is high, while for upward long-wave radiation, net radiation and soil temperature, the optimal value of P6 is low.In order to balance the performance of all output variables, it is necessary to choose a compromised value for P6.Multi-objective optimization is an approach that can provide such a compromised optimal parameter that balances the requirements of many output variables.

Multi-objective optimization
The results of single-objective optimization from the previous section have highlighted the necessity for multi-objective optimization.Many multi-objective optimization methods have been proposed and validated in numerous studies (e.g., Boyle et al., 2000;Boyle, 2000;Gupta et al., 1998Gupta et al., , 1999;;Yapo et al., 1998;Vrugt et al., 2003b;Bastidas et al., 1999;Leplastrier et al., 2002;Pollacco et al., 2013;Xia et al., 2002), but in the context of this research, we need a method that can satisfy the following constrains: (1) the method  (2) for practical reasons, it should provide a single best parameter set instead of a full Pareto optimum set with many non-dominated parameter sets.The Pareto optimal set usually contains hundreds of points, but for large complex dynamic models such as regional or global LSMs, it is generally impractical, and also unnecessary to run the model in an ensemble mode with hundreds of model runs.For regional or global LSMs coupled with atmospheric models, providing only one parameter set that has good simulation ability for most outputs is a more economical and convenient choice.
In multi-objective optimization, there have been many methods that can transform multiple objectives to single objective.Among them, the weighting function-based method is the most intuitive and widely used one.In this paper, we assign higher weights to the outputs with larger errors.In the research of Liu et al. (2005), the RMSE of each outputs were normalized by the RMSE of the default parameter set, and each normalized RMSE was assigned equal weights.Van Griensven and Meixner (2007) developed a weighting system based on Bayesian statistics to define "high-probability regions" that can give "good" results for multiple outputs.However, both of Liu et al. (2005) and van Griensven and Meixner (2007) tended to assign higher weights to the outputs with lower RMSE, and lower weights to the outputs with higher RMSE.This tendency, although reasonable in the probability meaning, conflicts with our intuitive motivations that we want to emphasis on the poorly simulated outputs with large RMSE.Jackson et al. (2003) assumed Gaussian error in the data and model so that the outputs were in a joint Gaussian distribution, and the multi-objective "cost function" was defined on the joint Gaussian distribution of multiple outputs.In Gupta et al. (1998), a multiple weighting function method is proposed to fully describe the Pareto frontier, if the frontier is convex and model simulation is cheap enough.If one outputs is more important than others, a higher weight should be assigned to it.Marler and Arora (2010) reviewed the applications, conceptual significance and pit-falls of weighting function-based optimal methods, and gave some suggestions to avoid blind use of it.
In this study, we use three weighting functions to convert the multi-objective optimization into a single-objective optimization.Case 1: assigning more weight if the output is simulated more poorly as compared to the other outputs.The summed up objectives should have the same unit, so we use NRMSE as the objective function.The weighting function is in which the NRMSE i is the normalized root mean squared error of each output variable that is defined in Eq. ( 1), w i is the weight of each output and n i=1 w i = 1.Table 4 shows the RMSE and NRMSE of CoLM using default parameterization scheme, and the weight of each output is proportional to the NRMSE.Case 2: Liu et al. (2005) normalized the RMSE of each output with the RMSE of simulation result given by default parameters.The weighting function is and assign equal weights to each normalized output.Case 3: van Griensven and Meixner (2007) defined the global optimization criterion (GOC) based on Bayesian theory for multi-objective optimization.If the number of observations of each output are the same, the GOC is defined as where is the squared error, and SE i,min is the squared error of optimal solution.SE i,min is dynamically updated during the optimization procedure.In order to use the information offered by the surrogate model more effectively, we developed an adaptive surrogate modeling-based optimization method called ASMO (Wang et al., 2014).The major steps of ASMO are as follows: (1) construct a surrogate model with initial samples, and find the optimal parameter of the surrogate model.( 2) Run the original model with this optimal parameter and get a new sample.(3) Add the new sample to the sample set and construct a new surrogate model, and then go back to the 1st step.The effectiveness and efficiency of ASMO have been validated in Wang et al. (2014) using 6D Hartman function and a simple hydrologic model SAC-SMA.As shown in the comparison between ASMO and SCE-UA, ASMO is more efficient in that it can converge with less model runs, while SCE-UA is more effective in that it can get closer to the true global optimal parameter.So making a choice between ASMO and SCE-UA is a "cost-benefit" trade-off: if the model is very cheap to run, SCE-UA is preferred because it is more effective to find the global optimum; while if the model is very expensive to run, ASMO is preferred because it can find a fairly good parameter within a limited number of model runs.Such parameter set can provide only the approximate global optimum, but this approach is much cheaper than using traditional approaches such as SCE-UA.
We carried out multi-objective optimization with ASMO using weighting functions defined in Eqs. ( 2), ( 3) and ( 4).The optimization results are shown in Table 5.The RMSEs of each case were compared with that given by the default parameterization scheme, and the relative improvements were calculated.Obviously, for all three cases, all of the six outputs were significantly improved except soil temperature.All three cases sacrificed the performance of soil temperature, but case 2 (Liu et al., 2005) decreased the least (only 0.78 %), case 3 (van Griensven and Meixner, 2007) decreased the most, and the case 1 (weights proportional to NRMSE) is the moderate one.The results indicated that all three types of weighting functions can balance the conflicting requirements of different objectives and effectively give an optimal parameter set with ASMO algorithm.In the following studies, we only involve the moderate case (case 1).
To demonstrate the effectiveness and efficiency of surrogate-based optimization, we also carried out the direct optimization using SCE-UA.The weighting function adopted was Eq. ( 2), and the optimization results are shown in Figs. 4 and 5. Figure 4 presents the default parameter, the optimal parameter given by ASMO and that given by SCE-UA.Figure 5 shows the improvements given by ASMO and SCE-UA comparing to the default parameters.From Fig. 5 we find that all of the outputs have improved nearly 10 % except soil temperature, and the improvements made by ASMO are similar to that by SCE-UA.The results indicated that multi-objective optimization can indeed enhance the performance of CoLM using either the ASMO or SCE-UA method.The ASMO method converged after 11 iterations, namely, the total number of model runs is 411, while the number of model runs for SCE-UA is 1000, which is the maximum model runs set for SCE-UA.Obviously, ASMO is a more efficient method compared to SCE-UA in this case.
We also used the Taylor diagram (Taylor, 2001) to compare the simulation results for the calibration period and the validation period (see Figs. 6 and 7).The optimization results given by SCE-UA and ASMO using Eq. ( 2) as weighting function are compared against the performance of default parameterization scheme.Since only 2 years of observation data for the six output variables are available, we use the first year's (2008) data as the warm-up period, the second year's (2009) data as calibration period, and then use the previous year's (2008) data as the validation period.The missing records have been removed from the comparison.
As indicated in Fig. 6, the performance of optimized parameters given by both SCE-UA and ASMO (Case C and   D in the Taylor diagram) are better than the default parameterization scheme (Case B) except soil temperature.Even though soil temperature simulation is degraded, the correlation coefficients given by all three cases are higher than 0.9, indicating that this imperfection will not cause significant inconsistency in the land surface modeling.In Fig. 7, the performance of the validation period is shown quite similar to that in the calibration period, indicating that the optimal parameters are well identified and the over-fitting problem is avoided.
The four energy fluxes (sensible/latent heat, upward longwave radiation, net radiation) and soil surface temperature have very good performance.However, the performance of soil moisture seems unsatisfactory.The correlation coefficient of soil moisture of Case B (default parameter) is less than 0, while with the help of SCE-UA and ASMO optimization the correlation coefficient is only slightly larger than 0. The possible reasons might be as follows: (1) the default soil parameters of CoLM is derived from the soil texture in the 17-category Food and Agricultural Organization State Soil Geographic (FAO-STATSGO) soil data set (Ji and Dai, 2010), which provides the top-layer (30 cm) and bottomlayer (30-100 cm) global soil textures and has a 30 s resolution.The resolution and accuracy of this data set may not be accurate enough in A'rou station, where frequent freezing and thawing occur.A finer soil database, such as "the Soil Database of China for Land Surface Modeling" (Shangguan et al., 2013), or an in situ field survey for soil texture, should be used to improve the quality of the default parameterization scheme; (2) simulating freezing/thawing processes is a challenging task in land surface modeling, and we still have insufficient knowledge about the details of the physical processes.Parameter optimization can improve the model performance if the model physics are correct, but is helpless if the model structure is inconsistent with the true underlying physical processes.Although CoLM's performance of simulating frozen soil and snow cover has been evaluated in the experiment in Valdai, Russia (Dai et al., 2003), the situation of Heihe in China can be very different.For instance, in CoLM the soil depth is set to 2.86 m globally, but actually the soil depth varies in different places.Fundamentally such a simplification may not introduce significant error to the simulation of energy flux, but it definitely influences the performance of hydrological processes such as soil moisture.Further, the altitude of Heihe is much higher than Valdai, and the influence of human activities is also significantly different.All these reasons can potentially influence the perfor-  mance of CoLM and cannot be mitigated by parameter optimization.
In the optimization results, five of the outputs were improved but only soil temperature became worse.In multiobjective optimization, a compromise is necessary.In this case study, soil temperature requires small P6 and large 36, which conflict with all other outputs.Consequently, improv-ing every output is impossible and some outputs might be sacrificed.If the cost is affordable and the gain is big enough, such compromise might be worthwhile.In this case study, the smallest weight was assigned to soil temperature.In the optimal solution, the RMSE of soil temperature increases from 2.66 to 2.90

Discussion and conclusions
We have carried out multi-objective parameter optimization for a LSM (CoLM) at the Heihe river basin.Although there have been other studies, such as multi-objective calibration of hydrological models (Gupta et al., 1998;Vrugt et al., 2003b), LSMs (Gupta et al., 1999), single column landatmosphere coupled models (Liu et al., 2005), and soilvegetation-atmosphere transfer (SVAT) models (Pollacco et al., 2013), the novel contribution of this research lies in the significant reduction of model runs.In previous research, a typical multi-objective optimization needs ∼10 5 -10 6 or even more model runs.For large complex dynamic models which are very expensive to run, parameter optimization is impractical because of lack of computational resources.In this research, we managed to achieve a multi-objective optimal parameter set with only 411 model runs.The performance of the optimal parameter set is similar with the one obtained by SCE-UA method using more than 1000 model runs.Such a result indicates that the proposed framework in this paper is able to provide optimal parameters more efficiently.In future work, we are going to extend the uncertainty quantification framework to other large complex dynamic models, such as regional-scale LSMs, atmospheric models and climate models.We will look into testing the scalability of the screening, surrogate modeling and optimization techniques on more complex models with more adjustable parameters.We will also investigate the influence of uniformity and stochasticity of initial sampling points, and compare the suitability of different sampling methods.In addition to examining the main and total effects of the parameters, we will also evaluate the interactions among parameters.We will continue to improve the effectiveness, efficiency, flexibility and robustness of GPR approach for surrogate modeling, and test with more complex models.Since weighting function-based multi-objective optimization methods are simple, intuitive and effective, an inter-comparison of different weighting systems can be an interesting topic worthy of further research.Further, we intend to investigate ways to identify Pareto optimal parameter sets using a surrogate-based optimization approach.Discussion and collaborations are warmly welcomed on this and ongoing works.The computer code used in this study is available from the first author, which going to be published as part of the "UQlab" software package in the future.and ν = 5/2, as follows: In this paper, a value of ν = 5/2 was used.
To adaptively determine the values of hyper-parameters l and σ n , we use maximum marginal likelihood method.Because of the properties of Gaussian distribution, the logmarginal likelihood can be easily obtained as follows: where K = K (X, X).In the training process of GPR, we used the SCE-UA optimization method (Duan et al., 1993) to find the best l and σ n .

A3 Random Forest
Random Forest (Breiman, 2001) is a combination of Classification and Regression Trees (CART) (Breiman et al., 1984).Generally speaking, tree-based methods split the feature space into a set of rectangles and fit the samples in each rectangle with a class label (for classification problems) or a constant value (for regression problems).In this paper only the regression tree was discussed.Suppose x = (x 1 , x 2 , . .., x n ) is the n-dimensional input feature vector and y is the output response, the regression tree can be expressed as follows: where M is the total number of rectangles, m is the index of rectangle, R m is its corresponding region and c m is a constant value equals to the mean value of y in region R m .To effectively and efficiently find the best binary partition, a greedy algorithm is used to determine the feature to split and the location of split point.This greedy algorithm can be very fast especially for a large data set.Because of the major disadvantages of a single tree, such as over fitting, lack of smoothness and high variance, many improved methods have been proposed, such as MARS and Random Forests.A Random Forest constructs many trees using randomly selected outputs and features, and synthesizes the outputs of all the trees to obtain the prediction result.A Random Forest only has two parameters: the total number of trees t, and the selected feature number m.To construct a Random Forest one observes the following steps: 1. Bootstrap aggregating (bagging): from total N samples (x i , y i ) i = 1, 2, . .., N , randomly select one point at one time with replacement, and replicate N times to get a resample set containing N points.This set is called a bootstrap replication.We need t bootstrap replications for each tree.
2. Tree construction: for each splitting of each tree, randomly select m features from the total M, and select the best fitting feature among the m to split.The m selected features should be replaced in every splitting step.
3. The prediction result of a Random Forest is given by averaging the output of t trees.
Random Forests have outstanding performance for very high-dimensional problems, such as medical diagnosis and document retrieval.Such problems usually have hundreds or thousands of input variables (features), but each feature provides only a little information.A single classification or regression model usually has very poor skill that is only slightly better than random prediction.However, by combining many trees trained with random features, a Random Forest can give improved accuracy.For big-data problems that have more than 100 input features and more than one million training samples, Random Forests become the only choice because of its outstanding efficiency and effectiveness.

A4 Support vector machine
A support vector machine is an appealing machine learning method for classification and regression problems depending on the statistical learning theory (Vapnik, 1998(Vapnik, , 2002)).The SVM method can avoid the over-fitting problem because it employs the structural risk minimization principle.It is also efficient for big data because of its scarcity.A brief introduction to support vector (SV) regression is presented below.
The aim of SVM is to find a function f (x) that can fit the output y with minimum risk given a N point training set (x i , y i ) i = 1, 2, . .., N .Take a simple linear regression model for example, the function f (x) can be where w is the weighting vector and x is the n-dimensional input feature vector.This function is actually determined by a small subset of training samples called support vectors (SVs).Nonlinear problems can be transferred to linear problems by applying a nonlinear mapping from low-dimensional input space to some high-dimensional feature space: where φ (x) is the mapping function.The inner product of the mapping function is called the kernel function: K x, x = φ(x) T φ x and this method is called kernel method.The commonly used kernel functions are linear kernel function, polynomial, sigmoid and the RBF.In this paper we use RBF kernel: where x − x is the Euclidian distance between x and x , and γ is a user defined parameter that controls the smoothness of f (x).
To qualify the "risk" of function f (x), a loss function is defined as follows: The loss function means regression errors less than tolerance ε are not penalized.The penalty-free zone is also called εtube or ε-boundary.As explained in statistical learning theory (Vapnik, 1998), the innovative loss function is the key point that SVM can balance empirical risk (risk of large error in the training set) and structure risk (risk of an over-complex model, or over fitting).The problem of simultaneously minimizing both empirical risk (represented by regression error) and structure risk (represented by the width of ε-tube) can be written as a quadratic optimization problem: where K is the kernel function matrix with K ij = K(x i , x j ).
Solving the dual problem and we can get the predictive function: where the vectors (α * − α) are the SVs.

A5 Artificial neural network
Artificial neural network (Jain et al., 1996) is time-honored machine learning method comparing to the former four.It is a data-driven process that can solve complex nonlinear relationships between input and output data.A neural network is constructed by many interconnected neurons.Each neuron can be mathematically described as a linear weighing function and a nonlinear activation function: w ij x j , (A21) where x j is the j th input variable, w ij is the weight and I i is the weighted sum of the ith neuron.The output of the ith neuron f i (I ) is given by the nonlinear activation function of the weighted sum input.Here we use Sigmoid function.Minsky and Papert (1969) showed that single layer neural network can only solve linear problems.Cybenko (1989) extended ANN to multiple layers and demonstrated that multilayer ANN can infinitely approximate any nonlinear function (the universal approximation theorem).The training procedure of ANN is optimizing the value of weights.There are many training methods for ANN and we use the Levenberg-Marquardt (LM) (Marquardt, 1963)

Figure 1 .Figure 2 .
Figure 1.Inter-comparison of five surrogate modeling methods: error of training set.

Figure 4 .
Figure 4. Optimal value of CoLM given by multi-objective optimization (comparing default parameter, optimal parameter given by ASMO and SCE-UA).

Figure 5 .
Figure 5. Comparing the improvements given by ASMO and SCE.

Figure 7 .
Figure 7. Taylor diagram of simulated fluxes during validation period (here we use the warm-up period as validation period, 1 January 2008 to 31 December 2008).

Table 1 .
Adjustable parameters and their categories, meanings and ranges.

W. Gong et al.: Multi-objective parameter optimization of common land model tion
located at the upstream region of the Heihe river basin.Its geographic coordinate is 100 • 28 E, 38 • 03 N, altitude is 3032.8m above sea level and the land cover type is alpine steppe.

Table 2 .
(McKay et al., 1979)of CoLM in A'rou Station(Li et.al.,  2013).ANN).A brief introduction of these methods is provided in the Appendix.To build a surrogate, we need to choose a sampling method first.The sampling method used in this study is the Latin hypercube (LH) Sampling(McKay et al., 1979).The sample sizes are set to50, 100, 200, 400, 800, 1200, Li et al. (2013)ter-comparison results are shown in Figs.1 and 2, in which the x axis is the sample size, and y axis is the NRMSE (i.e., the ratio of the root mean square error (RMSE) of the simulation model and the surrogate model).Figure1shows the error of the training set, namely, the NRMSE between the outputs predicted by the surrogate model and the outputs of the training samples, and Fig.2shows the NRMSE of the testing set.Since every sample set of each size was independently generated, we use the 2000 points set to test 50, 100, 200, 400, 800 and 1200 points set, and use the 1200 one to test the 2000 one.For each output variable, we only construct surrogate models for the most sensitive parameters based on the screening results obtained byLi (2012)andLi et al. (2013)(see Table

Table 3 .
The NRMSE between simulated and observed outputs after single-objective optimization.

Table 4 .
Weights assigned to each output variables (weighting system case 1).

Table 5 .
Inter-comparison of different weighting systems.

Hydrol. Earth Syst. Sci., 19, 2409-2425, 2015 www.hydrol-earth-syst-sci.net/19/2409/2015/ rifice
• C (only 0.24 • C larger), but other outputs in RMSE can all be improved by about 10 %.We think the sac-of soil temperature is worthwhile because a negligible degradation of one output can lead to a significant improvement of all other outputs.