Multi-variable evaluation of land surface processes in forced and coupled modes reveals new error sources to the simulated water cycle in the IPSL climate model

. Evaluating land surface models (LSMs) using available observations is important to understand the potential and limitations of current Earth system models in simulating water- and carbon-related variables. To reveal the error sources of a land surface model (LSM), fiveour essential climate variables have been evaluated in this paper (i.e., surface soil moisture, evapotranspiration, leaf area index, and 25 surface albedo, and precipitation) via simulations with IPSL LSM ORCHIDEE (Organizing Carbon and Hydrology in Dynamic Ecosystems), particularly focusing on the difference between (i) forced simulations with atmospheric forcing data (WATCH-Forcing-DATA-ERA-Interim: WFDEI) and (ii) coupled simulations with the IPSL atmospheric general circulation model. Results from statistical evaluation using satellite- and ground-based reference data show that ORCHIDEE is well equipped to 30 represent spatiotemporal patterns of all variables in general. However, further analysis against various landscape/meteorological factors (e.g., plant functional type, slope, precipitation, and irrigation) suggests potential uncertainty relating to freezing/snowmelt, temperate plant phenology, irrigation, as well as contrasted responses between forced and coupled mode simulations. The biases in the simulated variables are amplified in coupled mode via surface–atmosphere interactions, indicating a strong link between 35 irrigation–precipitation and a relatively complex link between precipitation–evapotranspiration that reflects the hydrometeorological regime of the region (energy-limited or water-limited) and snow- albedo feedback in mountainous and boreal regions. The different results between forced and coupled modes imply the importance of model evaluation under both modes to isolate potential sources of uncertainty in the model. 40 land investigate

The Global Climate Observing System (GCOS, 2010) designates the fiveour selected variables as being essential climate variables (ECVs), thereby allowing us to take advantage of recent progress in their global -scale observation. Using satellite data, researchers have developed various retrieval algorithms to acquire SSM (Jackson et al., 1999;Wigneron et al., 2007Wigneron et al., , 2017, ET (Zhang et al., 2010;Miralles et al., 90 2011;Zeng et al., 2014), LAI (Zhu et al., 2013), and albedo (Schaaf et al., 2002;Qu et al., 2014), and precipitration (Adler et al., 2003), which can be used as reference data for LSM evaluation. Empirical upscaling products from global in situ observations (Jung et al., 2011(Jung et al., , 2019 can also be used. The selected variables are particularly interesting for land surface processes: SSM is a recognized driver of surface-atmosphere interactions (Seneviratne et al., 2010), constraining the partitioning of sensible/latent 95 heat and plant activity and determining ET and vegetation dynamics (e.g., Gu et al., 2006). ET affects atmospheric humidity (usually described by the vapor pressure deficit) and cloud formation, creating feedback systems among SSM, ET, and precipitation (Yang et al., 2018). Accounting for long-term vegetation dynamics, which can be measured by LAI, interlinked with such hydrological processes, is important in monitoring carbon cycle and ecosystem services that are related to climate change (IPCC, 100 2014) and natural disasters (Adikari and Noro 2010). Another important parameter in the surface energy exchange is the surface albedo, which controls the reflection of incident solar radiation and is interlinked with hydrological processes (especially through surface snow cover) and vegetation dynamics (Bonan, 2008).
To investigate the potential sources of model uncertainty, we considered various landscape factors 105 ("factor analysis") in addition to the traditional statistical evaluation. This work aims at increasing knowledge about the features and limitations of ORCHIDEE and is a practical example of in-depth model evaluation focusing on the differences between forced and coupled modes. The remainder of this paper is organized as follows. Section 2 describes the simulation setting, the reference datasets, and the factor analysis. Section 3 presents results for the spatiotemporal patterns of the model uncertainties and factor 110 analysis. Finally, Sections 4 and 5 provide a discussion and conclusions, respectively. Earth System model (ESM). This global process-based model of the land surface describes the complex links between the terrestrial biosphere and the water and the energy and carbon exchanges between the land surface and the atmosphere (Krinner et al., 2005). The used version in the IPSL-CM6 ESM for the CMIP6 simulations (Boucher et al., 2020), which is known as tag 2.0, was previously described in many papers (Raoult et al., 2019;Boucher et al., 2020;Cheruy et al., 2020;Tafasca et al., 2020), and we only 120 summarize its main features in this paper, with some details on the related parametrizations to the fiveour studied ECVs.
The land cover is described with 15 plant functional types (PFTs), including one for bare soil, as seen in the full list in Table 2, and they can all coexist in each grid-cell, where the taken fractions taken here are from the CMIP6 datasets (Boucher et al., 2020). For each PFT, the transpiration serves as a 125 coupling flux between the water, energy budget, and photosynthesis process, which drive the evolution of the biomass and LAI owing to generic equations with PFT-specific parameters (Krinner et al., 2005).
Evapotranspiration (ET) is controlled by the energy and water budget via a bulk aerodynamic approach, where four parallel fluxes are distinguished: sublimation, interception loss, soil evaporation, and transpiration. In each grid-cell, the first two fluxes proceed at a potential rate from the grid-cell fractions 130 with snow and canopy water, respectively. The soil evaporation and transpiration originate from the complementary snow-free fractions covered by bare soil and vegetation, which depend on LAI, where the effectively foliage-covered fraction by foliage exponentially increases with the LAI withhaving a coefficient of 0.5, while the light extinction is controlled through the canopy, hence the photosynthesis process. The two fluxes both depend on the soil moisture, where the transpiration is limited by the 135 stomatal resistance, as it increased when the soil moisture dropped from the field capacity to the wilting point. The soil evaporation is not limited by the resistance but only by upward capillary fluxes, which control the soil propensity to meet the evaporation demand.
The soil moisture (SM) dynamics are described over a soil depth of 2 m and discretized into 11 soil layers to solve the Richards equation. SM in the top 10-centimeters is regarded as SSM. The hydraulic 140 conductivity and retention properties depend on the SM owing to the Van Genuchten-Mulaem equations, with the parameters depending on the soil texture (Tafasca et al., 2020), and they are read from the map of Zobler (1986). The infiltration is limited by the surface hydraulic conductivity, and it is calculated with a time splitting procedure inspired by the Green-Ampt equation, where a sharp piston-like wetting front is assumed (d' Orgeval et al., 2008;Vereecken et al., 2019). The surface runoff is made of non-infiltrated 145 water (infiltration-excess runoff); however, ponding is allowed in flat areas, and it can reinfiltrate at later time steps. This so-called reinfiltration fraction linearly decreases from 1 to 0 in totally flat grid-cells1 in totally flat grid-cells to 0, where the mean grid-cell slope exceeds 0.5% (Ducharne et al., in prep). For CMIP6, the ORCHIDEE does not include the irrigation effect on the soil moistureSM, ET, and vegetation growth, although the model can simulate this anthropogenic pressure (Xi et al., 2018). 150 The snow processes are described by a 3-layer scheme of intermediate complexity (Wang et al., 2013), in which the snow albedo and insulating properties depend on the snow density and age. The ORCHIDEE 2.0 also includes a revised parameterization of the interplay between the vegetation and the snow albedo, and the optimized parameters match the remote sensing albedo data from the Moderateresolution Imaging Spectroradiometer (MODIS) sensor, distinguishing the visible and near-infrared 155 (NIR) bands (Boucher et al., 2020;Peylin et al., in prep). For the calculation of the heat diffusion, which includes the soil freezing effects (permafrost), the soil is extended to 90 m, and the moisture content of the deepest hydrological layer is extrapolated to the entire profile between 2 and 90 m. The thermal soil properties depend on the soil texture, moisture, and carbon content . 160

Forced and coupled simulations
To separate the errors caused by the ORCHIDEE model structure/parameterization from the ones resulting from the simulated climate through land--atmosphere coupling, we compared a forced and a coupled simulation. In the coupled simulation, the ORCHIDEE LSM is coupled to the LMDZ6A atmospheric GCM (Hourdin et al., 2020), as embedded in the IPSL-CM6 ESM for the CMIP6 simulations 165 (Boucher et al., 2020;Cheruy et al., 2020). The only difference between the atmospheric physics used in this paper and the one used for CMIP6 concern the parameterization of deep and shallow convection and their interaction to improve the description of the intertropical convergence zone and the El Nino Southern Oscillation. The description of these differences and their impact on precipitation and other variables controlling the near surface climate can be found in Mignot et al., (2021). 170 The coupled simulation was run over 1985-2014 (following a 5-yr spin up) using a 'nudging' approach to constrain the large-scale atmosphere dynamics toward the synoptic atmospheric conditions (Cheruy et al., 2013). To this end, the simulated wind fields (zonal and meridional wind components) are relaxed toward the ERA-Interim winds (Dee et al., 2011) by adding a correction term in the evolution equation for the wind. By reducing the internal variability, this method allows the direct comparison of 175 the observations and simulations, and it was successfully used for evaluating the coupled land-atmosphere parameterizations (Cheruy et al., 2013;Wang, et al., 2016), including in with the IPSL-CM6 ESM (Cheruy et al., 20200).
In the forced simulation, which covers 1979-2009, the required near-surface meteorological data by the ORCHIDEE LSM (liquid and solid precipitation, incoming longwave and shortwave radiation, 2-180 m air temperature and specific humidity, 10-m wind speed, surface pressure) are prescribed from the downscaled and bias-corrected reanalysis data [WATCH-Forcing-DATA-ERA-Interim (WFDEI)], provided at the 0.5° resolution with a 3-hourly time step (Weedon et al., 2011;Weedon et al., 2014).
Precipitation is bias-corrected using monthly data from the Global Precipitation Climatology Centre (GPCC Version V6, Schneider et al., 2014), with a specific correction of undercatch errors following 185 Adam and Lettenmaier (2003) and the simulation covers 1979-2009. The spatial resolution differs between the two simulations, reflecting the grid of the atmospheric data: the coupled simulation has a coarser resolution (144 × 142, corresponding roughly to 2.5° in longitude and 1.25° in latitude) than that of the forced simulation (0.5° grid). To make the evaluation consistent and simple, we used the same spatial resolution for our analyses, and we oversampled the 190 LMDz grid mesh to the finer resolution (0.5°) such as to keep as much spatial information as possible from the high-resolution offline grid mesh. To investigate variability patterns on seasonal to interannual scales, all the data were aggregated into monthly time steps. Fiveour interlinked variables (SSM, ET, LAI, and albedo, and precipitation) were considered in this evaluation, and the study region was 60°S-90°N, 180°W-180°E (i.e., Antarctica and Greenland were excluded). 195

Surface soil moisture
The SSM product provided by European Space Agency Climate Change Initiative (ESA CCI) (Liu et al., 2012) was used as a reference. It is a merged product comprising multiple SSM data derived from various passive and active microwave satellites (i.e., SMMR, SSM/I, TMI, AMSR-E, Windsat, SMOS, AMSR2, 200 AMI-WS, ASCAT-A, and ASCAT-B) providing a long-term (19789-20185) SSM dataset with 0.25° resolution. The CCI-SSM product has been evaluated extensively against in situ observations (e.g., Al-Yaari et al., 2019b), and their accuracy has been reported as being relatively high compared to that of other existing products such as SMOS-L3, LPRM, and AMSR2 (Ma et al., 2019).
Because it includes low-quality data flags for snow, dense vegetation, and radio-frequency 205 interference (RFI) (Oliva et al., 2012), we applied data screening following Al-Yaari et al. (2016). We screened out all the pixels where the provided uncertainty was larger than 0.06 m 3 /m 3 (volumetric water content). Next, any data records in which the SSM was not in a valid range (either >0.6 or <0.0) (Fernandez-Moran et al., 2017;Dorigo et al., 2013) were excluded. Finally, to exclude any areas covered by snow or dense vegetation and other unreliable regions, we kept only those areas in which the quality 210 flag was zero (fine-quality pixels). The screened dataset was then aggregated into 0.5° × 0.5° and monthly time steps. This screening process removed 3.6% of all the original pixels.
We performed an initial check on the time series of the global average of CCI-SSM and found an artificial trend therein that depended on the availability of the observation data ( Supplementary Fig. S65).
As reported by other researchers (e.g., Loew et al., 2013), this artificial trend could lead to 215 misinterpretation of long-term signals. To mitigate such artificial trends and initialization errors of each data, we selected a stable period (i.e., without discontinuous jump in time series) during 1993-1999 for comparison with both the forced and coupled simulations. Because of the differing natures of LSMsimulated and observed SSM (e.g., dependence on meteorological forcing data/atmospheric model, model parameterization), their absolute SSM [m 3 /m 3 ] values (i.e., magnitudes) are not comparable (Reichle et al., 220 2004). In addition, since the CCI-SSM product is scaled by the comparison with a different LSM (GLDAS-Noah), a direct comparison between CCI-SSM and ORCHIDEE may lead to misleading results, as they have different soil representation (Raoult et al., 2019). Given this issue, the LSM-and satellitebased SSM were compared with statistically normalized values rather than absolute values of SSM (e.g., Polcher et al., 2016). Therefore, a spatiotemporal normalization (Equation 1) was applied to each co-225 masked dataset to eliminate systematic biases among the datasets and make the comparison reliable ( Supplementary Fig. S75B): where SSMnorm is the normalized SSM, and SSM and σ SSM are the mean and standard deviation, respectively, of all the available SSM sampled along spatial and temporal dimension during the period. 230

ET
In a preliminary study, we compared a ground-based machine-learning ET product (Jung et al., 2011;, three remote-sensing-based physical model products (Miralles et al., 2011;Zhang et al., 2010;Zeng et al., 2014), and their ensemble (see Supplementary Figs. S2, S7). We found that they showed 235 similar spatiotemporal structures although they differed in absolute values in some regions, and the ground-based product was the most consistent with the ensemble. Therefore, we decided to use the ground-based ET (mm/d) as a representative, from 1987 to 2009. It is also advantageous in that it is derived from upscaling of FLUXNET data (Jung et al., 2011; and is independent from specific ETretrieval algorithms. The original spatial resolution (1°) of the data was resampled into 0.5° resolution to 240 match that of forced simulation, and original temporal resolution was monthly time steps. A preliminary check of the time series and spatial patterns of the reference data revealed no artifact patterns (e.g., no abrupt jump in time series as found in CCI-SSM), so we used them with no pixel screening or normalization.

LAI 245
We used the global LAI dataset of Zhu et al. (2013), referred to hereinafter as LAI3g, which is based on a neural-network algorithm in conjunction with the third-generation Global Inventory Modeling and Mapping Studies (GIMMS 3g) and Moderate-resolution Imaging Spectroradiometer (MODIS) LAI product, with an original spatial resolution was 0.5° and half-monthly temporal resolution. Considering the common period among LAI3g and the coupled/forced simulations, we selected 1987-2009 as the 250 comparison period, and we resampled all data at 0.5° spatial resolution and aggregated them into monthly time steps.

Albedo
We used the MODIS albedo product (Qu et al., 2014) as reference data, which provided the bihemispherical reflectance (white-sky albedo) for the visible and NIR bands. The original 500-m spatial 255 resolution and 16-d temporal resolution were resampled (i.e., upscaled) into 0.5° resolution and monthly time steps. The common period between simulations and observation, 2003-2009, was used for evaluation. The pixels with retrieval failure of albedo were excluded from the analysis.

Precipitation
WIn addition to the four ECVs, we also evaluated the simulated precipitation because it is the primary 260 factor that influences the hydrological variables (Qian et al., 2006;Decharme & Douville, 2006). To this endIn addition, we used the GPCC dataset Version V6 (Schneider et al., 2014), which was also used to constructbias-correct bias in the WFDEI meteorological forcing of the offline ORCHIDEE simulation (section 2.1.2). This gridded product at 0.5° provides monthly precipitation derived from qualitycontrolled observed precipitation from over 65,000 world-wide stations world-wide, and accounts for a 265 climatological correction of undercatch based on Legates and Willmott (1990). (Schneider et al., 2014).
As this is the forcing data, precipitation output in the forced simulation is identical to the reference. Therefore, model evaluation regarding precipitation was only conducted for the coupled simulation.

Data processing 270
For consistency between the observed and simulated data, we subjected the former to aggregation or resampling towards the 0.5° spatial resolution and monthly time steps for each variable, as described above. Due to the presence of data gaps in the reference data sets, which are either because of the acquisition issues or the quality control and data screening, we masked the simulated datasets to match the spatial-temporal data availability of the corresponding reference data. For the SSM, the dense snow 275 regions (with a snow water equivalent exceeding 48 mm) in the simulated data were further excluded so as to avoid unreliable comparisons with uncertain references. Also, co-masking was performed after the spatiotemporal resampling, followed by the statistical normalization (only for the SSM). The resulting coverage of the selected comparison period is summarized in Table 1 for each variable.
After the above-mentioned pre-processing, to compare the spatial patterns of the observed and 280 simulated data, we focus on three accuracy criteria calculated at the 0.5° scale along monthly time steps: the bias, Pearson's correlation coefficient (CC), and root-mean-square error (RMSE). The criteria were calculated along temporal axis for each pixel (i.e., the result was shown as one global map for a criterion).
The statistical significance of the bias (compared to zero) and CC was assessed at each pixel with Student's t-test and Pearson's test, respectively, with a p-value of 5% in both cases. Note that the 285 evaluation periods were different among SSM (1993--1999), ET, LAI and precipitation (1987--2009), and Aalbedo (2003--2009). However, the impact of the chosen period on the evaluation is likely to be limited (see Supplementary Table S1).

Factor analysis
To reveal features of the simulations in detail, the accuracy criteria were evaluated against various 290 landscape/meteorological factors (Figure 1), namely PFT, LAI, irrigation, precipitation, slope, snow, and ET. For each factor, time series were averaged temporally to make only one global map (i.e., the classification criteria were applied on long-term basis). The value of each factor was classified into a specific number of levels (classes), which were used as ordinal scales. Each factor was classified as given in Table 2, and each factor is described in detail below: 295 1) For PFT, we used the input dataset that is used in ORCHIDEE. This includes fractional coverages in each pixel of 15 PFTs. We created a dominant PFT map by picking up the PFT class that have maximum fractional coverage for each pixel.
2) For LAI, we used the LAI3g data (above subsection), classifying them into three levels (see Table 1 for the specific class definitions). 300 3) For irrigation, we used a global map of irrigation areas (Siebert et al., 2010), which indicates the fractional coverage (%) of an irrigated area with 5-arc-min spatial resolution. It was classified into six levels.
4) For precipitation, we used the pluri-annual mean of GPCC during the same period as the investigated ECVs. It was classified into five levels. 305 5) For slope, this classification was done by referring to the ETOPO51 DEM (51 arc-minute global relief model of Earth's surface; NOAA, 1988Amante & Eakins, 2009, which is also used in ORCHIDEE to control reinfiltration of the water. 6) WFor SWE, we used the pluri-annual mean of the forced SWE for the factor analysis of the forced simulation, and that of the coupled SWE for the coupled simulation.corresponding ORCHIDEE SWE, 310 which The SWE was classified into five levels. 7) For ET, we used the pluri-annual mean of Jung et al. (2011; during the same period as the investigated ECVs.
Here, dominant PFTs, irrigation, slope, and SWE were only used for the factor analysis, while LAI, precipitation, and ET are also used validation and come from independent sources. The PFT fractions and 315 SWE, however, were not independent from our simulations, but we assumed it was not problematic for factor analysis, which mostly aims at suggesting process-based explanations to the main model errors.

Spatial and temporal patterns of model errors
Overall, the spatial structures of the ECVs simulated in both modes were consistent with those of the 320 reference productss, as shown by comparing the corresponding pluri-annual mean maps (Supplementary Figures S1-S54; Fig. 1A). To refine this comparison, Figure 2 shows the spatial bias patterns of the fivefour variables (normalized SSM, ET, LAI, and albedo, and precipitation), in both forced and coupled modes, and of the precipitation in coupled mode. Difference between forced and coupled modes are also shown in Supplementary Figure S6. Spatiotemporal averages of bias, RMSE, and correlation coefficients 325 are summarized in Table 3. The spatial patterns of the temporal CC are also shown for SSM and albedo ( Figure 3) for further discussion.
The Pprecipitation bias in the forced mode is very small in the most region since the simulation relies on bias-corrected precipitation (WFDEI), which relies on the GPCC dataset used here as reference data. Yet, it is not everywhere negligible (Figure 2A), and the forced ORCHIDEE precipitation (WFDEI) 330 is higher than GPCC in small tropical pockets, the US Great Plains, and boreal zones, which are prone to precipitation undercatch because of strong winds and/or a large fraction of snowfall (Becker et al., 2013).
The largest precipitation biases in coupled mode, in absolute value, are found in the wettest areas (humid tropics) and mountain ranges ( Figure 2B), which is consistent with the analysis of Cheruy et al. (2020), in terms of bias sign and spatial pattern. 335 Figure 2CA-DC showed that the spatial pattern of normalized SSM bias in forced and coupled modes were consistent and delineated the biased regions clearly. The strong negative biases in normalized SSM was observed over the boreal region (except Eastern Siberia) with high SWE values ( Figure 1JD), suggesting the relation to snow or permafrost. Note that satellite observation uncertainties in such snowy regions could also be a reason for the discrepancy. The farm belt of India and China (with a lot of irrigation 340 in Figure 1HC) exhibit a systematic lower bias in SSM. Apart from those, arid (North Africa, middle of Australia, Nnorth China) and tropical (Congo and Amazon Basin) regions also showed lower correlation ( Figure 3A-C), part of which can be attributed to the inherent feature that CC tends to be low when the range in which the sample varies is narrow. To better identify the error sources in SSM, we plotted the mean seasonal cycles (i.e., monthly climatology) separately for each latitude zone (Figure 4). Substantial 345 parts of the time series were consistent between simulation and observation (except grayed-out period due to insufficient sample size and low reliability of the reference data). The underestimated simulated SSM values compared to the CCI-SSM values in the summer season in 30-60°N ( Figure 4B) may be attributed to anthropogenic water input due to irrigation because this region includes large-scale agricultural fields ( Figure 1HC). In the low-latitude regions, the simulated values tend to underestimate 350 SSM in the dry season, and to show larger seasonal change ( Figure 4C, D).
Most of the areas exhibited small ET biases in absolute value ( Figure 2E, F), suggesting that ORCHIDEE is highly capable of representing global ET. The coupled simulation tended to simulate larger ET values than did the forced simulation, which can be explained to some degree by the larger mean precipitation biases in the coupled simulation than that in the forced simulationone (, as shown by the 355 larger positive bias for the coupled simulation which are positive on average (in Table 3). Regions with large ET biases were distributed in the tropical (Amazon and Congo Basin, the maritime continent), mountainous (the Rockies, Andes, and Himalayas), and agricultural (especially in India) regions.
Mountainous regions tended to be characterized by a positively biased precipitation in the coupled simulation simulation ( Figure 2BC), which caused a positive ET bias of ET in the coupled simulation 360 (especially in North/South America). Tropical regions exhibited complex responses in ET between the coupled and forced simulations. The maritime continent (Indonesia and the other tropical Pacific islands) had negative ET biases for both simulations. Congo and a large part of the Amazon exhibited contrasting patterns between the simulations (the uncoupled one had a negative bias whereas the coupled one had a positive bias). The link between ET and precipitation in the coupled ET simulation and the simulated 365 precipitation was only straightforward in the Congo, i.e.,where the positively biased precipitation (water input) led to the a positive bias of ET. In a part of the maritime continent, the coupled ET was negatively biased despite the a positive bias of precipitation. By contrastConversely, the coupled ET was positively biased in the Amazon despite the a negative bias of precipitation.
Positive bias of LAI was observed in large areas globally ( Figure 2GF, HG). Given the strong 370 similarity between the forced and the coupled bias maps, it is suggested that the bias comes mostly from the surface component, such as PFT maps, or reference data itself. In fact, LAI retrievals by spaceborne sensors like MODIS may be saturated for large values of LAI (Zhao et al., 2016), resulting in underestimation of LAI in reference. Despite such a positive -bias tendency, the boreal region in Eastern Siberia, the shores of the Great Lakes in North America, and the basin of the Mekong River all exhibited 375 negative bias of LAI. In addition, there were hotspots of negatively biased LAI in such regions as the Zambezi River system lying across Angola and Zambia. Contrasting biases between the simulations were observed around the Himalayas.
In most regions, the simulated bias for total albedo was small, and spatial pattern of the bias were generally similar between the forced and the coupled modesa consistent spatial pattern with reference was 380 obtained for total albedo, with showing strong spatial similarity between the forced and the coupled modes ( Figure 2IH, IJ). The largest biases were the overestimation in the mountainous regions (especially the Himalayas in the coupled mode) and the underestimation in the boreal and polar regions, where snow affects the albedo. In addition, simulated and observed albedo were uncorrelated (or negatively correlated) in many regions apart from the boreal one ( Figure 3C-F). Low correlation coefficients in the arid and 385 tropical region can be attributed to the temporal invariance of the land surface. However, even in some temperate and semi-arid zones where temporal variance is likely to be high, low correlation was observed.
In such regions, seasonal changes of the land surface (caused mainly by vegetation phenology and the snowfall/snowmelt cycle) may not be described well in ORCHIDEE. In fact, the global monthly climatology ( Figure 5A, F) showed a global mean overestimated NIR albedo in JJAexcept MAM and 390 underestimated visible albedo in MAM. The main source of the NIR albedo overestimation seemed to be that in the temperate zone (30-60°N; Figure 5C), suggesting overestimated vegetation cover (having high reflectance in NIR spectral region) there from summer to autumnin the summer. There was a systematic overestimation of albedo in the tropical band ( Figure 5D, E, I, J), and a small underestimation in the snowrelated season (winter to spring) of the boreal band ( Figure 5B, G). 395

Factor analysis
The bivariate linear regressions between simulated ECV bias and factors (Table 4), and the boxplots against each factor class (Figures 6-8) firstly reveal a large bias variability within each class, resulting in a large part from the spatial variability of the simulated variables across the various climates and biomes of the globe. However, some controls could be identified despite this variability. It is particularly the case 400 for irrigation, which has an obvious impact on the simulated hydrological variables (SSM, ET, LAI, and precipitation; Figure 6A, C, E, G): both the coupled and forced models show negatively biased values in the largely irrigated areas (classes 5 and 6), except for the forced -mode SSM. This is understandable because the simulations overlook irrigation, which creates artificial water input to the soil, resulting in additional ET and plant growth in reality. Interestingly, the coupled simulation underestimates the 405 observed values more than does the forced one ( Figure 6A, C, E, G), which probably relates to a positive feedback driven by surface-atmosphere coupling (Mahfouf et al., 1995;Liu et al., 2003;Wang, T. et al., 2015). Since the forcing WFDEI precipitation is based (i.e., real-world precipitationon in situ rain gauges which ) integrates the impact of the real-world irrigation, this factor has a relatively weak effect in the forced mode. 410 The contrasting ET -bias pattern between forced and coupled modes in the Congo and the Amazon ( Figure 2ED, EF) was also confirmed in the factor analysis of precipitation (classes 4 and 5, which probably correspond to tropical regions; Figure 6D), PFT (class 2: broadleaf evergreen in Figure 7A), LAI (class 3 in Figure 7B), and ET (class 3 in Figure 8A). This also explains the contrasting correlation sign of ET-bias with P, SSM, ET and LAI in Table 4. 415 The factor analysis confirms the positive bias of LAI in the tropical regions, which are characterized by high precipitation (classes 4 and 5 in Figure 6F), broadleaf evergreen forest (PFT 2 in Figure 7C), high LAI (class 3 in Figure 7D), and high ET (class 4 in Figure 8B). However, some of the positive bias in such tropical regions might be compensated by the negative bias of the simulated precipitation (especially in the Amazon; Figure 2BC, also confirmed by class 3 in Figure 7J), resulting in 420 a smaller bias of LAI in the coupled simulation than that in the forced simulation. Negative LAI bias in the boreal region is also confirmed by the PFT factor analysis (classes 8, 9, and 15 in Figure 7C). Eastern Siberia is the main place with negative LAI biases ( Figure 2G, H). Possible explanations include persistent snowpack reducing the vegetation growing season; underestimated maximum LAI in the model; and errors in the reference LAI product especially at high latitudes, due to the less reliable assessment of solar 425 reflectance from space .
For albedo, the effect appeared in the factor analysis against slope (class 3 in Figure 8C, D) and SWE (classes 4 and 5 in Figure 8F, G) as a discrepancy between the coupled and forced simulations. In the steep regions, the coupled simulation tended to be positively biased because of the precipitation bias.
In the high-SWE region, the negative bias of albedo was enhanced in the forced simulation. This is due 430 to the already mentioned compensation of the positive bias in the mountainous region with the negative bias in the boreal and polar regions ( Figure 2IH, IJ). The NIR albedo in the tropical region (classes 2 and 3 in Figure 7E and class 3 in Figure 7F) tended to be slightly large for both simulations. This is consistent with the positive bias of LAI in such regions ( Figure 2GF, GH), although the range of bias was small.

Discussion 435
In general, the ORCHIDEE simulations show good spatial/temporal consistency with the reference data, except for issues related to external water addition/subtraction and surface-atmosphere coupling. An example of the external source of water input is irrigation. Largely irrigated areas obviously lead to strong negative bias was observed in not only ET, LAI, and precipitation but also SSM over largely irrigated areas. Specifically, a lack of description of the additional water input and man-made vegetation over irrigated agricultural land led to lower SSM and LAI, which in turn led to lower ET. In the coupled simulation, the lower SSM also led to lower humidity and lower precipitation, resulting in enhanced underestimation of SSM in the next time step (i.e., positive feedback). The enhanced SSM 450 underestimation caused enhanced ET underestimation, as well as enhanced LAI underestimation through the parametrizations of carbon assimilation and vegetation phenology. The Uunderestimation of precipitation in the coupled simulation over irrigated areas (e.g., India in Figure 2BC; classes 5 and 6 in Figure 6G) supports the validity of this scheme, and such an emphasizing effect in the coupled model, which is consistent with other reports (Mahfouf et al., 1995;Liu et al., 2003;Wang, T. et al., 2015). The 455 spatial similarity between the bias maps of SSM, ET, and precipitation (Figure 2A-FE) over central-south Africa, Australia, and a large part of south and east Asia also suggests the a strong interlink between them in the coupled mode. A potential interpretation is that precipitation is the first-order control on SSM and ET in the region (i.e., water-limited ET). It also can be interpreted as a result from positive feedback between precipitation, SSM, and ET in such regions, as reported by Yang et al. (2018). This is consistent 460 with the results of Yang et al. (2018), who have reported the positive feedback of SSM-precipitation and the positive correlation of SSM-ET and ET-precipitation in such transient zones (i.e., a climate that is neither extremely dry nor extremely wet).
However, there seem to be other secondary factors need to that should be considered regarding land-atmosphere feedback and their influence on coupled precipitation and ETthe hydrometeorological 465 regime (i.e., energy-limited or water-limited). In particular, ET is not controlled solely by precipitation but also by radiation (Cheruy et al., 2020), and temperature determines the potential ET (Dirmeyer, 2001;Nasonova et al., 2011). The complex response of ET to precipitation presented in the present study suggests the importance of those factors.: in the coupled simulation, the positive precipitation bias in the Congo Basin ( Figure 2C) created a positive ET bias ( Figure 2E) in a straightforward manner. By contrast 470 For example, there may be a negative feedback in the Amazon and the maritime continent between precipitation and ET because these areas are strongly energy-limited (Seneviratne et al., 2010;McVicar et al., 2012) in comparison to the Congo. In the maritime continent, positive precipitation bias meant more cloud coverage than reality, which decreased the available energy and ET. OppositelyIn contrast, the negative precipitation bias in the Amazon meant less cloud coverage, larger available energy, and larger 475 ET than in reality.
Although such feedback explains the overestimated ET in the Congo and the Amazon in the coupled simulation, it does not explain the underestimated ET there in the forced simulation. The A pPotential explanations arereason for it iscould be excessive water stress on ET, insufficient soil water holding capacity, underestimated precipitation or radiation in WFDEI, and/or an overestimation of ET by 480 the Jung product, in the regions of high precipitation in the forced mode, although that is not clear in the coupled mode because of the positive P bias, which could cancel the negative ET bias. In addition, cConversely, too weak water stress in dry areas (either for transpiration or soil evaporation) can also explain the negative correlation between the forced-mode ET bias and the precipitation (Table 4). A solution wcould be to activate a resistance to soil evaporation, increasing with the top soil dryness (Cheruy 485 et al., 2020). Such contrasting results between the forced and coupled modes imply the importance of model evaluation under both modes to isolate the potential error sources.
Compared with the forced mode, the positively biased precipitation simulated by the coupled mode may positively bias the albedo, particularly in the mountainous areas (high slope class in Figure 8C-E) via considerable snow cover. This probably arose from incomplete atmospheric simulation of the local 490 climate (Cheruy et al., 2020) such as an updrift along a mountain slope. Coarse spatial resolution of the atmospheric simulation in the coupled mode can also make it difficult to represent the impact of mountainous topography on local climate (Decharme and Douville, 2006). Theoretically, the overestimation of albedo should decrease the available energy at the surface, thereby decreasing ET and surface temperature. The slight negative bias of ET in the Himalayas ( Figure 2FE) despite the positive 495 bias of precipitation ( Figure 2BC) can be explained by the decrease in available energy due to the increased albedo ( Figure 2JI). Such an ice-albedo interaction in the ORCHIDEE-LMDZ coupled mode has also been reported over the boreal region (Wang, T. et al., 2015; particularly pronounced in spring temperature over Eastern Siberia). Taking the ice-albedo feedback into consideration with the secondary factors (i.e., radiation and temperature) that affect ET, the link between precipitation and ET in the 500 coupled mode is rather complex in the mountainous and boreal regions. Moreover, the deficit of available energy may reduce photosynthesis thus vegetation growth, causing a peaky underestimation of LAI in the Himalayas in the coupled mode ( Figure 2H2G), which is not observed in the forced mode ( Figure 2GF).
Part of the positive biases in normalized SSM in the Eastern Siberia and polar region ( Figure 2AC, BD) may be attributed to freezing/snowmelt and related vegetation phenology, as excessively large 505 or .fast Ssnowmelt that is considerable and/or very fast occurs in the spring in ORCHIDEE ( Figure 5B, G)., leading to overestimated SSM. However, there is likely to another control factors, are likely involved in SSM overestimation, such as wetlands, permafrost, and albedo. The negative albedo bias found in the boreal zone ( Figure 2I, J) in spite of the positive snowfall bias (Figure 2A, B) can also be explained by the excessive snowmelt. This Uunderestimation of albedo in many boreal zones areas, also noted in 510 (Cheruy et al., (2020), was expected to lead to overestimated ET, but it did not lead to an obvious ET bias because of the underestimated LAI ( Figure 2GF, GH). Given the spectral features of land cover (Petty, 2006), the NIR albedo is related largely to an abundance of vegetation, i.e., LAI. Therefore, uncertainty in snow and LAI leads to uncertainty in the surface albedo, which further propagates uncertainty in the energy balance and water cycles. Such a complicated relationship should be treated in the special tuning 515 of ORCHIDEE for high latitudes (Druel et al., 2017;Guimberteau et al., 2018). In addition to highlatitude regions, vegetation seasonality in the temperate zones seemed uncertain. In the temperate forests, the model is likely to simulate spring green-up that is considerable and/or very fast ( Figure 5C), which causes an overestimated NIR albedo and discrepancies in LAI and albedo seasonality.
Regardless of the origin (i.e., satellite, reanalysis, or in situ), observations inevitably contain 520 inherent uncertainty, which leads to uncertainty in the model assessment. SSM retrieval over substantially high/low vegetation, tropical/arid regions, and highly heterogeneous and high-roughness regions remains challenging (Ma et al., 2019). Therefore, some part of the low SSM correlation in arid/tropical regions ( Figure 3A, B) can be attributed to uncertainties in the satellite products in addition to an inherent feature of CC. Snow cover and RFI (Oliva et al., 2012) may also cause uncertainties in satellite-based SSM 525 estimation, although we attempted to remove such uncertain pixels by means of a preliminary quality check. Using multiple data sources (e.g., the Soil Moisture Active Passive (SMAP) product ( [Ma et al., 2019;Al-Yaari et al., 2019bEntekhabi et al., 2010)) as reference for model evaluation (Eyring et al., 2016b) is a promising way to address such uncertainties. A brief attempt with SMOS-IC product (Fernandez-Moran et al., 2017) was shown in Supplementary Fig. S86. Inconsistency between the model-530 simulated SSM depth (up to 10 cm) and the penetration depth of satellite sensors (several centimeters) may also cause uncertainties in the assessment, although using normalized SSM instead of absolute SSM is likely to mitigate the effect to some extent.
The satellite-based LAI product (Zhu et al., 2013) may be affected by the saturation issue of optical satellite data (i.e., MODIS) in regions with high LAI. The snow albedo of the MODIS product 535 (MCD43) has a slightly larger uncertainty (RMSE ≈ 0.07) (Stroeve et al., 2005;Stroeve et al., 2013) than that of the snow-free daily mean albedo (RMSE = 0.034) (Wang, D. et al., 2015). However, this does not alter our conclusion about the ORCHIDEE albedo uncertainty in the snow region, but some of the uncertainty might be attributed to the error in satellite observation.
We depended largely on satellite-derived data for the SSM, LAI, and albedo evaluations. By 540 contrast, we used a FLUXNET-based product (Jung et al., 2019) for the ET evaluation, which has potential uncertainties arising from (i) the statistical upscaling process (model tree ensemble: Jung et al., 2009), (ii) the input data required in machine-learning prediction, and (iii) the heterogeneous distribution of ground stations. Because the latter potential issue is particularly important for hardly accessible regions such as tropical and mountainous areas, progress in the data coverage of the FLUXNET network is 545 desirable. Although ET products derived from satellite data (Miralles et al., 2011;Zhang et al., 2010;Zeng et al., 2014) can also be used, unlike the other variables (SSM, LAI, and albedo), the retrieval of ET is not done directly from the satellite observations but depends largely on the process-oriented models.
Therefore, in addition to the uncertainties in the satellite observations themselves, such products have uncertainties that arise from ancillary data (e.g., atmospheric conditions, land cover) required in the model, 550 as well as from imperfections in the model structure/parameterization (preliminary comparison among the different data sources can be found in Supplementary Fig. S97)..
The difference between the forced ORCHIDEE precipitation (WFDEI) and GPCC (Figure 2A) probably comes from undercatch correction: based on Legates and Willmott (1990) for GPCC, and on Adam and Lettenmeier (2003) for WFDEI. Schneider et al. (2014) acknowledge that for the GPCC 555 product, "the biggest uncertainty issue is the correction of the systematic gauge-measuring error (general undercatch of the true precipitation)", but this is very likely true for all precipitation products.
Note that the present study is based on a specific LSM (i.e., ORCHIDEE 2.0), atmospheric model (i.e., LMDZ6A), and forcing data (WFDEI). Future work should include addressing the uncertainties that arise from the LMDZ model structure/parameterization, as well as the resolution in the numerical 560 simulation (Hourdin et al., 2013). Uncertainties that arise from the atmospheric model have been analyzed for some evaporation and SSM by Cheruy et al. (2020). For China, WEDFI-based simulations have performed better than Princeton Global meteorological Forcing and Climatic Research Unit-National Center for Environmental Prediction with ORCHIDEE . However, because varying the forcing data has a comparable impact to varying the LSM in the forced simulation (Guo et al., 2006), the 565 uncertainty in selecting the forcing data should also be kept in mind. Other future work should be factor analysis against other hydrometeorological parameters such as radiation, temperature, and precipitation frequency (Qian et al., 2006;. We confirmed the selection of study period did not make substantial difference in the result (Table S1) in terms of global mean; however, it should be also checked in the regional scale in future. 570

Conclusions
This paper has presented an in-depth evaluation of fiveour interlinked essential climate variables (namely surface soil moisture, evapotranspiration, leaf area index, and albedo, and precipitation) simulated by ORCHIDEE land surface model under different simulation modes (either forcing by WFDEI or coupled with LMDZ). Statistical evaluation was conducted using various reference-data sources (ESA CCI, 575 upscaled FLUXNET, GIMSS 3g, MODIS products, and GPCC), and factor analysis was conducted against various landscape factors (namely plant functional type, leaf area index, irrigation, precipitation, slope, snow water equivalent, and evapotranspiration). Although ORCHIDEE consistently represented the spatiotemporal patterns of each essential climate variable in general, some issues were found relating to water cycles and their different consequences between the forced and coupled simulations. Errors 580 relating to freezing/snowmelt, artificial water input such as irrigation, and precipitation bias propagated through surface-atmosphere coupling in the coupled mode. The factor analysis revealed a strong link between irrigation and precipitation (that further affected surface soil moisture, evapotranspiration, and leaf area index, particularly in the coupled mode) and a relatively complex link between precipitation and evapotranspiration that reflected the hydrometeorological regime of the region (energy-limited or water-585 limited) and the snow-albedo feedback in mountainous and boreal regions. In addition, the description of vegetation and snow seasonality seemed to be an issue in ORCHIDEE. Considerable Excessive and/or very too fast green-up in temperate forest may lead an overestimation of leaf area index and near infrared albedo. Considerable Excessive and/or very too fast snowmelt in spring in the boreal region may result in the underestimation of albedo in such regions, which can affect energy balance and water cycles. The 590 different results between the forced and coupled modes stress the importance of model evaluation under both modes to determine each potential error source in model simulation.
The other reference data are freely available from the following links. All the links were confirmed to be 610 accessible on 26 January 2021.
-The latest version of ESA CCI SM. https://www.esa-soilmoisture-cci.org/node/145. ESA also provides the previous version (version 4.4 was used in this study) upon request. The original version used in this study (Zhu et al., 2013) can be obtained on request at http://cliveg.bu.edu/modismisr/lai3g-fpar3g.html.

Code availability
The version of the ORCHIDEE model used for this study (revision 4783) is very close of tag 2.0, freely available from http://forge.ipsl.jussieu.fr/orchidee/browser/tags/ORCHIDEE_2_0/ORCHIDEE/ (Peylin et al., 2020). Tag 2.0 is based on revision 4783, with updates regarding the number and format of output 630 variables to comply the CMIP6 requirements, and a few very minor bug corrections regarding the carbon cycle. The code of revision 4783 can be obtained upon request to the corresponding author.

Competing interests 635
The author declare that they have no conflict of interest.

References
Adam, J. C., and Lettenmaier, D. P.: Adjustment of global gridded precipitation for systematic bias. J.
International Peylin, P., Ghattas, J., Cadule, P., Cheruy, F. , Ducharne, A., Guenet, B., Lathière, J., Luyssaert S., Maignan, F., Maugis, P., Ottlé, C., Polcher, J., Viovy, N., Vuichard, N., Bastrikov, V., Guimberteau, M., Lanso, A.-S., MacBean, N., Mcgrath, M., Tafasca, S., and Wang, F Qu, Y., Liu, Q., Liang, S., Wang, L., Liu, N., and Liu, S. Direct-estimation Table 4. Spatial correlation coefficients (SCC) between biases (in forced and coupled modes) and potential explanatory factors. PFT was excluded in the Table because it is nominal scale. Statistically insignificant SCCs appear in italic. SSM tended to be underestimated in high P, SSM, ET, LAI regions for both forced and coupled modes. Between forced and coupled ET, opposite association with P, ET, and LAI was observed. LAI in both modes were positively biased in high P, 995 ET, SSM region (probably corresponding to the tropical region). Albedo and coupled P were strongly associated with slope. Irrigation is likely to bias SSM and ET negatively, and the effect was more enhanced in the coupled mode.  Table S1. Pluri-annual land averages (excluding Greenland and Antarctica) of the simulated 1005 variables over the different periods used for evaluation. The chosen period does not markedly influence the observed mean, and thus the bias.     Figure S6: Pluri-annual mean relative differences between forced and coupled modes for normalized SSM, ET, LAI, albedo, and 1115 precipitation, in % of the forced value. All data were temporally averaged for each study period (defined in Table 1).