Comprehensive evaluation of satellite-based and reanalysis soil moisture products using in situ observations over China

Soil moisture (SM) plays a critical role in the water and energy cycles of the Earth system; consequently, a high-quality long-term SM product is urgently needed. In this study, five SM products, including one microwave remote sensing product – the European Space Agency’s Climate Change Initiative (ESA CCI) – and four reanalysis data sets – European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis – Interim (ERA-Interim), National Centers for Environmental Prediction (NCEP), the 20th Century Reanalysis Project from the National Oceanic and Atmospheric Administration (NOAA), and the ECMWF Reanalysis 5 (ERA5) – are systematically evaluated using in situ measurements during 1981–2013 in four climate regions at different timescales over the Chinese mainland. The results show that ESA CCI is closest to the observations in terms of both the spatial distributions and magnitude of the monthly SM. All reanalysis products tend to overestimate soil moisture in all regions but have higher correlations than the remote sensing product except in Northwest China. The largest inconsistency is found in the southern Northeast China region, with an unbiased root mean square error (ubRMSE) value larger than 0.04. However, all products exhibit certain weaknesses in representing the interannual variation in SM. The largest relative bias of 144.4 % is found for the ERA-Interim SM product under extreme and severe wet conditions in northeastern China, and the lowest relative bias is found for the ESA CCI SM product, with a minimum of 0.48 % under extreme and severe wet conditions in northwestern China. Decomposing mean square errors suggests that the bias terms are the dominant contributors for all products, and the correlation term is large for ESA CCI. As a result, the ESA CCI SM product is a good option for long-term hydrometeorological applications on the Chinese mainland.
ERA5 is also a promising product, especially in northern and northwestern China in terms of low bias and high correlation coefficient. This long-term intercomparison study provides clues for SM product enhancement and further hydrological applications.


Introduction
Soil moisture (SM) is a key state variable in the climate system and controls the exchange of water, energy, and carbon fluxes between the land surface and the atmosphere (Western and Blöschl, 1999; Robock et al., 2000; Ochsner et al., 2013; McColl et al., 2017; Peng and Loew, 2017; Qiu et al., 2018). SM can influence runoff generation, drought development and many other processes of hydrology and agriculture (Markewitz et al., 2010; Das et al., 2011; Sevanto et al., 2014; Akbar et al., 2018).
Despite the small total mass of SM compared to other water cycle components, it is essential for numerical weather prediction and has been recognized as an essential climate variable (ECV) (GCOS, 2010).
In situ measurements have been acknowledged as the most accurate method to determine SM values, but they cannot fulfill the demand of high spatial and temporal resolution for hydrometeorological use (Bárdossy and Lehmann, 1998). Furthermore, the temporal coverage of in situ measurements is usually not long enough. Therefore, satellite-based products, reanalysis products and numerical model products are often used (Peng et al., 2017). Although model outputs are spatially and temporally continuous, large uncertainties still exist in model simulations because of the model physical structure, parameters, and other factors (Schellekens et al., 2017). Reanalysis products are generally more accurate, yet they still inherit some uncertainties of the models (Berg et al., 2003), and their spatial resolutions are not high enough for regional application (Crow and Wood, 1999).
Despite the short temporal coverage and the limitation of only measuring the surface SM (Petropoulos et al., 2015), satellite-based products are very promising (Chauhan et al., 2003; Bogena et al., 2007; de Jeu et al., 2008) because they are often based on observations with high spatial resolution (Busch et al., 2012). For this reason, satellite-based products are normally taken as reference datasets to evaluate model outputs and reanalysis products (Crow and Ryu, 2009; Lai et al., 2014). To choose the most appropriate SM product for long-term hydrological and meteorological studies, more evaluation work needs to be done.
Several evaluation studies have been conducted to find a qualified remote sensing SM product (Li et al., 2009; Zhang et al., 2012; Lai et al., 2014; Peng et al., 2015; An et al., 2016; Ma et al., 2016; Zhu et al., 2018). The SM product from the European Space Agency (ESA) Climate Change Initiative (CCI) program has attracted attention in recent years (Dorigo et al., 2018) and has been proven to have good quality in some regions of the world (Dorigo et al., 2015; Chakravorty et al., 2016; Ikonen et al., 2018; González-Zamora et al., 2019). Peng et al. (2015) evaluated the ESA CCI product along with four other datasets in Southwest China and found that it has the potential to provide valuable information. Based on observational data and eight model products, An et al. (2016) further confirmed that the CCI SM can be applied over China. Ma et al. (2016) compared the ESA CCI and ERAI products with in situ measurements and found that both products show reliable time-series results.

ERAI SM
ERAI is a widely used reanalysis product produced by the European Centre for Medium-Range Weather Forecasts (ECMWF, 2009). The data assimilation system is based on the IFS (Cy31r2), which includes a four-dimensional analysis with a 12-hour analysis window. The ERAI data used in this study are on a fixed grid of 80 km and are available four times daily and monthly. ERAI starts in 1979 and is continuously updated in real time (Paul et al., 2011). ECMWF simulates SM at 4 depths: 0-7 cm, 7-28 cm, 28-100 cm, and 100-255 cm. As suggested by An et al. (2016), the data at depths of 7-28 cm are linearly interpolated to a depth of 10 cm for evaluation.
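As a rough illustration of the interpolation step, the following Python sketch linearly interpolates between two model layers to a 10 cm target depth. It assumes that each layer value is representative of the layer midpoint (3.5 cm for 0-7 cm and 17.5 cm for 7-28 cm) and that the two uppermost layers are used. Neither assumption is stated in the text, and the function name `interpolate_to_depth` is hypothetical.

```python
def interpolate_to_depth(sm_upper, sm_lower, z_upper=3.5, z_lower=17.5, z_target=10.0):
    """Linearly interpolate soil moisture between two layers.

    sm_upper, sm_lower: volumetric SM (m3 m-3) of the upper and lower layer.
    z_upper, z_lower:   representative depths in cm (ASSUMED layer midpoints).
    z_target:           depth in cm at which SM is wanted.
    """
    if not z_upper <= z_target <= z_lower:
        raise ValueError("target depth must lie between the two layer depths")
    # Weight of the lower layer grows linearly with depth.
    weight_lower = (z_target - z_upper) / (z_lower - z_upper)
    return (1.0 - weight_lower) * sm_upper + weight_lower * sm_lower


# Example: 0.20 m3 m-3 in the 0-7 cm layer, 0.30 m3 m-3 in the 7-28 cm layer.
sm_10cm = interpolate_to_depth(0.20, 0.30)
```

With midpoints as reference depths, the 10 cm value lies a little closer to the upper-layer value than to the lower-layer value; a different choice of reference depths would shift the weights.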

NCEP SM
NCEP is the second reanalysis product provided by the National Centers for Environmental Prediction and the Department of Energy (NCEP-DOE; Kanamitsu et al., 2002). The product has been available since January 1979 with a spatial resolution of approximately 200 km. The temporal resolution includes four-times-daily and monthly data. NCEP has two SM layers, 0-10 cm and 10-200 cm, of which the first was chosen for evaluation.

NOAA SM
The Twentieth Century Reanalysis Project (20CR), led by the Earth System Research Laboratory Physical Sciences Division of the National Oceanic and Atmospheric Administration (NOAA) and the University of Colorado Cooperative Institute for Research in Environmental Sciences (CIRES), also produces a long-term SM product. Version V2c is used here, spanning 1851 to 2014 and thus the entire twentieth century (Compo et al., 2011). The NOAA SM product is generated at a spatial resolution of 2 degrees every six hours (monthly data are also available) and at 4 subsurface levels (0, 10, 40, 100 cm), among which the data at 10 cm depth are used.

ERA5 SM
ERA5 is the latest reanalysis product produced by ECMWF, covering the period from 1979 to present. The product uses a new version of the ECMWF assimilation system, IFS Cycle 41R2, which yields hourly analysis fields (C3S, 2017). The spatial (31 km) and temporal (hourly) resolutions of ERA5 are high compared to other existing reanalysis datasets. ERA5 will eventually cover the period from 1950 to the present, and one of its key improvements is better SM (Komma et al., 2008). It has more reliable results than ERAI at three sites in Australia. Land surface models driven by ERA5 also show consistent improvements, especially in surface SM, compared to those driven by ERAI (Albergel et al., 2018). Similar to ERAI, ERA5 has 4 levels of SM data, which are interpolated to 10 cm for evaluation.

In Situ SM and Preprocessing of Datasets
The in situ SM observations were compiled from three SM datasets as follows:
https://doi.org/10.5194/hess-2020-611 Preprint. Discussion started: 4 January 2021 © Author(s) 2021. CC BY 4.0 License.
(2) Soil water content from agricultural-meteorological stations
The in situ SM measurements are obtained from the National Meteorological Information Center of China (NMIC, 2006). The data were collected at 778 agricultural-meteorological stations with a temporal resolution of 10 days since May 1991. As there are too many missing observations after 2013, the evaluations of the different datasets are performed until December 2013.
There are 7 layers for the in situ observations, and the data at 10 cm depth are utilized. In addition, the observed SM is expressed as the relative water content ($\theta'$, unit=%), while the SM in all other products is in the unit of volumetric water content ($\theta$, unit=m$^{3}$ m$^{-3}$). Therefore, the observed SM is calculated by:
$$\theta = \theta' \times f_c \times \frac{\rho_b}{\rho_w}$$
where $f_c$ is the field capacity, $\rho_b$ is the dry bulk density, and $\rho_w$ is the water density with a value of 1.0 (unit=g cm$^{-3}$).
(3) Mass percent of measured SM
Another dataset including SM, field capacity and dry bulk density in China was recorded from 1981 to 1998, in which SM was presented as a mass percentage three times each month (Robock et al., 2000) to avoid auxiliary calibration. The volumetric soil moisture is calculated by:
$$\theta = \theta_m \times \frac{\rho_b}{\rho_w}$$
in which $\theta_m$ is the mass percent of measured soil moisture. Within a certain period, the two parameters $f_c$ and $\rho_b$ can be treated as constant. Considering that the field capacity and the dry bulk density are not measured at all stations, data from 119 stations are selected from 1981 to 2013.
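The two unit conversions described above can be sketched in Python as follows. The sketch assumes that the relative water content, the field capacity, and the mass-based SM are all given in percent (field capacity as a mass percentage), so each is divided by 100. This unit handling and the function names are assumptions for illustration, not details given in the text.

```python
RHO_WATER = 1.0  # water density in g cm^-3, the value stated in the text


def relative_to_volumetric(theta_rel_pct, field_capacity_pct, bulk_density, rho_w=RHO_WATER):
    """Relative water content (% of field capacity) -> volumetric SM (m3 m-3).

    ASSUMPTION: field capacity is a mass percentage; bulk_density is in g cm^-3.
    """
    mass_fraction = (theta_rel_pct / 100.0) * (field_capacity_pct / 100.0)
    return mass_fraction * bulk_density / rho_w


def mass_to_volumetric(theta_mass_pct, bulk_density, rho_w=RHO_WATER):
    """Mass percent of soil water -> volumetric SM (m3 m-3)."""
    return (theta_mass_pct / 100.0) * bulk_density / rho_w


# Example with made-up station values: 60 % of a 25 % field capacity,
# dry bulk density 1.4 g cm^-3.
theta_from_relative = relative_to_volumetric(60.0, 25.0, 1.4)
theta_from_mass = mass_to_volumetric(15.0, 1.4)
```

In this made-up example both routes describe the same soil water state (60 % of a 25 % field capacity is 15 % by mass), so the two functions return the same volumetric value.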
The selection of appropriate SM values is based on several criteria. First, if there were multiple data points in the same time period, the ISMN SM value was selected if available; otherwise, the average of the remaining two datasets was used. Second, SM values deviating from the mean by more than 3 standard deviations were removed. All the in situ observations, remote sensing data, and reanalysis data were averaged to monthly data at a depth of 10 cm. The distributions of the available stations are presented in Fig. 1, and detailed information on all the above SM products is listed in Table 1.
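The selection criteria can be illustrated with a minimal Python sketch. It assumes that missing values are encoded as NaN and that the outlier rule means "more than 3 standard deviations away from the station mean"; the text does not spell out either detail, and the function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def select_value(ismn, agro, mass):
    """Prefer the ISMN value; otherwise average whatever the other two datasets provide."""
    if not math.isnan(ismn):
        return ismn
    rest = [v for v in (agro, mass) if not math.isnan(v)]
    return fmean(rest) if rest else math.nan


def remove_outliers(values, n_sigma=3.0):
    """Replace values farther than n_sigma standard deviations from the mean with NaN.

    ASSUMPTION: mean and standard deviation are taken over the non-missing values
    of the same series.
    """
    valid = [v for v in values if not math.isnan(v)]
    if len(valid) < 2:
        return list(values)
    mean, std = fmean(valid), pstdev(valid)
    return [
        v if math.isnan(v) or abs(v - mean) <= n_sigma * std else math.nan
        for v in values
    ]


# Example: twenty plausible readings and one spurious spike.
series = [0.2] * 20 + [0.9]
cleaned = remove_outliers(series)  # the spike is replaced by NaN
```

Masking outliers with NaN rather than dropping them keeps the time axis intact, which simplifies the later averaging to monthly values.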

Land Surface Air Temperature, Precipitation, and Radiation
The land surface air temperature and precipitation data are obtained from the National Meteorological Information Center (NMIC) at a spatial resolution of 0.25°, spanning from 1961 to the present (http://data.cma.cn/site/index.html). Interpolated from a high-density network of Chinese ground stations (over 2400 observation stations), the CN05.1 dataset includes daily mean temperature, maximum/minimum temperature and precipitation (Wu and Gao, 2013).

The radiation data were downloaded from the China Meteorological Forcing Dataset by merging China Meteorological
Programme's (WCRP) Global Energy and Water Exchanges (GEWEX-SRB) downward shortwave radiation, Princeton forcing data, and other sources (Yang et al., 2010). The spatial resolution is 1 degree, with a 3-hourly temporal resolution.
The self-calibrating Palmer drought severity index (SC-PDSI) was utilized to determine the performance of all products under different dry/wet conditions (Wells et al., 2004). Because it adjusts the climatic characteristics and calculates the duration factors based on the climate at a given location, the SC-PDSI has been widely used in recent decades.
The SC-PDSI fits Palmer's 11 categories to allow for comparisons across time and space. A negative value indicates drought conditions, and a positive value indicates a wet spell. The source code for the SC-PDSI can be downloaded via the National Agricultural Decision Support System (NADSS; online at http://nadss.unl.edu/).

Study Area and Evaluation Strategies
China is located on the eastern coast of Asia, immediately to the west of the Pacific Ocean. It extends roughly from 3. The 30-year averaged annual mean precipitation is treated as the climate mean precipitation to define the division of the climate zone.
To better explain the disagreement between all the SM products and in situ observations, the mean square errors (MSEs) of each product in individual regions are decomposed to quantify the contributions of the correlation term, standard deviation term and bias term:
$$\mathrm{MSE} = 2\sigma_p\sigma_o(1 - R) + (\sigma_p - \sigma_o)^2 + (\mu_p - \mu_o)^2$$
in which $R$ is the correlation coefficient; $\sigma_p$ and $\sigma_o$ are the standard deviations of the products and the observations, respectively; and $\mu_p$ and $\mu_o$ are the means of the product and the observations, respectively. On the right-hand side of the equation, the first term (correlation term) shows the correspondence between the SM product and the in situ observations. The second term (standard deviation term) explains the degree of similarity of variations, and the third term (bias term) shows the accuracy of the product. With a better understanding of the error structure of the datasets, we can better explain the discrepancy between the SM products and the in situ observations (Dorigo et al., 2010).
The distribution of the relative root mean square error (rRMSE, defined as the RMSE of each SM product divided by the averaged SM observations) for all stations is shown in Fig. 3 to represent the relative error of the SM dynamical range. Generally, compared to all the reanalysis products, ESA CCI has the lowest rRMSEs, indicating better performance of this remote sensing dataset. Large rRMSEs are found in the Yangtze-Huai region and in the south of Northeast China, which may be attributed to the high SM values. A possible explanation for the poor performance might be that these two regions are strongly influenced by monsoon precipitation.
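The MSE decomposition into correlation, standard deviation and bias terms can be checked numerically with a short Python sketch. It uses population statistics (dividing by N), under which the three terms sum exactly to the MSE; the function name and the sample numbers are illustrative assumptions, not values from the study.

```python
from statistics import fmean, pstdev


def decompose_mse(product, obs):
    """Split the MSE between a product and observations into three terms.

    Returns a dict with the correlation, standard deviation and bias terms.
    Both series must have the same length and non-zero variance.
    """
    mu_p, mu_o = fmean(product), fmean(obs)
    sigma_p, sigma_o = pstdev(product), pstdev(obs)
    # Population covariance, consistent with the population standard deviations.
    cov = fmean([(p - mu_p) * (o - mu_o) for p, o in zip(product, obs)])
    r = cov / (sigma_p * sigma_o)
    return {
        "correlation": 2.0 * sigma_p * sigma_o * (1.0 - r),
        "std": (sigma_p - sigma_o) ** 2,
        "bias": (mu_p - mu_o) ** 2,
    }


# Illustrative monthly SM values (m3 m-3) for one product and the observations.
product = [0.30, 0.35, 0.28, 0.40, 0.33]
obs = [0.25, 0.31, 0.22, 0.30, 0.29]
terms = decompose_mse(product, obs)
mse = fmean([(p - o) ** 2 for p, o in zip(product, obs)])
```

Comparing `sum(terms.values())` with `mse` confirms the identity, and the relative size of the three entries shows which error source dominates for a given product and region.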

Temporal Variability of SM
According to Table 2, all temporal variabilities of SM are averaged over the Northeast China, North China, Yangtze-Huai, and Northwest China regions, which are abbreviated as NE, NC, YH, and NW, respectively, below.

Temporal Evolution
The temporal evolutions of in situ observations and grid point SM values from the five datasets, averaged over each research region during JJA, are displayed in Fig. 4. Generally, all the reanalysis products overestimate the SM, while remote sensing underestimates the SM except in the NE region. All products perform well in the NC region, and the worst performance is found in the NW region. ERAI strongly overestimates SM in all the research regions, while NOAA and NCEP SM have the lowest bias among the reanalysis datasets in the NE and NC regions. Reanalysis reproduces the variation characteristics better than remote sensing; for example, all reanalyses reproduce the peak of JJA SM in 1998 in the YH region, whereas remote sensing does not. ERAI SM has the largest positive bias for all regions. All products show poor correlation in the NW region, implying that the mechanism in this region is not captured by the models.
The Taylor diagrams presenting the statistics of the comparison between ESA CCI, NCEP, ERAI, NOAA, ERA5 and in situ observations over four regions are shown in Fig. 5. Most correlation coefficient values are between 0.5 and 0.6 for ERA5, implying a good variability performance. Lower correlations are found for ESA CCI and ERAI SM, indicating that both products track the observed variability poorly. All products exhibit poor correlations in the NW region. Generally, the NOAA SM is always highly overestimated in all regions, and ESA CCI SM is always underestimated.

Seasonality
Figure 6 shows the temporal evolution of SM seasonality averaged spatially over different regions. Remote sensing shows a negative bias and the reanalyses a positive bias with respect to SM observations. The bias of ESA CCI is smaller than that of all reanalysis products, especially in warm seasons, partly due to the influence of different land surface types on satellite measurements, as well as the various soil parameters used in the retrieval algorithms (Chakravorty et al., 2016). All reanalysis SM series have a larger dynamic range than in situ observations and remote sensing SM values. ERA5 SM showed a similar variation tendency to the observations, while its bias was larger than that of ERAI, NCEP, and NOAA.
The seasonal cycle of SM in the NE region is obvious, partly due to the land surface vegetation types. The observed SM in all regions decreases from January, reaches its minimum from April to June, and then increases to its maximum from July to September, which can be reproduced by ERAI, NOAA and ERA5. ESA CCI and NCEP are closer to the observations in all regions, with the smallest biases occurring in warm seasons for ESA CCI and in cold seasons for NCEP. Additionally, ERAI and ERA5 can both reproduce the trend of SM, but the bias is smaller for ERA5. ESA CCI yields the worst seasonal cycle results considering the changing tendency, especially in cold seasons, which may be because of snow or frozen soil during these periods.
[...] autocorrelations for all seasons than other reanalysis products, implying that NOAA models should take into account the influence of some other variables on soil moisture in the future, for example, temperature and precipitation. All reanalysis SM data show greater autocorrelation in winter except for the NCEP SM data, indicating better performance of NCEP during cold seasons. ERA5 shows better performance than ERAI, especially with a close autocorrelation coefficient in the NE region.
In the NC region, all products fail to capture the SM variation tendency, especially during extreme drought and wet periods.
NOAA and ERA5 can capture the basic trend, but the variation range does not match the measured values. The variation amplitudes of NCEP and ERAI are obviously smaller than those of the observations. In the YH region, ERAI and ERA5 can roughly reproduce the trend of change, but the magnitude of the change is large. In the NW region, none of these products is able to reproduce the variation characteristics, with worse performance in drought periods than in wet periods. According to the correlation (Table 3), ERA5 has the best performance, but it shows a spurious increase from 1981 to 1993.
Generally, ERA5 is of the best quality, with its results in the NE and NC regions passing the significance test at the 0.05 level. ESA CCI data in the NC, YH and NW regions before 1995 are missing, which might be a reason for its poor performance. NCEP has a smaller overall change in SM than the observations, and all the datasets exhibit certain weaknesses in representing the interannual variation in SM.

Decomposition of the Mean Square Error (MSE)
The comprehensive performance of the five products over the four regions from 1981 to 2013 is evaluated. The contribution to MSE is decomposed into a correlation term, standard deviation term and bias term according to Eq. (2). As shown in Fig. 9, the contribution of the bias term is much larger than the correlation term for the ERAI and NCEP SM data in all regions, indicating that reducing the biases of both products is the way to further improve their quality.
The MSE of ESA CCI SM is the smallest for all regions and that of ERA5 is the second smallest in the NC and NW regions.
The large correlation term of ESA CCI and NOAA is found in all regions, pointing to the need to improve their ability to capture temporal variations. Additionally, all products present poor performance in the NW region with a high correlation term. By decomposing the MSE, we can see that in addition to the NW region, ESA CCI has a larger correlation term than the bias term, implying that the main error of ESA CCI comes from the poor performance of the variation tendency. The ERA5 SM dataset performs inconsistently in that its main error comes from the correlation term in the NC and NW regions, while the bias terms are dominant in the NE and YH regions. Furthermore, the standard deviation term has little effect on MSE for all datasets.

SM Performance Under Various Climate Backgrounds
Figure 10 shows the averaged relative bias under different humid/arid conditions, which are classified using the SC-PDSI (Wells et al., 2004).
The relative bias was calculated as the bias between the SM products and the in situ observations divided by the in situ observations under different conditions. The ESA CCI SM data showed a positive relative bias under severe drought conditions but a negative relative bias under severe wet and normal conditions. The NCEP and NOAA SM data have smaller relative biases under severe wet conditions, while ERAI and ERA5 SM data perform better under normal conditions. The largest difference is found for the ERAI products under severe drought conditions, with an average relative bias of 44.6%. The best performance is found for ESA CCI SM under severe drought conditions and NCEP SM under severe wet conditions, with averaged relative biases of 4.7% and 9.5%, respectively. For the performance in different regions (Fig. 11), the relative bias of all SM products in the NE region is noticeably high, partly due to the various land cover types in this region. The averaged relative bias for ESA CCI under drought conditions is smaller than that under wetter conditions. Although ERAI and ERA5 always highly overestimate the SM value, they show better performance in the NW region, especially under dry conditions. The smallest relative biases are found for the ESA CCI and NCEP SM products during all conditions.
Figure 12 shows scatter plots of (a) precipitation, (b) temperature, and (c) radiation anomalies versus observed SM anomalies over different regions. Clear positive and negative correlations with SM are found for precipitation and temperature, respectively, in the NE and NC regions. The correlation between net radiation and SM (Fig. 12c) is low, which is partly due to the combined influence of longwave and shortwave radiation. The correlation coefficient is low for all meteorological variables in the NW region, which may be attributed to the special soil type there.
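The relative bias used at the start of this section can be sketched in Python as a per-condition calculation. The sketch assumes that the relative bias of a class is the mean product-minus-observation difference divided by the mean observation in that class, expressed in percent; the text does not say whether the ratio is taken per sample or on the means. The condition labels and the function name are hypothetical.

```python
from statistics import fmean


def relative_bias_by_condition(product, obs, conditions):
    """Relative bias (%) of a product for each dry/wet condition label.

    product, obs: paired SM values (m3 m-3).
    conditions:   one label per pair, e.g. derived from SC-PDSI classes
                  (the labels used here are placeholders).
    """
    groups = {}
    for p, o, label in zip(product, obs, conditions):
        groups.setdefault(label, []).append((p, o))
    result = {}
    for label, pairs in groups.items():
        mean_bias = fmean([p - o for p, o in pairs])
        mean_obs = fmean([o for _, o in pairs])
        result[label] = 100.0 * mean_bias / mean_obs
    return result


# Made-up example: two wet months and one dry month.
rel_bias = relative_bias_by_condition(
    product=[0.30, 0.36, 0.22],
    obs=[0.20, 0.30, 0.20],
    conditions=["wet", "wet", "dry"],
)
```

Grouping by label before dividing keeps dry months, where observed SM is small, from dominating the statistic for the other classes.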
In the winter, SM decreases in all regions mainly because of decreased precipitation. Lower evaporation caused by sudden cooling may explain why SM increases in early winter. SM reaches a local minimum in the spring in most of the regions except the NE region, as a temperature rise leads to higher evaporation, while precipitation does not increase much in this season. In the NE region, ice and snow melting partially compensates for soil water loss and helps maintain a relatively stable SM.
Increased precipitation in the summer gives rise to an evident increase in SM. In the autumn, SM continues to increase in the YH and NC regions, which might be attributed to lower evaporation caused by lower temperatures.
ERA5 (~0.28125°) has a higher spatial resolution than ERAI (~0.75°), which is directly reflected in their spatial patterns of SM distribution. ERA5 reproduces the spatial distribution and time series of SM over mainland China well, with high correlations with observations. In terms of the monthly variation in SM and the interannual variation in the SM anomaly, ERA5 performs better than ERAI. ERA5 is expected to eventually replace ERAI, and we do see improvements in the ERA5 product. However, ERA5 overcorrects the problem of small variation in ERAI, which leads to unrealistically large standard deviations in ERA5 that affect its accuracy. Therefore, further improvements are still needed in the quality of the ERA5 SM product.

Conclusions
To evaluate the performance of long-term SM products over mainland China, one satellite-based product and four reanalysis datasets from 1981 to 2013 are selected for comparison with in situ measurements at different time scales.
Overall, ESA CCI has the best performance with the highest spatial resolution and accuracy, making it a good option for long-term hydrometeorological applications in China. The 0.25° × 0.25° resolution of the ESA CCI product produces the finest spatial pattern of SM, making it more beneficial for regional application than other SM products. However, the usefulness of ESA CCI is limited by its poor correlation and missing values, especially in Northeast China.
ERAI and ERA5 can well reproduce the tendency of the time series and perform best at stations, but they overestimate the seasonal variation in SM. ERA5 is also a promising product with better performance in several aspects than ERAI, highlighting the importance of incorporating more observations at finer spatial resolution.
NCEP cannot reproduce the spatial pattern of SM in China; the time series of NCEP SM data is poorly correlated with observations, and the variation amplitude of its seasonal cycle is much larger than that of the observations. NOAA is able to reproduce the basic spatial pattern, but it systematically overestimates SM in China and shows little seasonal variation. None of the SM products used in the present study can adequately simulate the interannual variation in the SM anomaly.
The mismatch between SM layers in analysis products and observations, as well as their spatial mismatch, should be investigated in the future (Choi and Hur, 2012; Crow et al., 2012). Furthermore, subdaily SM model products considering the advantages of individual models under different weather regimes and climate scenarios will be merged in future work (Chen and Yuan, 2020).

Name
Soil Depths (