The added value of brightness temperature assimilation for the SMAP Level-4 surface and root-zone soil moisture analysis over mainland China

Jianxiu Qiu, Jianzhi Dong, Wade T. Crow, Xiaohu Zhang, Rolf H. Reichle, Gabrielle J. M. De Lannoy

Guangdong Provincial Key Laboratory of Urbanization and Geo-simulation, School of Geography and Planning, Sun Yat-sen University, Guangzhou, 510275, China
Southern Laboratory of Ocean Science and Engineering (Guangdong, Zhuhai), Zhuhai, 519000, China
USDA ARS Hydrology and Remote Sensing Laboratory, Beltsville, MD 20705, USA
National Engineering and Technology Center for Information Agriculture, Nanjing Agricultural University, Nanjing, China
Jiangsu Key Laboratory for Information Agriculture, Nanjing Agricultural University, Nanjing, China
Global Modeling and Assimilation Office, NASA Goddard Space Flight Center, Greenbelt, MD, USA
Department of Earth and Environmental Sciences, KU Leuven, Heverlee, Belgium

with and without DA can be estimated using the ratio of their correlations with just one noisy but independent ancillary remote sensing product. This approach was applied to the SMAP L4 system using ASCAT soil moisture retrievals.

Their results show that the added value of SMAP DA is closely related to both rain-gauge density and vegetation density.

However, due to the limited availability of independent root-zone soil moisture (RZSM) products for performing statistical error estimation, this method is only applicable for SSM estimates.

The primary objective of this study is to determine the DA efficiency, i.e., performance improvement in DA results relative to the open-loop (OL) baseline of the L4 product, as a function of a variety of system aspects, including errors in CLSM forcing (e.g., precipitation), errors in key CLSM parameters (e.g., relating to vegetation), mean errors in CLSM structure (e.g., surface and root-zone coupling), and errors in the radiative transfer modeling (RTM) that links the modeled soil moisture and temperature estimates to the observed Tb.

To this end, we first evaluate the performance of L4 SSM and RZSM estimates using a very large number (n = 2474) of soil moisture profile measurement sites (generally acquired at sub-surface depths between 10 and 50 cm) within mainland China. Next, the in-situ measurements are used to assess the DA efficiency of the L4 system, which is defined as the skill difference between the L4 estimates and model-only estimates derived without SMAP Tb assimilation.

https://doi.org/10.5194/hess-2020-407 Preprint. Discussion started: 9 September 2020. © Author(s) 2020. CC BY 4.0 License.
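The DA efficiency defined above (skill of L4 minus skill of the model-only run, both measured against in-situ data) can be illustrated with a minimal sketch. It assumes skill is the Pearson correlation with the ground time series, uses purely synthetic data, and the helper names `pearson_r` and `da_efficiency` are hypothetical; any anomaly filtering or quality control applied in the actual study is not reproduced here.

```python
import numpy as np

def pearson_r(a, b):
    """Pearson correlation over the time steps where both series are finite."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ok = np.isfinite(a) & np.isfinite(b)
    return float(np.corrcoef(a[ok], b[ok])[0, 1])

def da_efficiency(insitu, l4, ol):
    """Return (R_L4, R_OL, delta_R) for one grid cell (hypothetical helper)."""
    r_l4 = pearson_r(l4, insitu)
    r_ol = pearson_r(ol, insitu)
    return r_l4, r_ol, r_l4 - r_ol

# Synthetic two-year daily example: the "L4" series is given a smaller
# error than the "OL" series, so delta_R should come out positive.
rng = np.random.default_rng(0)
truth = rng.normal(size=730)
insitu = truth + 0.2 * rng.normal(size=730)
ol = truth + 1.0 * rng.normal(size=730)
l4 = truth + 0.5 * rng.normal(size=730)
insitu[::50] = np.nan  # mimic gaps in the ground record

r_l4, r_ol, delta_r = da_efficiency(insitu, l4, ol)
print(f"R_L4={r_l4:.2f}  R_OL={r_ol:.2f}  delta_R={delta_r:.2f}")
```

A positive `delta_r` indicates that assimilation improved the agreement with ground observations at that grid cell.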

Additionally, we apply a machine-learning technique to quantify by how much various control factors drive the spatial variations in the efficiency of the L4 system. In this way, we seek to prioritize future enhancements to the L4 system.

This section briefly describes the SMAP L4 soil moisture product (Section 2.1), the extensive network of in-situ soil moisture observations over mainland China (Section 2.2) and the ancillary data sources and metrics used in the skill assessment (Sections 2.3 and 2.4). Next, we introduce the double instrumental variable (IVd) method employed to determine the errors in control factors that cannot be determined using ground observations (Section 2.5). Finally, we describe the random forest (RF) regression method used to identify the main factor(s) (out of the 8 control factors from both CLSM and RTM aspects) that affect the spatial variations in SMAP L4 DA efficiency and L4 performance (Section 2.6).

(Lucchesi, 2013), with precipitation corrected using the daily, 0.5-degree, gauge-based Climate Prediction Center Unified (CPCU) product (Xie et al., 2007). The L4 product provides global, 9-km, 3-hourly surface (0-5 cm) and root-zone (0-100 cm) soil moisture estimates along with related land surface fields and analysis diagnostics. For the present study, we aggregated all soil moisture estimates to daily-average (00:00 to 23:59 UTC) data. A baseline, model-only, ensemble CLSM simulation without the assimilation of SMAP Tb observations (but using the same perturbations as in the L4 system) is referred to as the "open-loop" (OL) run.
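The aggregation of 3-hourly estimates to daily averages mentioned above can be sketched as follows. This is an illustration only: it assumes the series starts at the first 3-hourly step of a UTC day and holds exactly eight steps per day, and the function name `daily_mean` is hypothetical.

```python
import numpy as np

def daily_mean(three_hourly):
    """Average a 3-hourly series into daily means (8 steps per UTC day).

    Assumes the first element is the first step of a UTC day; a trailing
    incomplete day is dropped, and NaN steps are ignored within each day.
    """
    values = np.asarray(three_hourly, dtype=float)
    n_days = values.size // 8
    return np.nanmean(values[: n_days * 8].reshape(n_days, 8), axis=1)

# Two days of dummy values 0..15: the daily means are 3.5 and 11.5.
series = np.arange(16, dtype=float)
daily = daily_mean(series)
print(daily)
```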

The SMAP L4 assimilation system includes a zero-order "tau-omega" forward RTM (

Ground observations falling within the same 9-km EASE grid were averaged for comparisons against the collocated 9-km L4 and OL soil moisture estimates. A total of 2287 individual 9-km EASE grid cells within mainland China are included in the analysis. Among them, 92.35% of grid cells contain one in-situ site, 7.26% contain two sites, 7 grid cells contain three sites, and the remaining two grid cells contain four and five sites respectively.

2.3 Explanatory data products

As discussed above, our hypothesis is that the efficiency of the SMAP L4 system will be sensitive to the ability of the ensemble-based L4 analysis in filtering errors that exist in the OL (that is, CLSM), in the model forecast Tb (that is, the RTM), and in the SMAP Tb observations. We therefore considered two separate categories of factors that potentially control spatial variations in DA efficiency. The factors are summarized in Table 1.

The control factors take a variety of forms. Some factors are based on estimates of the errors fed into the L4 system (e.g., the error in CLSM rainfall forcing data). Other factors consist of the magnitude of the variable itself (e.g., the vertical variability of clay fraction). Note that LAI is used in both ways: LAI error is used to predict OL performance (because LAI is an important input into CLSM) while mean LAI is used to explain DA performance (because increased LAI is associated with decreased soil moisture information content in microwave observations).

Note that the LAI used in the L4 system is a climatology derived from satellite observations of the Normalized Difference Vegetation Index. Therefore, to indicate the magnitude by which each grid cell's LAI typically deviates from its long-term climatology, we use the temporal standard deviation of anomaly time series of the benchmark LAI

Observed CP (CPobs) was based on comparisons between 0-10 cm "surface" estimates and 0-50 cm "root-zone" in situ observations and used as a benchmark. In contrast, SMAP L4 CP estimates (CPOL) were based on the comparison of 0-5 cm "surface" estimates and 0-100 cm "root-zone" estimates. Therefore, the surface versus root-zone storage contrast in the observation time series is less than that of the L4 estimates. This will likely cause the observed correlation between surface and root-zone time series to be systematically higher than the analogous vertical correlation calculation for L4 estimates. However, this bias is partially corrected for by the second term in Eq. (1), since the observed α ratio will, by the same token, tend to be smaller (i.e., closer to one) than α sampled from the L4 analysis.

Such ability to compensate for vertical depth differences is a key reason we apply CP, rather than simple correlation,

where αx is a scaling factor, Bx is a temporally constant bias, and εx is zero-mean random error.

which are based on the lag-1 (day) time series (at day t) of x and y, respectively. Therefore, assuming that the errors of two independent products are serially white, the covariance between instrumental variables and products can be written

In this way, the error in the L4 LE (measured by IVd-based correlation with truth) can be estimated robustly using the FLUXCOM LE product described in Section 2.3.2.
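The lag-1 instrumental-variable idea described above can be illustrated with a small synthetic experiment. The sketch below is not the paper's estimator, whose equations are not reproduced in this excerpt. It assumes a linear error model (product = scaling × truth + constant bias + zero-mean error) with serially white, mutually independent errors, and the function name `lagged_iv_correlation` and all numerical settings are hypothetical. Under those assumptions, the lagged covariance of a product with itself isolates the squared scaling factor times the lag-1 autocovariance of the truth, the lagged cross-covariance isolates the product of the two scaling factors times the same autocovariance, and their ratio gives the relative scaling needed to convert the ordinary cross-covariance into the signal variance of x.

```python
import numpy as np

def lagged_iv_correlation(x, y):
    """Estimate corr(x, truth) from two noisy products without observing truth.

    Assumes x = a_x*P + B_x + e_x and y = a_y*P + B_y + e_y, with errors that
    are serially white and mutually independent (illustrative assumptions).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    inst_x, inst_y = x[:-1], y[:-1]   # lag-1 instruments
    x_t, y_t = x[1:], y[1:]

    def cov(a, b):
        return np.cov(a, b)[0, 1]

    c_auto = cov(inst_x, x_t)                               # a_x^2 * L
    c_cross = 0.5 * (cov(inst_x, y_t) + cov(inst_y, x_t))   # a_x * a_y * L
    scale_ratio = c_auto / c_cross                          # a_x / a_y
    signal_var = scale_ratio * cov(x_t, y_t)                # a_x^2 * var(P)
    r2 = signal_var / np.var(x_t, ddof=1)
    return float(np.sqrt(np.clip(r2, 0.0, 1.0)))

# Synthetic AR(1) "truth" with two independent noisy products.
rng = np.random.default_rng(42)
n = 50_000
shocks = rng.normal(size=n)
truth = np.zeros(n)
for t in range(1, n):
    truth[t] = 0.8 * truth[t - 1] + shocks[t]
x = 0.8 * truth + 0.1 + rng.normal(scale=1.0, size=n)
y = 1.2 * truth - 0.3 + rng.normal(scale=1.5, size=n)

r_est = lagged_iv_correlation(x, y)
r_true = float(np.corrcoef(x, truth)[0, 1])  # available only in a synthetic test
print(f"estimated R = {r_est:.3f}, reference R = {r_true:.3f}")
```

The approach depends on the truth being autocorrelated at lag 1. If the underlying signal has little day-to-day memory, both lagged covariances approach zero and the ratio becomes unstable.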

A random forest (RF) regression approach was used to rank and quantify the importance of the 8 control factors introduced above (Table 1)

that CPOL should thus be smaller than CPobs. In addition, the vertical variability of the clay fraction seems to show little spatial variation across mainland China (Fig. 2c). With respect to CLSM LAI error, regions in southern China that have generally higher LAI show larger standard deviation in SPOT LAI time series (Fig. 2d and 2h). The IVd-based estimates of SMAP L4 LE error, which represent a potential control factor for water-balance errors in CLSM, generally show a low level of error across mainland China (Fig. 2e).

For O-F Tb residuals describing RTM-related error, a higher standard deviation of O-F Tb residuals is observed in the North China Plain (Fig. 2f), which is very consistent in spatial distribution with areas displaying the highest and most significant SSM prediction improvement (Fig. 1c). This is expected, as mentioned above

Given the sampling errors of ΔR, which is based on a two-year validation period, and the relatively low spatial variability in RZSM skill (Fig. 1f), the performance of RF is acceptable. In addition, ground-measurement upscaling error is likely a significant contributor to unexplainable spatial variability for ΔR in Fig. 1

Based on the RF results, the Tb error is quantified as the most prominent factor in determining DA efficiency (i.e., ΔR = RL4 - ROL), followed by precipitation error and microwave soil roughness (Fig. 3b). The RF-derived ranking of control-factor importance for RZSM is similar to that of SSM in that the same three factors are still the most explanatory. However, in contrast to SSM, the importance of Tb error for RZSM decreased dramatically from >30% to ~15%. Other modeling error sources (e.g., the vertical variability of soil properties) have only very limited impact on SMAP DA improvement.
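A ranking of this kind can be sketched with scikit-learn's `RandomForestRegressor`. The example below is an illustration on synthetic data only: the factor names are placeholders, the response is constructed so that two factors dominate, and the impurity-based `feature_importances_` attribute is an assumed choice of importance measure, since the excerpt does not state which measure or RF settings were used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_cells = 600
factor_names = [f"factor_{i}" for i in range(8)]  # placeholder names

# Synthetic predictors (one row per grid cell) and a synthetic delta_R that
# depends strongly on factor_0, weakly on factor_1, and on nothing else.
X = rng.normal(size=(n_cells, 8))
delta_r = 0.10 * X[:, 0] + 0.04 * X[:, 1] + 0.02 * rng.normal(size=n_cells)

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, delta_r)

# Impurity-based importances sum to 1; express them as percentages.
importance_pct = 100.0 * rf.feature_importances_
order = np.argsort(importance_pct)[::-1]
ranking = [factor_names[i] for i in order]
for i in order:
    print(f"{factor_names[i]}: {importance_pct[i]:.1f}%")
```

Impurity-based importances can be biased when predictors are correlated or differ greatly in cardinality, so rankings from real control factors should be read as qualitative.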

As seen in Fig. 3c

The qualitative rankings provided by the RF analysis in Fig. 3

For precipitation, this decomposition is illustrated in Fig. 5. Note that, as expected, low-quality precipitation tends to degrade the skill (i.e., correlation versus ground observations) of OL SSM and RZSM estimates (see Fig. 5a-b). This degradation provides an enhanced opportunity for SMAP L4 DA to provide added value. As a result, ΔR tends to be inversely related to precipitation skill (i.e., higher precipitation skill leads to lower ΔR, see Fig. 5c

Figure 6 is analogous to Fig. 4

OL does get worse with increasing roughness, there is more room for improvement as the roughness increases, which makes it plausible that ΔR increases with increasing soil roughness (see Fig. 6a-b).

Figure 6: As in Fig. 4 but for ΔR as a function of microwave soil roughness.
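The grouped comparisons discussed above, in which ΔR is summarized for sets of grid cells ordered by a control factor, can be sketched as follows. Quantile-based grouping, the number of groups, and the function name `mean_by_quantile_group` are assumptions made for illustration, and the data are synthetic with ΔR constructed to decline as precipitation skill rises.

```python
import numpy as np

def mean_by_quantile_group(factor, delta_r, n_groups=4):
    """Mean delta_R within equal-population groups ordered by a control factor."""
    factor = np.asarray(factor, dtype=float)
    delta_r = np.asarray(delta_r, dtype=float)
    edges = np.quantile(factor, np.linspace(0.0, 1.0, n_groups + 1))
    # Interior edges map each cell to a group index 0 .. n_groups-1.
    group = np.digitize(factor, edges[1:-1])
    return np.array([delta_r[group == g].mean() for g in range(n_groups)])

# Synthetic grid cells: better precipitation skill leaves less room for
# assimilation to help, so delta_R declines with skill.
rng = np.random.default_rng(1)
precip_skill = rng.uniform(0.2, 0.9, size=2000)
delta_r = 0.3 - 0.3 * precip_skill + 0.05 * rng.normal(size=2000)

group_means = mean_by_quantile_group(precip_skill, delta_r)
print(np.round(group_means, 3))
```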

Besides the above three control factors that dominate the DA efficiency, we also examine the top factor that affects SMAP L4 performance, i.e., vertical-coupling errors (Fig. 7).

This means that some of the opportunity presented by the larger OL RZSM errors is squandered by sub-optimal DA. As a result, the increase in RZSM DA efficiency associated with biased SSM-RZSM coupling (Fig. 7d) is smaller than the analogous increase in SSM DA efficiency (Fig. 7c).

the groups with lowest mean ΔR in Fig. 4a and Fig. 6a, the averages of ΔR from all groups are significantly higher than 0 (p<0.01).

As expected, precipitation error is the dominant factor for explaining the skill of the OL estimates. In contrast, the SSM-RZSM coupling error is the dominant factor for explaining the skill of the L4 results, which shows DA is able to correct for precipitation errors.