SMAP retrieval assimilation improves soil moisture estimation across irrigated areas in South Asia

A soil moisture retrieval assimilation framework is implemented across South Asia to improve regional soil moisture estimation and to provide a consistent regional soil moisture dataset. This study aims to improve the spatiotemporal variability of soil moisture estimates by assimilating Soil Moisture Active Passive (SMAP) near-surface soil moisture retrievals into a land surface model. The Noah-MP (v4.0.1) land surface model is run within the NASA Land Information System software framework to model regional land surface processes. NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA2) and GPM Integrated Multi-satellitE Retrievals (IMERG) provide the meteorological boundary conditions to the land surface model. Assimilation is carried out using both cumulative distribution function (CDF) corrected (DA-CDF) and uncorrected (DA-NoCDF) SMAP retrievals. CDF-matching is implemented to map the statistical moments of the SMAP soil moisture retrievals to the land surface model climatology. Comparison of assimilated and model-only soil moisture estimates with publicly available in situ measurements highlights the relative improvement in soil moisture estimates obtained by assimilating SMAP retrievals. Across the Tibetan Plateau, DA-NoCDF reduced the mean bias and RMSE by 8.4% and 9.4%, respectively, even though assimilation occurred during less than 10% of the study period due to frozen soil conditions. The best goodness-of-fit statistics were achieved for the IMERG DA-NoCDF soil moisture experiment. SMAP retrieval assimilation corrected biases associated with unmodeled hydrologic phenomena (e.g., anthropogenic influences due to irrigation). The highest influence of assimilation was observed across croplands. Improvements in soil moisture translated into improved spatiotemporal patterns of modeled evapotranspiration, yet assimilation had limited influence on carbon cycle states such as gross primary production. Improvement in fine-scale modeled estimates by assimilating coarse-scale retrievals highlights the potential of this approach for soil moisture estimation over data-scarce regions.

to irrigation-induced groundwater pumping (Rodell et al., 2009; Asoka et al., 2017). Global land surface models (LSMs), in general, do not include groundwater pumping modules. An inverse technique for estimating the amount of groundwater pumped could potentially be developed if accurate soil moisture estimates are available (along with the other variables contributing to the water budget). Soil moisture records may be able to provide the much-needed information about the extent and amount of groundwater pumping across South Asia. Accurate soil moisture estimation across South Asia is, therefore, an important need.
In situ soil moisture measurements across South Asia are scarce and have limited accessibility. To fill this knowledge gap and to evaluate data assimilation as a feasible option in this region, we demonstrate the use of Soil Moisture Active Passive (SMAP; Entekhabi et al., 2010) retrieval assimilation to improve soil moisture estimation across South Asia. Section 2 describes the prominent features of the study domain; Sect. 3 provides details regarding the various datasets and the data assimilation framework utilized; Sect. 4 highlights the important results of the DA experiments; and Sect. 5 summarizes the main conclusions of this study.

Study domain
The study domain discussed in this paper encompasses the mountainous region in South Asia and the adjoining areas (Fig. 1).
The Hindu Kush-Himalayan mountain range and the Tibetan Plateau, represented by grid cells with elevation > 2000 m in Fig. 1(a), constitute high mountain Asia. Ten major rivers (Indus, Ganges, Brahmaputra, Irrawaddy, Salween, Mekong, Yangtze, Yellow, Tarim, Amu, and Syr) originate in this region and flow towards the low elevation areas where they serve as sources of freshwater for the residing populace. Agriculture-based irrigation is a primary consumer of the freshwater transported downstream by the rivers (Wester et al., 2018).
Table A1 presents the soil texture conditions within the domain. The NCEP/STATSGO+FAO (Natural Resources Conservation Service) soil texture classification is used to categorize the grid cells into 16 individual classes (Note: soil classes that did not have any grid cells in the study domain are excluded from the figure legend). The predominant soil texture type found within the domain is loam, followed by clay loam.
Landcover categorization (see Fig. 1(d) and Table A1, columns 4 to 6) is based on the NCEP/MODIS-based International Geosphere-Biosphere Programme (IGBP) (Friedl et al., 2002) classification (Note: similar classes are lumped together; for example, different forest types are grouped into a single forest class). The predominant landcover types present within the study domain are barren, croplands, and shrublands.
The Food and Agriculture Organization (FAO) of the United Nations provides a global map of the fraction of area equipped for irrigation as part of the Global Map of Irrigation Areas (GMIA) product, which is provided at a 5-arc minute (0.0833°) resolution (Siebert et al., 2005). The GMIA product was used in this study to represent the total irrigation-equipped area within each grid cell (see Fig. 7(c)). The grid cells with high irrigation percentages correspond well (spatially) with grid cells belonging to the cropland landcover type in Fig. 1(d).

Methodology and datasets
This section describes the methodology developed to implement the assimilation of SMAP soil moisture retrievals into the land surface model as well as the various datasets utilized in the analysis results detailed in Sect. 4.

NASA Land Information System
The NASA Land Information System (LIS) is a software framework that facilitates high performance computing for land surface modeling and data assimilation purposes (Kumar et al., 2006; Peters-Lidard et al., 2007). The NASA LIS framework was used to run the Noah-MP land surface model (LSM) and to assimilate SMAP soil moisture retrievals (Fig. 2).

Table 1. Noah-MP physics options used in this study.
Physics option | Scheme
Dynamic vegetation option |
Canopy stomatal resistance | Ball-Berry method (Ball et al., 1987)
Runoff and groundwater | Simple groundwater model, SIMGM (Niu et al., 2007)
Supercooled liquid water and frozen soil permeability | NY06 (Niu and Yang, 2006)
Surface-layer drag coefficient | General Monin-Obukhov similarity theory (Brutsaert, 2013)
Snow surface albedo | Biosphere-Atmosphere Transfer Scheme (Yang and Dickinson, 1996)
Partitioning of rain and snowfall | Jordan91 (Jordan, 1991)
Snow and soil temperature | Semi-implicit option
Lower boundary of soil temperature | Noah native option
Meteorological boundary conditions | MERRA-2 (Gelaro et al., 2017), IMERG (Huffman et al., 2015)

Noah-MP land surface model
The Noah-MP (version 4.0.1) LSM (Ek et al., 2003; Niu et al., 2011; Yang et al., 2011) was run within LIS to simulate the relevant land surface processes across the study domain. Noah-MP was run on an equidistant cylindrical grid with a spatial resolution of 0.05° × 0.05° at a 15-minute timestep. Table 1 outlines the Noah-MP physics used in this study.
Noah-MP was selected for this study due to its multilayer representation of soil, explicit modeling of frozen soil permeability (Niu and Yang, 2006), and representation of snowpack and soil interface processes. Noah-MP (version 4.0.1) includes coupled energy, water, and carbon cycle simulation routines. The soil profile is divided into four layers with thicknesses of 5, 35, 60, and 100 cm, respectively. A three-layer (maximum) snow structure is implemented above the surface soil layer to capture snowpack dynamics and the snowpack-soil interface fluxes for areas that experience snowfall. Noah-MP was (separately) forced with meteorological fields from Modern-Era Retrospective analysis for Research and Applications (MERRA2; Gelaro et al., 2017) and Integrated Multi-satellite Retrievals for Global Precipitation Measurement (IMERG; Huffman et al., 2015). The IMERG Final run was used. External irrigation and groundwater pumping were not explicitly modeled. Thus, there was an information gap regarding these two water sources in the modeled water cycle.
L-band radiometry offers all-weather, diurnal sensing of the surface dielectric properties. The surface dielectric properties are a function of the near-surface soil moisture. Several mitigation features directed at preventing signal contamination due to radio frequency interference (RFI) are built into the radiometer electronics and algorithms. Quality flags are included in the metadata to provide location-specific details such as retrieval error, retrieval uncertainty, frozen ground conditions, presence of steep topography, and vegetation information (O'Neill et al., 2019).

In situ soil moisture measurements for model evaluation
Ground-based soil moisture measurements were obtained from the International Soil Moisture Network, an international, multi-agency cooperation that provides global, in situ soil moisture measurements for the validation of model and remote sensing-based products (https://ismn.earth/en/). Station measurements from four separate networks, 1) Ngari, 2) Naqu, 3) Maqu (Su et al., 2011; Zeng et al., 2016), and 4) CTP-SMTMN (Yang et al., 2013), were colocated with the land surface model grid for evaluation of the modeled estimates. The colocation was based on a simple arithmetic averaging of stations located within each grid cell.
The different networks represent varying local climates, although all networks are located at high elevations and have relatively cold climates. The Ngari network is located in an arid region, the Naqu and CTP-SMTMN networks are situated in a semiarid region, and Maqu experiences a relatively humid climate (Fig. 1(a)). The total number of stations available for evaluation is 101.
Soil moisture measured at a depth of 5 cm below the surface was compared with model estimated surface soil moisture (soil layer depth = 0 to 5 cm). Measurements across the Tibetan Plateau are the only publicly available soil moisture measurements within the study domain between the years 2015 to 2020.
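The station-to-grid colocation described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function name `colocate_stations`, the grid origin (`lat0`, `lon0`), and the station coordinates and values are all hypothetical, and only the 0.05° cell size and the arithmetic averaging are taken from the text.

```python
import math
from collections import defaultdict

def colocate_stations(stations, lat0, lon0, res=0.05):
    """Average station soil moisture within each model grid cell.

    stations: iterable of (lat, lon, soil_moisture) tuples.
    lat0, lon0: south-west corner of the grid (hypothetical origin).
    res: grid cell size in degrees (0.05 in this study).
    Returns {(row, col): mean soil moisture of the stations in that cell}.
    """
    cells = defaultdict(list)
    for lat, lon, sm in stations:
        row = math.floor((lat - lat0) / res)
        col = math.floor((lon - lon0) / res)
        cells[(row, col)].append(sm)
    return {cell: sum(vals) / len(vals) for cell, vals in cells.items()}

# Hypothetical stations: the first two share a grid cell, the third does not.
stations = [(31.01, 91.01, 0.20), (31.03, 91.04, 0.30), (31.12, 91.01, 0.10)]
cell_means = colocate_stations(stations, lat0=31.0, lon0=91.0)
```

Cells that contain a single station simply keep that station's value, so the number of evaluation cells is never larger than the number of stations.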

ALEXI evapotranspiration for model evaluation
To study the influence of soil moisture assimilation on related geophysical fluxes, the Atmosphere-Land Exchange Inverse (ALEXI) evapotranspiration product was used. ALEXI estimates evapotranspiration (ET) using multi-sensor thermal infrared observations (Anderson et al., 2007, 2011). A two-source (soil and canopy) land surface model is coupled to an atmospheric boundary layer model in order to derive energy fluxes based on thermal imagery and insolation estimates derived from geo-
FluxSat is a satellite-based product that employs machine learning, reflectance data from the Moderate-resolution Imaging Spectroradiometer (MODIS), and eddy covariance measurements to estimate global gross primary production (Joiner and Yoshida, 2020). Gross primary production (GPP) is an important variable within the carbon cycle. It represents the rate at which carbon is assimilated into the plant biomass per unit area per time during photosynthesis (Gough, 2011). GPP impacts the water cycle as plants transpire water during photosynthesis, thereby acting as moisture sources for the atmosphere and moisture sinks within the soil (Philander, 2008). FluxSat is developed by training neural networks using MODIS reflectance data to upscale GPP obtained from eddy covariance flux tower measurements (Joiner and Yoshida, 2020). FluxSat GPP was used here to study the influence of soil moisture assimilation on the carbon cycle.

GOME-2 fluorescence for model evaluation
In addition to GPP from FluxSat, solar-induced fluorescence (SIF) retrievals were also utilized to investigate the influence of
also utilized to develop the model cumulative distribution functions (CDFs) that were later used for CDF-matching during the assimilation run discussed in Sect. 3.3.3.

Open loop (OL)
The OL run represents a model-only run, i.e., the Noah-MP model was run in an ensemble configuration without any external observations assimilated. The OL run serves as a baseline for Noah-MP's land surface modeling capability across South Asia
an asymptotic value. Therefore, a 20-replicate ensemble was selected as an approximation of the probability distribution that reasonably represents the uncertainty in the model estimates.
Boundary conditions such as air temperature and radiative fluxes (i.e., incident shortwave and longwave radiation) were provided by the MERRA2 dataset. Boundary condition (forcing) perturbations used by Kwon et al. (2019) (Table 2)
model, $\alpha$ is a vector of model parameters, $t$ is time, and $x \in X$ defines the spatial domain. Equation (1) defines the formulation of the update step applied to the a priori state estimate (for each replicate) based on the difference between the model estimate and the observed value, such that $y_t^{+}(x)$ = a posteriori soil moisture value at time $t$, $y_t^{-}(x)$ = a priori soil moisture estimate at time $t$, $K_t(x)$ = Kalman gain at time $t$, $z_t(x)$ = SMAP soil moisture retrieval at time $t$, and $v_t$ = SMAP soil moisture retrieval error at time $t$.
The difference between the observation (plus observation error) and the mapped a priori model state estimate is known as the innovation, $In_t$. The normalized innovation ($NI_t$) is an effective diagnostic tool that aids in the diagnosis of the assimilation framework and the origin of biases (Buehner, 2010). Equation (3) provides the normalized innovation formula for each replicate. The numerator in Eq. (3) equals $In_t$, which is then normalized by the square root of the sum of $C_{z_t z_t}$ and $C_{vv}$. In an optimal DA system, the normalized innovations should exhibit a standard normal distribution ($NI_t \sim N(0,1)$). To compute $C_{z_t z_t}$ and $C_{vv}$, the prognostic state and observation error standard deviation was taken equal to 0.04 m³ m⁻³ (O'Neill et al., 2014).
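The update step and the normalized innovation described above can be sketched for a single grid cell using a textbook scalar EnKF with perturbed observations. This is an illustrative sketch under stated assumptions, not the LIS implementation: the function name `enkf_update` is hypothetical, the retrieval is assumed to map directly onto the surface soil moisture state (so that the cross-covariance C_yz equals C_zz), and the retrieval value of 0.30 m³ m⁻³ is made up. The 20-replicate ensemble and the 0.04 m³ m⁻³ error standard deviation follow the text.

```python
import math
import random
import statistics

def enkf_update(y_prior, z, obs_err_std, rng):
    """Scalar EnKF update for one grid cell and one time step.

    y_prior: a priori soil moisture, one value per replicate.
    z: SMAP soil moisture retrieval (same units as the state).
    obs_err_std: retrieval error standard deviation (sqrt of C_vv).
    Assumes the observation maps directly onto the state (identity mapping).
    Returns (a posteriori ensemble, Kalman gain, normalized innovations).
    """
    c_vv = obs_err_std ** 2
    c_zz = statistics.variance(y_prior)  # predicted-observation error variance
    c_yz = c_zz                          # identity mapping, so C_yz = C_zz
    gain = c_yz / (c_zz + c_vv)
    y_post, ni = [], []
    for y in y_prior:
        v = rng.gauss(0.0, obs_err_std)  # perturbation of the retrieval
        innovation = z + v - y
        y_post.append(y + gain * innovation)
        ni.append(innovation / math.sqrt(c_zz + c_vv))
    return y_post, gain, ni

rng = random.Random(0)
y_prior = [rng.gauss(0.20, 0.04) for _ in range(20)]  # 20 replicates
y_post, gain, ni = enkf_update(y_prior, z=0.30, obs_err_std=0.04, rng=rng)
```

Because the retrieval is wetter than the a priori ensemble in this example, the innovations are positive on average and the update moves the ensemble mean towards the retrieval by a fraction set by the gain.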

It is worth noting here that the EnKF is expected to behave suboptimally given the nonlinearity of the Noah-MP model in conjunction with the non-Gaussianity of the SMAP retrieval errors. However, the exploration of the $NI_t$ sequence is a worthwhile exercise in an effort to better diagnose the behavior of the assimilation framework used in this study. DA runs were then compared against the evaluation datasets to analyze the influence of SM assimilation on the modeled states in Sect. 4.

Experimental results
Model estimates for water years (October to September) 2016 to 2020 are used to compute the results presented in this section. Water years were used rather than Julian years due to the former's hydrologic suitability for the state variable under consideration, i.e., soil moisture (SM).
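For illustration, the water-year grouping can be expressed as a small helper. The function name `water_year` is hypothetical; the convention of labelling a water year by the calendar year in which it ends is inferred from the text, where water years 2016 to 2020 span October 2015 to September 2020.

```python
from datetime import date

def water_year(d):
    """Return the water year (October to September) containing date d.

    The water year is labelled by the calendar year in which it ends,
    so October 2015 belongs to water year 2016.
    """
    return d.year + 1 if d.month >= 10 else d.year

first_day = water_year(date(2015, 10, 1))  # start of the analysis period
last_day = water_year(date(2020, 9, 30))   # end of the analysis period
```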

Evaluation using in situ measurements
In situ SM measurements available across the Tibetan Plateau were used to evaluate the modeled SM estimates. In situ measurements were collected at the point scale, whereas the Noah-MP grid size equaled 0.05° × 0.05° (∼5.5 km × 5 km at midlatitudes). Some grid cells contain multiple stations located within the 0.05° × 0.05° area. If more than one station was located within a single grid cell, an average of the station measurements was used for comparison against the modeled SM estimates.
Therefore, the total number of grid cells suitable for evaluation equaled 78 based on a total of 101 stations. respectively. The MERRA2 runs display better temporal consistency with the measurements as compared to the IMERG runs.
In Fig. 3(c), the DA-NoCDF run exhibits the lowest RMSE while the OL run has the highest RMSE magnitude. However, the differences between the RMSE magnitudes for the different MERRA2 runs are minimal (i.e., less than 0.002 m³ m⁻³). In

Statistical analysis
Relevant statistics were computed using all the measurements (from all the networks) available from October 2015 to September 2020 in conjunction with the corresponding Noah-MP modeled estimates. Table 2 presents

The DA-NoCDF simulation exhibits larger differences from the OL than the DA-CDF run does. Therefore, to further dissect its spatial patterns with respect to landcover and soil texture, Figs. 5 and 6 were created. Fig. 5 presents the OL and MERRA2-forced DA-NoCDF joint PDFs (shown here as fractions of total landcover type grid cells) for the winter months of the 2016 water year. The bar graph in subplot 5(h) provides the percentage of grid cells for each landcover type that have at least one instance of SMAP retrieval assimilation. The highest percentage is observed for grid cells belonging to the cropland type.

Linear regression coefficients included in all the subplots of Fig. 5 represent the slope between the two axes. If the slope is >1 then, in general, the variable on the y-axis (here DA-NoCDF) has greater soil moisture magnitudes than the variable on the x-axis (here OL). Forest (subplot 5(a)), savannas (subplot 5(c)), and cropland (subplot 5(e)) landcover types show linear regression coefficients >1, indicating that, in general, the SMAP assimilation increases the soil moisture magnitude across grid cells belonging to these landcover types. Notably, the percentage of grid cells with assimilation is quite different for these three landcover types (forest = 10%, savannas = 40%, and cropland = 80%). For shrublands (subplot 5(b)), grasslands (subplot 5(d)), urban/built-up (subplot 5(f)), and barren (subplot 5(g)) landcover types, the linear regression coefficients are <1, indicating that, in general, the SMAP assimilation decreases the soil moisture magnitude across grid cells belonging to these landcover types. The lowest regression coefficient is computed for the urban/built-up landcover type. The correlation coefficients for savannas, croplands, and urban/built-up are ≤0.75 and are lower than those of the other landcover types, which suggests that SMAP SM assimilation alters the SM estimates across grid cells belonging to these three landcover types the most (Note: if the SM assimilation caused no change, the OL and DA SM estimates would be nearly identical, and hence the correlation coefficient between the two would equal 1). The lowest correlation is computed for the urban/built-up landcover type, of which 70% of the grid cells underwent assimilation; however, this landcover type only represents 0.4% of the total domain grid cells (Table A1). Similar results were observed for the IMERG-forced simulation as well (results not shown).
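The two diagnostics used in this comparison, the regression slope and the correlation coefficient between OL and DA soil moisture, can be sketched as follows. The function name `slope_and_correlation` and the sample values are hypothetical. A zero-intercept least-squares fit is assumed here so that a slope >1 corresponds directly to larger DA values; the fit used to produce Fig. 5 may differ.

```python
import math

def slope_and_correlation(ol, da):
    """Zero-intercept regression slope of DA on OL, and Pearson correlation.

    ol, da: paired soil moisture values (x-axis = OL, y-axis = DA).
    """
    slope = sum(x * y for x, y in zip(ol, da)) / sum(x * x for x in ol)
    mean_ol = sum(ol) / len(ol)
    mean_da = sum(da) / len(da)
    sxy = sum((x - mean_ol) * (y - mean_da) for x, y in zip(ol, da))
    sxx = sum((x - mean_ol) ** 2 for x in ol)
    syy = sum((y - mean_da) ** 2 for y in da)
    return slope, sxy / math.sqrt(sxx * syy)

# Hypothetical case: assimilation wets every grid cell by 20%.
ol = [0.10, 0.20, 0.30]
da = [0.12, 0.24, 0.36]
slope, corr = slope_and_correlation(ol, da)
```

In this made-up case the slope is 1.2 while the correlation stays at 1, which shows that the two statistics carry different information: the slope reflects a systematic shift in magnitude, whereas a reduced correlation reflects changes in the pattern across grid cells.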
Figure 6 displays the OL and MERRA2-forced DA-NoCDF joint PDFs (shown here as fractions of total grid cells) categorized with respect to the soil texture types for the winter months of the 2016 water year. The bar graph in subplot 6(h) provides the percentage of grid cells belonging to each soil texture type that have at least one instance of SMAP retrieval assimilation.
The soil types that included sand or loam exhibited regression coefficients >1 (except for loamy sand). Grid cells belonging to loamy sand (subplot 6(b)), silty clay (subplot 6(h)), and clay (subplot 6(i)) soil types exhibited regression coefficients <1, indicating a general decrease in SM magnitude after SMAP assimilation. However, the regression coefficients of all three of these soil texture types are close to one and, therefore, do not indicate any significant influence of SMAP assimilation on grid cells belonging to these particular soil texture types.

Irrigation impact
In South Asia, irrigation is implemented through routing of i) river runoff (contributed by snowmelt and precipitation), ii) discharge from storage reservoirs such as dams, and iii) water pumped from subsurface aquifers, using a network of canals and tube wells (Chambers, 1988). The GMIA total irrigation-equipped area map in Fig. 7(c) visualizes this practice, as high magnitudes are observed in the areas surrounding the major rivers in Pakistan, India, and Bangladesh.
Irrigation is not explicitly modeled in the Noah-MP land surface modeling environment. Therefore, to investigate the effect of SM assimilation on irrigated areas in further detail, the maps of temporal mean normalized innovation (NI) were compared against the GMIA total irrigation-equipped area map. NI (detailed in Sect. 3.3.3) represents the difference between the observations (i.e., SMAP SM retrievals) and the modeled a priori estimates. A positive NI value indicates that the a priori state estimate is less than the observed value, while a negative NI value indicates that the a priori state estimate is greater than the observed value. For an unbiased, optimal assimilation framework, the NI sequence exhibits a mean of 0 and a standard deviation equal to 1 over time. Therefore, high positive or negative NI values reveal the presence of bias either in the model estimates or the assimilated retrievals.
Comparing the DA-NoCDF NI maps with the GMIA map (Fig. 7(c)), it is apparent that the SMAP retrievals have higher SM magnitudes across irrigated areas. SMAP retrievals implicitly contain the effects of irrigation and subsequently transfer that information to the modeled estimates via assimilation.
Hence, the water budget across these locations was corrected as information related to an unmodeled soil moisture source was effectively incorporated into the land surface model. Figures 7(g) and 7(h) show the general increase in mean NI magnitudes during the winter and summer months, respectively, as the percentage of irrigation-equipped area increases. NIs computed from the MERRA2 and IMERG DA-CDF runs, however, do not display this pattern.
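The relationship between mean NI and irrigation-equipped area can be summarized by binning grid cells by their GMIA percentage and averaging within each bin. The sketch below is illustrative only: the function name `mean_by_irrigation_bin`, the 20% bin width, and the sample values are assumptions and are not taken from Fig. 7.

```python
def mean_by_irrigation_bin(irr_pct, values, edges=(0, 20, 40, 60, 80, 100)):
    """Average a per-grid-cell quantity within irrigation-percentage bins.

    irr_pct: irrigation-equipped area of each grid cell (0 to 100).
    values: quantity to average (e.g., temporal mean NI of each grid cell).
    edges: bin edges (hypothetical 20% bins); the last bin includes 100.
    Returns one mean per bin, or None for a bin with no grid cells.
    """
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pct, val in zip(irr_pct, values):
        # Count how many interior/upper edges the value has reached.
        k = min(sum(pct >= e for e in edges[1:]), n_bins - 1)
        sums[k] += val
        counts[k] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Hypothetical grid cells: NI grows with the irrigation-equipped area.
irr_pct = [5, 10, 50, 90, 100]
mean_ni = [0.0, 0.2, 0.5, 1.0, 1.2]
binned = mean_by_irrigation_bin(irr_pct, mean_ni)
```

The same helper applies to any per-cell quantity, such as mean annual ET, when it is compared against the irrigation-equipped area.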

Further comparing the MERRA2 and IMERG DA-NoCDF NI maps with the water storage trends identified by Fig. 1 in Girotto et al. (2017) and Fig. 2 in Loomis et al. (2019), the locations in the northwestern part of India that show negative water storage trends (resulting from groundwater pumping for purposes of irrigation) are spatially consistent with high positive NI values. The additional water introduced into the hydrologic cycle via pumping from subsurface aquifers is captured by the SMAP SM retrievals and is then used to condition the modeled estimates via assimilation.

The spatial patterns in NI show different magnitudes (and even different signs) at some locations for DA-CDF versus DA-NoCDF. The visible difference in NI signs is due to the implementation of CDF matching of the assimilated retrievals during the DA-CDF simulation. If the model estimates are biased, traditional data assimilation generally does not result in optimal estimates (Zhang and Moore, 2015). Mapping the observation CDF to a biased model CDF would ultimately transfer the model bias into the CDF-matched observations. Therefore, in cases where the model estimates are inherently biased, assimilation of CDF-matched retrievals could update the a priori state estimates in the wrong direction. This phenomenon is apparent in IMERG DA-CDF versus IMERG DA-NoCDF NI maps across the irrigated areas and the Tibetan Plateau.
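The transfer of model bias into CDF-matched observations can be demonstrated with a generic empirical quantile mapping. This is a sketch under stated assumptions rather than the CDF-matching routine used in the experiments: the function name `cdf_match` is hypothetical, the nearest-rank mapping is one of several possible variants, and both climatologies are synthetic (a dry-biased model and wetter retrievals, as might occur over an irrigated grid cell).

```python
import bisect
import random
import statistics

def cdf_match(value, obs_clim_sorted, model_clim_sorted):
    """Map a retrieval onto the model climatology by matching percentiles.

    The retrieval's empirical non-exceedance probability within the sorted
    observation climatology is computed, and the model value at the same
    probability is returned (nearest-rank variant, no interpolation).
    """
    rank = bisect.bisect_right(obs_clim_sorted, value)
    p = max(rank - 0.5, 0.0) / len(obs_clim_sorted)
    k = min(int(p * len(model_clim_sorted)), len(model_clim_sorted) - 1)
    return model_clim_sorted[k]

rng = random.Random(1)
# Synthetic climatologies: model is dry (no irrigation), retrievals are wet.
model_clim = sorted(rng.gauss(0.15, 0.03) for _ in range(1000))
obs_clim = sorted(rng.gauss(0.30, 0.05) for _ in range(1000))

matched = [cdf_match(v, obs_clim, model_clim) for v in obs_clim]
raw_offset = statistics.mean(obs_clim) - statistics.mean(model_clim)
matched_offset = statistics.mean(matched) - statistics.mean(model_clim)
```

Before matching, the retrievals are wetter than the model by roughly the irrigation signal (`raw_offset`); after matching, that offset is essentially removed (`matched_offset`), so the wet signal no longer reaches the model through the innovations.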
One interesting pattern to note is the presence of highly negative NI values across the high elevation areas (Hindu Kush mountains) in the western part of the domain in the DA-NoCDF maps (subplots 7(b) and 7(e)). Comparing the DA-NoCDF NI maps with the DA minus OL map in Fig. 4, it is apparent that the high NI magnitudes did not translate into high DA minus OL values. A high NI magnitude does not necessarily lead to a correspondingly large update. If the model state error variance is quite low, the denominator in Eq. (3) will be a small value that can then result in a large NI if the numerator (innovation) is relatively large. However, a low model state error variance results in a reduced Kalman gain (due to $C_{y_t z_t}$), and hence, the computed update will be relatively small.
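A short numerical illustration of this point is given below. It assumes a scalar state that is observed directly (so the gain reduces to the state error variance divided by the sum of the state and observation error variances); the function name `gain_ni_update` and the innovation of -0.20 m³ m⁻³ are hypothetical, while the 0.04 m³ m⁻³ observation error standard deviation follows Sect. 3.

```python
import math

def gain_ni_update(state_std, obs_std, innovation):
    """Return (Kalman gain, normalized innovation, state update) for a
    scalar state observed directly (identity mapping assumed)."""
    state_var = state_std ** 2
    obs_var = obs_std ** 2
    gain = state_var / (state_var + obs_var)
    ni = innovation / math.sqrt(state_var + obs_var)
    return gain, ni, gain * innovation

# Same innovation, two different ensemble spreads (hypothetical values).
gain_low, ni_low, upd_low = gain_ni_update(0.005, 0.04, -0.20)
gain_ref, ni_ref, upd_ref = gain_ni_update(0.04, 0.04, -0.20)
```

With the narrow ensemble (standard deviation of 0.005 m³ m⁻³), the NI magnitude is larger than in the reference case, yet the gain is below 0.02 and the resulting update is only a few thousandths of a m³ m⁻³, compared with an update of about 0.1 m³ m⁻³ when the ensemble spread equals the observation error.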
High NI magnitudes are observed in the Indus Basin even though assimilation occurred during <20% of the total days (in the study period) at these locations. This suggests that the quantitative effect of SMAP SM retrieval assimilation is not primarily determined by the assimilation frequency, but rather by the large differences between the SMAP retrievals and the a priori estimates. The DA-CDF versus DA-NoCDF results seen here are similar to those of the experiments conducted by Kumar et al. (2015) to evaluate SM retrievals across irrigated areas. Their study showed that bias correction of observations via CDF matching can remove the information pertaining to unmodeled processes from the observations when the estimation bias stems from the absence of such processes in the model.

Influence on water and carbon cycle
SM is an important component of the water cycle. It is, therefore, expected that changes in the SM estimates would translate into changes in hydrologic variables that are dependent on SM such as evapotranspiration (ET). ET is composed of evaporation from the soil and vegetation as well as transpiration from the vegetation. While ET is used to represent the water cycle in this section, gross primary production (GPP) and solar-induced chlorophyll fluorescence (SIF) are utilized as vegetation proxies that represent the carbon cycle.
In order to diagnose the influence of SMAP SM assimilation on ET, the mean annual ET from the MERRA2- and IMERG-forced OL, DA-CDF, and DA-NoCDF simulations is analyzed. Figure 8 highlights the improved spatial consistency (relative to the ALEXI ET) of the DA-NoCDF estimates (subplots 8(d) and 8(g)) compared to the OL (subplots 8(b) and 8(e)) and DA-CDF (subplots 8(c) and 8(f)) estimates. The MERRA2-forced ET estimates show higher ET magnitudes across the Tibetan Plateau as compared to the IMERG runs, which corresponds well with the higher positive bias computed in MERRA2-forced SM estimates (see Table 2). All of the IMERG simulations exhibit better overall spatial correlation with ALEXI ET relative to the MERRA2 runs. Comparing the spatial patterns in ET magnitudes with the GMIA irrigation-equipped area map (Fig. 7(c)), it can be seen that the mean ET magnitudes across irrigated areas, particularly across the Indus basin, increased for DA-NoCDF simulations (Figs. 8(d) and 8(g)) relative to the OL. However, this feature is absent in the DA-CDF simulations (Figs. 8(c) and 8(f)). The spatial patterns observed in the DA minus OL SM (see Figs. 4(f) and 4(h)) are similarly shown in the ET maps (Figs. 8(d) and 8(g)) in terms of higher ET magnitudes observed for grid cells belonging to the cropland landcover type.
Further investigation of this feature highlighted the correction of SM and ET in irrigated areas via SMAP assimilation. It is expected that as the irrigation percentage increases, the surface SM would also increase. The increase in SM, in general, translates into an increase in ET. Figure 9 shows the increase in ALEXI ET as the percentage of irrigated area (Fig. 7(c)) in each grid cell increases. In contrast, the OL and DA-CDF estimates do not capture this behavior and instead show declining ET values for regions with 40% or more total irrigation-equipped area when using the MERRA2 boundary conditions. The IMERG OL and DA-CDF estimates show approximately the same decreasing trend. However, the DA-NoCDF estimates corrected the decreasing magnitudes for grid cells with >40% total irrigation-equipped area for both sets of precipitation boundary conditions.
The ALEXI ET dataset serves as an independent evaluation source for OL, DA-CDF, and DA-NoCDF ET estimates. The ET magnitudes for all the modeled runs are lower than the ALEXI ET, which could be attributed to the absence of relevant processes (e.g., surface irrigation) in Noah-MP, whereas the ALEXI product implicitly includes this information. Although ALEXI is a modeled dataset, it is based on remote sensing data and has been shown to detect irrigation (Knipper et al., 2019).
is that vegetation transpiration is more dependent on root-zone SM than surface SM. In Fig. 10(b), it is seen that the change in near-surface (L1) SM is largely modulated in terms of root-zone (L2) SM. In general, root-zone SM tends to maintain low variation throughout the year. Thus, it is expected that assimilation of surface SM retrievals may not significantly impact the dynamic vegetation.
FluxSat GPP and Noah-MP GPP were compared with respect to dominant landcover types, and it was observed that the SMAP assimilation did not strongly influence the vegetation within any of the landcover type grid cells (Fig. 11). Even the highest percent improvement in the RMSE, computed for savannas (normalized information content (NIC) = 4.5%, see Appendix B for formula) during the summer months, was <5%. The correlations between GOME-SIF and the different Noah-MP modeled estimates are similar in magnitude and do not highlight any significant influence of SMAP assimilation (OL versus DA-NoCDF) with respect to individual landcover types. Comparing these results to the vegetation optical depth (VOD) assimilation implemented by Kumar et al. (2020), it seems that the modeled GPP estimates are improved more by assimilating VOD than by assimilating surface SM. In the context of land surface modeling with Noah-MP, surface SM exhibits a weaker influence on GPP as compared to VOD. This is because SM has an indirect effect on GPP, whereas assimilation of VOD has a direct impact on plant biomass, and hence, on GPP. Kumar et al. (2020) found that SM had a higher control over ET and GPP during moisture-limited conditions.

Conclusions
Soil moisture estimation across South Asia was implemented in this study by assimilating SMAP soil moisture retrievals into a land surface model. The Noah-MP land surface model was run within the NASA Land Information System software framework to simulate the regional land surface processes. Precipitation boundary conditions (in different experiments) were provided by the NASA Modern-Era Retrospective Analysis for Research and Applications (MERRA2) and GPM Integrated Multi-satellite Retrievals (IMERG) products. SMAP retrieval assimilation was implemented using two approaches: i) DA-CDF = mapping of the SMAP retrieval CDF to the land surface model climatology prior to assimilation, and ii) DA-NoCDF = SMAP retrieval assimilation without CDF-matching. CDF-matching of the observations to the modeled estimates was applied in an effort to correct the distribution moments of the SMAP soil moisture retrievals.
Comparison of assimilated and model-only soil moisture estimates against in situ measurements showed the relative improvement in soil moisture obtained by assimilating SMAP retrievals. The IMERG DA-NoCDF simulation exhibited the best goodness-of-fit and reduced the mean bias and RMSE by 8.4% and 9.4%, respectively, across the Tibetan Plateau. The results presented in Sect. 4 highlight that SMAP assimilation decreased the magnitude of error (Table 2) and improved the spatiotemporal soil moisture patterns (Figs. 3 and 7) and associated evapotranspiration (Fig. 8), particularly over irrigated areas. However, the influence on evapotranspiration did not proportionally translate into changes in the carbon flux.

The most important feature of SMAP retrieval assimilation observed in this study is the correction of state estimation biases arising from missing physics in the land surface model, i.e., an unmodeled hydrologic process such as irrigation. Information about the exact quantity and timing of irrigation practices is generally not publicly available except for a few parts of the globe.
The framework described in this paper could possibly be used to infer information regarding irrigation patterns and practices using an inverse method.

The utility of L-band radiometry for soil moisture estimation is limited by the soil penetration depth associated with passive microwave (PMW) sensing (∼5 cm) and the data gaps in the soil moisture retrievals. These data gaps are due to the presence of snow, ice, frozen soil, dense vegetation, and RFI instances. Therefore, the influence of SMAP soil moisture retrieval assimilation was primarily limited to surface soil moisture, compared to root-zone soil moisture, across locations where SMAP soil moisture retrievals were available for assimilation. However, improvements in the fine-scale spatial and temporal patterns in soil moisture were observed even though the retrievals being assimilated were at a much coarser scale than the model grid (36 km versus 0.05°).
These results highlight the potential applicability of the described framework for regions where measured data are scarce as well as where accurate and consistent soil moisture estimates do not currently exist. A follow-on study to be explored based on the results of the described experiments is the routing of streamflow using modeled runoff to analyze the effect of soil moisture assimilation on runoff and river discharge. Antecedent soil moisture conditions affect the soil permeability and infiltration capacity. Therefore, it is expected that improvements in soil moisture estimation could translate into improved streamflow estimates.
Appendix A: Soil texture and landcover across study domain
Table A1 presents the predominant soil texture and landcover classes and their respective percentages across the study domain shown in Fig. 1.
Table A1. List of soil texture and landcover classes (and their respective percentages) found within the study domain presented in Fig. 1.

Soil texture Landcover
Class no. of grid cells % of total grid cells Class no. of grid cells % of total grid cells