Today, the most popular approaches in agricultural forecasting leverage process-based crop models, crop monitoring data, and/or remote sensing imagery. Individually, each of these tools has its own unique advantages but is, nonetheless, limited in prediction accuracy, precision, or both. In this study we integrate in situ and remote sensing (RS) soil moisture observations with the APSIM model through sequential data assimilation to evaluate the improvement in model predictions of downstream state variables across five experimental sites in the US Midwest. Four RS data products and in situ observations spanning 19 site years were used through two data assimilation approaches, namely the ensemble Kalman filter (EnKF) and the generalized ensemble filter (GEF), to constrain model states at observed time steps and estimate joint background and observation error matrices. Then, the assimilation's impact on estimates of soil moisture, yield, normalized difference vegetation index (NDVI), tile drainage, and nitrate leaching was assessed across all site years. When assimilating in situ observations, the accuracy of soil moisture forecasts in the assimilation layers was improved by reducing RMSE by an average of 17 % for 10 cm and
To effectively address pressing global food security challenges, agricultural forecasting tools must exhibit high accuracy and precision across spatial and temporal scales. As process-based crop models offer a system-level representation of many soil and crop processes, they are increasingly recognized as practical forecasting tools in agricultural research (Silva and Giller, 2021; Fer et al., 2021). However, their weakness comes from many unaccounted uncertainties, such as those related to model parameters, initial conditions, and weather (Dokoohaki et al., 2021). Prior studies have shown state data assimilation (SDA) to be a powerful tool to overcome this weakness in process-based crop models (e.g., Dokoohaki et al., 2022a). SDA enables a temporally continuous, high-dimensional scaffold in which a variety of observations can be smoothly integrated using one of many robust, systematic algorithms, such as the ensemble Kalman filter (EnKF; Dietze, 2017; Huang et al., 2019; Liu et al., 2021; Dokoohaki et al., 2022a; Kivi et al., 2022). Through SDA, uncertainty around spatially heterogeneous and dynamic properties in agricultural systems can be constrained, thereby increasing precision and accuracy in estimates while decreasing dependence on extensive site-level model calibration (Mishra et al., 2021).
Numerous past studies have used SDA to constrain crop model estimates, using
observations on leaf area index (e.g., Nearing et al., 2012; Ines et al.,
2013; Ma et al., 2013; Chen et al., 2018; Lu et al., 2021), soil moisture
(Kivi et al., 2022), biomass (e.g., Linker and Ioslovich, 2017), and
evapotranspiration (e.g., Huang et al., 2015). For example, a synthetic study by Zhu et al. (2017) found that the assimilation of coarse-resolution surface soil moisture data into a coupled soil water–groundwater numerical model constrained soil moisture estimates in the first 50 cm of the soil profile despite explicitly unaccounted spatial heterogeneity in soil properties. These studies showed how SDA can partially account for the spatial variability in soil hydraulic conductivity across broad regions without explicit model calibration. In addition to incorporating spatial heterogeneity in soil properties, Kivi et al. (2022) demonstrated that the
assimilation of high-quality and frequent in situ soil moisture observations
can substantially improve downstream model predictions of tile drainage,
nitrate (
Alternatively, the assimilation of high-resolution remote sensing (RS) data products dramatically increases SDA applications' range beyond in situ data availability by effectively capturing the spatiotemporal variability of many agricultural state variables, such as vegetation cover and soil moisture, with consistency and high temporal frequency (Peng et al., 2017). As a result, RS observations could be invaluable to constraining model predictions at the regional scale and have been increasingly applied for agricultural forecasting in the data assimilation literature, as demonstrated in literature reviews by Dorigo et al. (2007), Huang et al. (2019), and Weiss et al. (2020). The application of RS soil moisture data products has been especially popular and successful in data-assimilation-focused agricultural forecasting studies. These data products, which characterize soil moisture content in the first 5 cm of the soil profile, pull information from active and/or passive sensors of microwave reflectance. Due to its high sensitivity to surface soil moisture, many data products have been developed around available L-band microwave sensor information collected by NASA's SMAP Mission (Kumar et al., 2018). The SMAP–HydroBlocks (SMAP–HB) data product merges SMAP data with the HydroBlocks land surface model to increase spatial resolution in the final estimates and improve scalability (Vergopolan et al., 2021b), while the SMAP–Sentinel1 data product pairs SMAP data with Sentinel-1 radar information to achieve similar goals (Das et al., 2019). Others, like the ESA CCI data product (Dorigo et al., 2017), compile information from multiple sensors, including the SMAP passive sensor, to allow for greater temporal coverage. However, this approach comes at the cost of coarser spatial resolution.
Nonetheless, as demonstrated in past studies, the assimilation of RS soil moisture data has its limitations. First, uncertainty and biases in RS data products are typically poorly defined (Huang et al., 2019). RS-based data products are based on empirical relationships, and, as they are predicted as a function of surface reflectance, uncertainties in the raw radiance will propagate unsupervised into final estimates (Weiss et al., 2020). Additionally, RS estimates characterize soil moisture in only the top 5 cm of the soil profile and, thus, rely on models or empirical parameterizations to describe the root zone soil profile. Among others, de Lannoy et al. (2007) and Monsivais-Huertero et al. (2010) both found the assimilation of in situ near-surface soil moisture observations to be far less effective than that of in situ RZSM observations in constraining estimates of the greater soil water profile. Yet, since the surface layer is typically the layer where fertilizers are added, the accurate estimation of surface layer state variables is essential for today's agroecosystems. To address the relatively coarse spatial resolution of RS data products, past studies have explored downscaling approaches (e.g., Chakrabarti et al., 2014) or leveraged additional in situ datasets (e.g., Liu et al., 2021) to overcome “mismatch” challenges and downscale RS soil moisture estimates so that they more accurately reflect field-scale measurements (Vergopolan et al., 2021a). However, the reliance of these approaches on in situ observations can limit system transferability across broad regions (Peng et al., 2017). Moreover, as described by Crow et al. (2012), it can be difficult to properly evaluate coarse-resolution soil moisture estimates with point-scale ground measurements due to unknown and often significant sampling uncertainty.
Data assimilation with process-based models has been previously applied as a robust and scalable way to leverage information in coarse-resolution soil moisture estimates (e.g., Vergopolan et al., 2021b).
Despite the immense theoretical potential of SDA with both in situ and RS observations, past studies have reported inconsistent SDA performance in
modeling crop yields. For example, de Wit and van Diepen (2007) observed
inconsistencies in yield constraint when assimilating soil wetness index (SWI) derived from 0.25
To bridge this knowledge gap, we present a comprehensive assessment of soil
moisture data assimilation as a method for constraining crop model predictions across the US Midwest. Building on the assimilation framework in Kivi et al. (2022), we independently assimilated both in situ and RS soil
moisture observations in the APSIM crop model at five experimental sites in
the US Midwest. With field data covering 19 site years of corn and soybean
cropping systems across the region, this study tests the data assimilation
system across a broader GxExM inference space and quantifies the benefit of
assimilating different RS soil moisture products in comparison to the in situ soil moisture observations. The main objectives of this study were
to quantify how in situ soil moisture observations can constrain crop model forecasts of downstream estimates, including RZSM, crop yield, crop phenology via normalized difference vegetation index (NDVI), tile drainage flow, and to quantify the added benefit of RS soil moisture observations in improving crop model predictions of RZSM, crop yield, and crop phenology via NDVI through SDA.
Sections 2.1 and 2.2 describe the five experimental sites and the in situ observations employed in this study for model setup, SDA, and evaluation. Section 2.3 outlines the four different RS soil moisture data products that were assimilated, and Sect. 2.4 presents the data-assimilation system used in this study. Lastly, Sect. 2.4.5 defines the different simulation experiments performed.
This study focused on five experimental sites across the US Midwest with in situ observations of soil moisture, crop yield, nitrate load, and tile drainage flow for 19 sites
To properly set up the APSIM model for each of the five sites, we included
all available site information on each year, cropping system, residue type,
planting and harvesting details, tillage practices, and fertilizer applications as constants in the simulations. Following updated information
available through Moore et al. (2021), the IL site includes tillage practices
in the model setup and increased nitrogen (N) fertilizer from 64.6 to 202 kg N ha
Left: site map (ESRI). Right: scatterplot demonstrating site-year total precipitation and average daily temperature (
Across the study site years, subdaily soil moisture (SM) observations were
collected at various soil depths between 10 and 105 cm using soil sensors;
the measured depths and sensor type varied by site. All observations are
available in units of volumetric water fraction (VFW; mm mm
Data on harvested yield for the TD sites were available for each site year
with 1–3 replicated measurements. These replicated observations were averaged and converted from grain at standard moisture content (i.e., 15.5 % for maize and 13 % for soybean) to dry-grain weight for best
comparison with the APSIM model output. Observations for IL were already
recorded as dry-grain weights and given in units of kg ha
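The yield preprocessing described above (averaging replicates, then converting from standard moisture content to dry-grain weight) can be sketched in Python as follows. This is a minimal illustration that assumes the standard moisture contents are expressed on a wet-mass basis; the function names and example values are hypothetical and are not taken from the study's workflow.

```python
# Standard grain moisture contents named in the text, as fractions.
# Assumption: these are wet-basis moisture contents.
STANDARD_MOISTURE = {"maize": 0.155, "soybean": 0.13}


def to_dry_grain(yield_at_moisture, moisture_fraction):
    """Convert grain mass at a wet-basis moisture content to dry mass."""
    if not 0.0 <= moisture_fraction < 1.0:
        raise ValueError("moisture_fraction must be in [0, 1)")
    return yield_at_moisture * (1.0 - moisture_fraction)


def mean_dry_yield(replicates, crop):
    """Average replicated yields, then convert the mean to dry-grain weight."""
    mean_wet = sum(replicates) / len(replicates)
    return to_dry_grain(mean_wet, STANDARD_MOISTURE[crop])


# Hypothetical replicated maize yields at 15.5 % moisture.
example = mean_dry_yield([10000.0, 11000.0, 12000.0], "maize")
```

Averaging before converting gives the same result as converting each replicate first, because the conversion is a constant multiplicative factor for a given crop.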
The normalized difference vegetation index (NDVI) can be used to quantify
vegetation greenness and reasonably track the phenological development of
crops (Gao and Zhang, 2021). In this study, NDVI observations from Landsat between 2011 and 2019 were used to evaluate APSIM's performance in predicting crop phenology for each site year. NDVI time series were extracted at each site location from Landsat 7 and 8 remote sensing imagery courtesy of the US Geological Survey via Google Earth Engine and derived from the red (RED) and near-infrared (NIR) spectral bands using the following equation:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}$$
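For reference, the standard NDVI definition, the normalized difference of the NIR and RED reflectances, can be computed per observation with a few lines of Python. The function name and the handling of a zero denominator are illustrative choices and do not describe the Google Earth Engine processing used in the study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and RED reflectance."""
    denominator = nir + red
    if denominator == 0:
        # No signal in either band: the index is undefined.
        return float("nan")
    return (nir - red) / denominator


# Hypothetical reflectances for a dense green canopy.
canopy_ndvi = ndvi(nir=0.5, red=0.1)
```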
Daily observations of tile drainage flow (mm) and
Overview of remote sensing soil moisture data products.
To assess the performance of SM data assimilation with satellite-based observations, we included four RS data products that span different temporal and spatial resolutions (Table 1). These observations were extracted at the point level for the study sites and serve to represent the first 5 cm of the soil profile or surface SM. Observations from the winter months (i.e., December–March) were removed to avoid issues with snow cover and freezing soils. The product IDs provided in Table 1 will be used to identify each data product.
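The winter screening step can be illustrated with a short Python sketch that drops observations dated December through March. The data layout (a list of date and value pairs) and the function name are assumptions made for this example only.

```python
from datetime import date

# Months excluded to avoid snow cover and frozen soils (December-March).
WINTER_MONTHS = {12, 1, 2, 3}


def drop_winter(observations):
    """Keep only (date, value) pairs that fall outside the winter months."""
    return [(d, v) for d, v in observations if d.month not in WINTER_MONTHS]


# Hypothetical surface SM retrievals (volumetric fraction).
obs = [
    (date(2016, 1, 15), 0.31),
    (date(2016, 4, 2), 0.28),
    (date(2016, 7, 20), 0.22),
    (date(2016, 12, 5), 0.30),
]
growing_season_obs = drop_winter(obs)
```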
The RS dataset with the coarsest spatial resolution in this study was the European Space Agency Climate Change Initiative (ESA CCI) SM product. Each year, the ESA CCI algorithmically merges information from 3 active (e.g., ASCAT A/B) and 10 passive (e.g., SSM/I, AMSR-E, SMOS, SMAP) microwave sensors to estimate daily surface SM globally for over 40 years. Dorigo et al. (2017) provide complete documentation on how these data products are produced. Here we used the combined product (version v06.1), which includes daily uncertainty estimates. Several past studies have assimilated this data product into process-based models with varying levels of success (e.g., Zhou et al., 2016; Liu et al., 2017, 2018; Naz et al., 2019).
The SMAP–HydroBlocks surface SM dataset has the highest spatial resolution in this study. It was introduced by Vergopolan et al. (2021b) by combining the HydroBlocks land surface model, a tau–omega radiative transfer model, machine learning, in situ SM observations, and SMAP remotely sensed satellite observations to estimate surface SM with 30 m resolution across the contiguous United States. Specifically, the HydroBlocks model was coupled with a tau–omega radiative transfer model (HydroBlocks–RTM) and used to simulate SM, soil temperature, and brightness temperature at a 3 h, 30 m resolution. Brightness temperature estimates from NASA's Soil Moisture Active Passive (SMAP) mission were then merged with the HydroBlocks–RTM estimates using a spatial cluster-based Bayesian merging scheme (Vergopolan et al., 2020). Using the inverse HydroBlocks–RTM, SM was estimated at SMAP overpass time at 30 m spatial resolution. Vergopolan et al. (2021b) reported an RMSE of 0.07 mm
The SMAP–Sentinel1 SM product was produced by merging information collected
by the SMAP L-band radiometer and the Copernicus Project Sentinel-1 C-band
radar. After the malfunction of the SMAP radar in 2015, Sentinel-1 active
microwave data were used with passive microwave sensor information from the
still-operating SMAP radiometer to estimate surface SM content globally
using the active–passive algorithm. Although the merged product increased
the revisit interval from 3 to 12 d, it enabled retrievals at two different spatial resolutions (i.e., 1 and 3 km; Lievens et al., 2017). Upon comparing the estimates with in situ SM measurements, Das et al. (2019) reported RMSE for SMAP–Sentinel1 SM estimates as roughly 0.05 m
This study uses the data-assimilation system developed and evaluated in Kivi et al. (2022). The original system leveraged the pSIMS platform, APSIM crop model, ensemble Kalman filter (EnKF), and an algorithm presented by Miyoshi et al. (2013) to estimate and propagate uncertainties, perform sequential data assimilation, and generate daily agricultural forecasts at the field scale. The workflow is illustrated in Fig. 2. APSIM management variables that were known include planting and harvest dates, fertilizer amount, type, and timing, tillage type, depth, and timing, crop type, row spacing, sowing density, and, if available, planting depth.
Schematic demonstrating the workflow of the data assimilation system. System inputs represented by blue normal distributions have incorporated uncertainty in this study, while green rectangles represent known values that were included as constants.
Initial soil water, cultivar, and residue weight were randomized across model ensembles for each site to incorporate uncertainty around initial conditions. If unavailable in the management data, planting depth was also randomized and drawn from different prior distributions for each crop. These distributions represented reasonable planting depth ranges for the two crops in the Midwest, as described in extension websites produced by the University of Missouri (Luce, 2016) and Michigan State University (Staton, 2012). Using a uniform prior distribution, planting depths ranged from 1.5 to 2.5 in. (3.8–6.35 cm) for maize and 1 to 2 in. (2.5–5 cm) for soybean.
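A minimal Python sketch of drawing planting depths from the uniform priors stated above is given below. The depth ranges come from the text; the function name, the seed argument, and the use of one independent draw per ensemble member are assumptions for illustration.

```python
import random

# Uniform prior bounds in cm, as stated in the text.
PLANTING_DEPTH_CM = {"maize": (3.8, 6.35), "soybean": (2.5, 5.0)}


def sample_planting_depths(crop, n_members, seed=None):
    """Draw one planting depth (cm) per ensemble member from a uniform prior."""
    rng = random.Random(seed)
    low, high = PLANTING_DEPTH_CM[crop]
    return [rng.uniform(low, high) for _ in range(n_members)]


# One depth per member of a 100-member ensemble.
maize_depths = sample_planting_depths("maize", n_members=100, seed=1)
```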
Prior distributions were also set to incorporate uncertainty around cultivar. For maize, nine cultivar parameters were grouped into an ensemble, including the six cultivar parameters (i.e., tt_flower_to_maturity, tt_flower_to_start_grain, tt_maturity_to_ripe, tt_emerg_to_endjuv, head_grain_no_max, grain_gth_rate). The other three parameters (i.e., largestLeafParams1, leaf_init_rate, leaf_app_rate1) were drawn from Dokoohaki et al. (2022b), who identified maize cultivar parameters that were influential for estimates of leaf area index (LAI) in the APSIM maize module and optimized their value distributions using a hierarchical Bayesian optimization approach across the US Midwest. Table A2 gives more detailed information on all randomized parameters and their prior distributions. We completed a preliminary assessment of the maize module at each of the study sites and found that, under the given parameter value ranges, APSIM was capable of appropriately simulating the phenological development and grain yield for maize at each site.
The selection of soybean cultivars for each site was determined using a
semi-systematic approach. First, a range of maturity groups was determined
for each site based on a study by Mourtzinis and Conley (2017), which delineated soybean maturity groups across the United States. We defined the upper and lower maturity group bounds for each site using the bounding zone contour
lines for each site location in Fig. 4 of Mourtzinis and Conley (2017). Then, initial APSIM simulations were performed for each site using all APSIM-defined soybean cultivars falling within the prescribed maturity group range. The model results were compared to the observed soybean yields at
each site, and the best-performing maturity group (MG) for each site was
determined. The final range for each site was approximately MG
To incorporate uncertainty around soil and weather into our simulations, a Monte Carlo sampling approach was used to randomly assign ensembles of weather and soil drivers to model ensembles. For each study site, 10 weather ensembles from the ERA5 reanalysis data product were employed to characterize solar radiation, maximum air temperature, minimum air temperature, precipitation, and wind speed at the daily resolution and at each site location. ERA5 is a global gridded reanalysis data product from the European Centre for Medium-Range Weather Forecasts (ECMWF), which characterizes the weather state variables at hourly time steps with associated uncertainties (Hersbach et al., 2020). In addition, 25 soil ensembles were generated from the SoilGrids global gridded soil database (Hengl et al., 2014) for each site location. These ensembles cover 30 soil properties (including available water lower limit, bulk density, drained upper limit, organic carbon, soil class, and pH) and were created by sampling from each soil parameter mean and uncertainty values available in the SoilGrids dataset.
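The random pairing of driver ensembles with model ensembles can be sketched as follows. The ensemble sizes (10 weather, 25 soil) come from the text and the default of 100 model ensemble members matches the simulation setup reported in this study; sampling with replacement and independently for weather and soil is an assumption, as are the function and key names.

```python
import random


def assign_drivers(n_members=100, n_weather=10, n_soil=25, seed=None):
    """Randomly assign a weather and a soil ensemble index to each member.

    Assumption: indices are drawn independently and with replacement.
    """
    rng = random.Random(seed)
    return [
        {
            "member": i,
            "weather": rng.randrange(n_weather),
            "soil": rng.randrange(n_soil),
        }
        for i in range(n_members)
    ]


assignments = assign_drivers(seed=42)
```

Because there are fewer driver ensembles than model ensemble members, each weather and soil realization is typically shared by several members under this scheme.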
Since APSIM does not currently estimate NDVI, APSIM was coupled with the PROSAIL model described in Dokoohaki et al. (2022b) to estimate daily NDVI values and enable the appropriate evaluation of the model's simulation of crop phenology at the study sites. The PROSAIL model is a radiative transfer tool that combines PROSPECT, a leaf optical properties model, and SAIL, a canopy bidirectional reflectance model, to estimate spectral reflectance for a given vegetative area based on soil and plant/canopy properties (Jacquemoud et al., 2009). In this study, APSIM's daily forecasts of soil and plant variables were transformed and used as inputs into the PROSAIL model to compute the spectral reflectance for each ensemble. Then, for each day and ensemble, the estimated spectral information was used to estimate NDVI using the vegetation index function within the hsdar R library (Lehnert et al., 2019). Further details on the coupling protocols can be found in Dokoohaki et al. (2022b).
The data-assimilation system (which we will call EnKF–Miyoshi hereinafter)
employs the ensemble Kalman filter (EnKF) to assimilate SM observations into
the APSIM model. The EnKF merges information from the model ensemble forecast distribution and observations (with associated uncertainty) at each time step to optimally estimate the state of the system (Evensen, 2003). The system also leverages the Miyoshi algorithm in series with the EnKF to improve estimates of the two system uncertainty matrices (i.e., the background and observation error matrices). At each analysis time step, the mean and covariance of the forecast distribution are computed from the model ensemble, and the observed distribution is defined by the observations and their error covariance. The Kalman gain is then computed from these two sources of uncertainty. The analysis distribution, which assumes a normal distribution, is determined with a mean and covariance obtained by using the Kalman gain to adjust the forecast mean and covariance toward the observations. The model ensemble is updated at each time step according to the analysis distribution based on each ensemble's likelihood within the forecast distribution.
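As a rough illustration of the analysis step, the following Python sketch computes the Kalman gain and the analysis mean and covariance from a forecast ensemble using the textbook Kalman update. The function name `kalman_analysis`, the linear observation operator `H`, and the two-layer example values are illustrative assumptions, not the configuration used in this study. The Miyoshi error estimation and the likelihood-based update of ensemble members are not shown.

```python
import numpy as np


def kalman_analysis(ensemble, y, R, H):
    """Textbook Kalman analysis from a forecast ensemble.

    ensemble: (n_members, n_states) forecast states
    y:        (n_obs,) observed mean
    R:        (n_obs, n_obs) observation error covariance
    H:        (n_obs, n_states) linear observation operator (assumed)
    """
    mu_f = ensemble.mean(axis=0)                         # forecast mean
    P_f = np.atleast_2d(np.cov(ensemble, rowvar=False))  # forecast covariance
    S = H @ P_f @ H.T + R                                # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)                     # Kalman gain
    mu_a = mu_f + K @ (y - H @ mu_f)                     # analysis mean
    P_a = (np.eye(mu_f.size) - K @ H) @ P_f              # analysis covariance
    return mu_a, P_a, K


# Hypothetical two-layer SM forecast; only the first layer is observed.
rng = np.random.default_rng(0)
forecast = rng.normal(loc=[0.35, 0.30], scale=0.03, size=(100, 2))
H = np.array([[1.0, 0.0]])
y = np.array([0.25])
R = np.array([[0.02 ** 2]])
mu_a, P_a, K = kalman_analysis(forecast, y, R, H)
```

In this example the analysis mean of the observed layer falls between the forecast mean and the observation, and its variance is smaller than the forecast variance. The unobserved layer is adjusted only through its sample covariance with the observed layer.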
However, the EnKF–Miyoshi workflow as established cannot robustly handle
observation operators (
In this study, the GEF was applied over the EnKF–Miyoshi workflow when (1) more than one observation was assimilated for a single state variable at a given time step or (2) the number of available observations varied
throughout a simulation (i.e., changing
All simulations in this study were performed with 100 ensembles and with a 4-month initialization period starting on 1 January of the first year at each site. There were nine different simulations performed for each site in this study, which varied in terms of observations assimilated and assimilation method applied (Table 2). First, two “baseline” runs were completed across all 19 site years to establish system performance benchmarks. As a lower bound on performance, a free model simulation was performed with no data assimilation. SM sensor observations were also assimilated into the model to represent a reasonable benchmark data assimilation setting. Next, two groups of runs were performed to test the assimilation of RS SM data products: “individual” and “additive” runs. In the “individual” runs, all four RS data products were assimilated independently within the system. These runs were performed to compare the value of different RS data products directly. Then, in the “additive” runs, observations from multiple RS data products were jointly assimilated into the system following an additive approach. The first iteration included only ESA observations, and each subsequent iteration added another data product until all four data products were included (i.e., ALL). Data products were added in succession based on availability, such that the first data product tested had the highest average number of observations per year. By sequentially adding new data products, the additional impact of each RS data product could be evaluated. To allow for the application of the GEF in runs with more than one data product, a minimum of two observations per day were required for the “additive runs” to ensure the convergence of the MCMC algorithm. For all runs where RS data were assimilated, only site years after 2014 were investigated due to the limited temporal extent of RS data products.
Overview of system configuration for the nine runs performed in this study. SDA methods include the ensemble Kalman filter (EnKF) coupled with the Miyoshi algorithm and the generalized ensemble filter (GEF). The former method of these two methods provided systematic estimates of
This study applied the year-average ensemble weighting strategy, as presented in Kivi et al. (2022), to leverage all available information from the simulations and evaluate the results more accurately. In each site-year simulation, daily weights were assigned to each ensemble as the likelihood of producing the daily estimate given the analysis distribution, and ensemble weights were normalized across the model ensemble for each day. Finally, the average annual weight for each ensemble was computed for each site year. The application of annual weights in the analysis was the most robust for evaluating yearly estimates (e.g., yield, cumulative
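The weighting strategy can be sketched in Python for a single state variable. The example assumes a univariate normal analysis distribution on each day; the function names, the uniform fallback when all likelihoods underflow to zero, and the example values are hypothetical and are not drawn from Kivi et al. (2022).

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def daily_weights(estimates, analysis_mean, analysis_sd):
    """Likelihood of each member's estimate, normalized across the ensemble."""
    likelihoods = [normal_pdf(x, analysis_mean, analysis_sd) for x in estimates]
    total = sum(likelihoods)
    if total == 0.0:
        # Assumption: fall back to uniform weights if every likelihood is zero.
        return [1.0 / len(estimates)] * len(estimates)
    return [value / total for value in likelihoods]


def annual_weights(daily_estimates, analysis_means, analysis_sds):
    """Average each member's normalized daily weight over all days."""
    n_members = len(daily_estimates[0])
    sums = [0.0] * n_members
    for estimates, mean, sd in zip(daily_estimates, analysis_means, analysis_sds):
        for i, weight in enumerate(daily_weights(estimates, mean, sd)):
            sums[i] += weight
    return [s / len(daily_estimates) for s in sums]


# Hypothetical three-member ensemble over two days.
weights = annual_weights(
    daily_estimates=[[0.30, 0.25, 0.20], [0.31, 0.26, 0.22]],
    analysis_means=[0.25, 0.26],
    analysis_sds=[0.02, 0.02],
)
```

Because the daily weights each sum to 1, the annual weights also sum to 1 and can be used directly to compute weighted means and variances of yearly estimates.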
To evaluate the accuracy and precision of model forecasts for each site-year
simulation, we utilized the root mean squared error (RMSE), spectral norm,
and weighted variance. RMSE was calculated for each run to quantify changes
in accuracy between runs, while the spectral norm and weighted variance were
employed to quantify changes in precision. Additionally, to help standardize
accuracy measures across site years, a normalized RMSE (nRMSE) was
calculated as
To identify and quantify relationships between variables, one of two correlation statistics was employed depending on the sample size of the data. When comparing data with a sufficiently large sample size (
The results in Sect. 3.1 evaluate the forecast accuracy and precision of in situ SM SDA in comparison to the free model. Section 3.2 investigates changes in forecast accuracy and precision when assimilating SM RS observations. The individual runs are assessed with regard to their data characteristics (i.e., retrieval interval and single vs. multi-sensor development), and the additive runs are evaluated in succession to determine the relative impact of added observations. Lastly, the impact of RS-based SDA on the forecast accuracy and precision of state variables is investigated and compared.
Across all assimilation time steps, the free model tended to overpredict SM
within the two assimilation layers (Fig. 3). Therefore, the adjustment in the SDA analysis step typically reduced the total amount of water in the soil profile. In SM forecasts for the two assimilation layers (i.e., SM3 and SM4), SDA performed as well as or better than the free model in accuracy across all site years. The median change in RMSE due to SDA was
One-to-one plots for soil moisture estimates (mm mm
Boxplots demonstrating the distribution of relative change in
The three site years where precision was not increased in SDA include OH in 2013 and 2014 and MN in 2013. Interestingly, these site years were among those with the most remarkable improvement in accuracy. This relationship is intuitive considering the nature of the Miyoshi algorithm, which systematically inflates model forecast uncertainty at time steps when observed and forecasted SM distributions differ substantially. At the cost of reduced forecast precision, such inflation allows for the filter to pull the model forecast toward the observed distribution and improve accuracy in future predictions.
SDA's constraint of SM3 and SM4 also led to the indirect constraint of SM in
deeper soil profile layers. Across all site years with available data, the
median change in RMSE for SDA estimates of SM5, SM6, and SM7 was
Overall, in comparison to the free model, SDA improved yield estimates by explaining 17.7 % more variation in observed yield values and improving yield accuracy in 63 % of site years (Table 3). SDA was most effective at improving accuracy in site years facing greater water stress. In those cases where yield estimates were improved, SDA often increased available soil water at critical points in crop development, reducing crop soil water deficit factors and increasing yield compared to the free model (Fig. A1). The most evident example of SDA yield improvement is IN in 2012, where the free model estimated complete maize crop failure (i.e., no grain yield) due to leaf senescence in mid-July, but SDA estimated a harvestable crop due to increased soil water in the early season (Fig. 5). However, SDA's impact on yield precision was inconsistent; roughly 53 % of site years saw reduced precision in yield estimates.
Time series of yield estimates for the free model and in situ SDA with mean daily estimates demonstrated with line graphs and the 95 % credible intervals demonstrated by the shaded regions. Black points represent the observed harvest date and yield for each site year.
Summary statistics to quantify the impact of in situ SDA (IS) and RS-SDA (RS) on forecast accuracy of APSIM state variables. The “
Overall, the free model accurately captured the phenological development of the cropping systems simulated in this study, as demonstrated by the good
agreement between observed and simulated NDVI (Fig. A2). SDA's impact on
NDVI accuracy was similar to its impact on yield accuracy, such that it typically either increased accuracy due to lessened water stress or did not
substantially affect the model performance. A comparison of
Across the 19 site years, the free model and SDA showed overall poor performance in estimating annual drainage, with nRMSE values ranging from 18 % to 215 % with a median value of 54.3 % for SDA and from 20 % to 250 % with a median value of 52.4 % for the free model. In the site years with the lowest accuracy, APSIM often overpredicted drainage in both the free model and SDA. However, these cases of considerable overestimation in drainage were also among those site years that were most improved by SDA; 8 of the 11 site years where SDA improved estimates of annual drainage were cases where the free model overestimated tile flow. In these scenarios, SDA functioned to remove available water from the soil profile and correctly lower the amount of water lost from the system. In the remaining site years where SDA did not improve drainage accuracy, SDA increased RMSE values by 32 % on average. SDA's impact on precision for annual drainage estimates was highly variable. A total of 63 % of site years saw improvement in precision, but 4 site years saw an immense reduction in precision (i.e., a reduction of between 107 % and 146 %).
APSIM also struggled to accurately estimate the annual
As expected, the individual influence of each RS data product was heavily dependent on its multi- or single-sensor design and temporal availability. ESA, the most widely available data product, had the greatest impact on both assimilation and downstream state variables. In contrast, assimilation with 1 and 3 km imposed only slight changes in estimates when compared to the free model. However, ESA did not always lead to improvements in model performance. As demonstrated in Fig. 6, ESA results were more variable across site years in terms of the accuracy of state variable estimates, in some cases leading to great improvement and, in other cases, leading to reduced performance. ESA reduced accuracy in predicting SM3 and SM4 in most site years (i.e., 80 %–90 %) but was the most effective in improving accuracy in estimates of annual yield, SM6, and SM7. ESA also outperformed the other three RS data products in constraining forecast precision for all state variables, improving precision in 70 %–100 % of site years. Importantly, it showed the greatest reduction in the spectral norm of the SM covariance matrix when compared to the free model, indicating the best constraint of SM precision across the entire profile.
Boxplots demonstrating the distribution of relative change (%) in state variable accuracy (RMSE) and precision (weighted variance) for the
Alternatively, the assimilation of SMAP–HB, another temporally frequent RS data product, demonstrated more conservative performance than ESA across state variables. For almost all state variables, it also performed similarly to or better than the free model. However, any improvements (or reductions) in forecast accuracy were more moderate than observed with ESA. For example, accuracy in yield estimates was improved more consistently with SMAP–HB (90 %) compared to ESA (70 %), but the maximum improvement in a tested site year was a 53 % accuracy increase compared to a 95 % increase with ESA. This trend in the results highlights an important trade-off when assimilating more certain observations (i.e., ESA CCI) at a coarse spatial resolution over less certain observations at high spatial resolution (i.e., SMAP–HB) when both data products have unknown biases. In terms of forecast precision, SMAP–HB was overall quite effective in constraining state variable predictions, especially when compared to 1 and 3 km. However, SMAP–HB underperformed compared to ESA in this regard; 1 and 3 km both underperformed in accuracy constraint when compared to ESA and SMAP–HB, showing little to no change in RMSE compared to the free model.
Considering the four individual runs, more frequent assimilation time steps also led to a more robust performance of the EnKF–Miyoshi workflow. Filter divergence (i.e., when the observed mean falls outside of the 95 % credibility interval of the analysis distribution) occurred at 52 % and 59 % of analysis time steps for 1 and 3 km, respectively, but occurred at only 44 % and 30 % of analysis time steps for SMAP–HB and ESA, respectively. For estimates of observation uncertainty, the Miyoshi algorithm predicted greater uncertainty for most RS observations than what is reported in the literature. The average standard error in ESA observations was reported to be
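The filter divergence criterion defined above (observed mean outside the 95 % credibility interval of the analysis distribution) can be sketched as follows. This is a minimal illustration that assumes the interval is taken from empirical ensemble percentiles; the function names and the synthetic ensembles are hypothetical.

```python
from statistics import quantiles

def diverged(analysis_ensemble, observed_mean):
    """True if the observation lies outside the central 95 % interval."""
    # n=40 yields cut points every 2.5 %; the first and last bound the interval.
    cuts = quantiles(analysis_ensemble, n=40, method="inclusive")
    lower, upper = cuts[0], cuts[-1]
    return not (lower <= observed_mean <= upper)

def divergence_rate(analysis_ensembles, observed_means):
    """Percentage of analysis time steps flagged as divergent."""
    flags = [diverged(e, o) for e, o in zip(analysis_ensembles, observed_means)]
    return 100.0 * sum(flags) / len(flags)

# Synthetic analysis ensembles for four time steps (not data from the study).
low = [0.20 + 0.001 * i for i in range(101)]    # spans 0.20 to 0.30
high = [0.30 + 0.001 * i for i in range(101)]   # spans 0.30 to 0.40
ensembles = [low, low, high, high]
observations = [0.25, 0.35, 0.35, 0.45]

rate = divergence_rate(ensembles, observations)
print(f"filter divergence at {rate:.0f} % of analysis time steps")
```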
The baseline run for the additive RS-SDA runs was ESA, which demonstrated
inconsistent constraint of forecast accuracy and strong constraint of forecast precision. The second most available data product, SMAP–HB, was the
next RS data product added to the system. New SMAP–HB observations, on average, imposed a
The subsequent additions of the sparser 1 and 3 km RS data products were
less impactful than the addition of SMAP–HB. New 1 km observations imposed an average
When considering the impact of surface SM data assimilation on downstream model variables, we focus on results where all available RS observations were assimilated for each site. Hereinafter, we refer to the compilation of these runs across the five sites as RS-SDA.
Time series of SM1 estimates from the free model and RS-SDA with the mean daily estimates demonstrated with line graphs. The shaded regions indicate 95 % credibility intervals.
Overall, RS-SDA had minor impacts on the soil water profile relative to the free model. Figure 7 demonstrates differences between the free model and RS-SDA in SM1 estimates. For several site years, RS-SDA estimated significantly higher SM1 values in the early growing season (i.e., May–June). In the late season and fall, RS-SDA often estimated lower SM1 values. The impact of these SM1 changes on lower layer SM values seemed to decrease with depth, such that differences between the free model and RS-SDA mean estimates were more subtle in deeper layers. This reduced impact on lower layers is also, in part, a reflection of the increasing total soil water volume represented by soil layers down through the profile (see Table 3 for layer depths). Nonetheless, any differences in SM estimates did not lead to notable changes in accuracy for any SM layer (Table 3). Notable changes were visible in the soil water deficit factors for several growing seasons, such that RS-SDA led to reduced water stress for the growing crop. We speculate that this results from increased available soil water in the root zone during initial periods of crop water uptake (i.e., June). Forecast precision for soil-water-related estimates also did not change substantially with assimilation. For SM1 estimates, assimilation substantially reduced variability across site years (Fig. 7). In many cases, this constraint in the surface soil layer did not propagate into significant changes for precision in lower layer estimates (Fig. 7). However, on average, precision was improved rather than reduced with assimilation, with the most significant downstream constraint in the soil layers closest to the surface.
RS-SDA demonstrated partial constraint of aboveground estimates. Considering
the
In this study, the extent to which in situ SM data assimilation affected APSIM model predictions depended on each state variable's sensitivity to the assimilated state variable (i.e., soil moisture). Deeper layer SM estimates – the most sensitive state variables to SM3 and SM4 – were the most strongly constrained. Figure A1 demonstrates the significant linear relationship between daily changes in forecasted SM3 and SM4 due to SDA and daily changes in SM estimates for all deeper soil layers. As expected with a cascading water balance model, the strength of the linear relationship weakens as the vertical distance between soil layers increases. In the model, SM in each layer can influence SM estimates of deeper soil layers, but only indirectly through its influence on the SM in the layer immediately below it. Therefore, the influence of the assimilation layers is reduced by each subsequent SM process down through the soil profile and is weakest in the final soil layer (SM7). Nevertheless, the constraint of SM7 was still quite strong in SDA. By assimilating SM for two upper soil layers, the accuracy of SM estimates improved immensely by simply leveraging the pre-existing model structure (compare to Liu et al., 2017).
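The attenuation with depth described above can be reproduced with a toy cascading bucket model. This is not the APSIM SoilWat implementation: the single drainage fraction, the uniform drained upper limit, and the 10 mm adjustment are all simplifying assumptions chosen for illustration.

```python
def cascade_step(water_mm, dul_mm, drain_frac):
    """One daily step of a toy cascading water balance.

    Each layer passes a fixed fraction of its water above the drained upper
    limit (dul) to the layer immediately below it; the last layer drains out.
    """
    updated = list(water_mm)
    for i in range(len(updated)):
        excess = max(0.0, updated[i] - dul_mm[i])
        flux = drain_frac * excess
        updated[i] -= flux
        if i + 1 < len(updated):
            updated[i + 1] += flux
    return updated

dul = [60.0] * 7              # hypothetical drained upper limit per layer (mm)
baseline = list(dul)          # profile sitting exactly at the drained upper limit
perturbed = list(dul)
perturbed[2] += 10.0          # hypothetical assimilation adjustment to layer 3

base_next = cascade_step(baseline, dul, drain_frac=0.5)
pert_next = cascade_step(perturbed, dul, drain_frac=0.5)
change = [p - b for p, b in zip(pert_next, base_next)]
print(change)
```

In this sketch the adjustment reaches each deeper layer only through the layer directly above it, so the change halves at every step down the profile.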
Crop yield showed the next strongest constraint in SDA. However, as noted in previous studies, its sensitivity to SM SDA was conditional (Lu et al., 2021; Kivi et al., 2022). While changes in SM affected lower layer SM at all analysis time steps, crop yield was only affected when the changes impacted crop water stress. Daily crop water uptake is determined in APSIM as the minimum of crop water demand and soil water supply. Therefore, SDA could only influence crop yield when the soil water adjustment pushed the water supply above or below the demand threshold. For this reason, greater SDA improvement was found in crop yield estimates during water-stressed site years. Other pathways through which SM can impact crop yield in APSIM, like soil N cycling, did not play a strong role in this study.
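The conditional sensitivity described above follows directly from the minimum rule. A minimal sketch, using hypothetical demand and supply values in mm per day, shows that a soil water adjustment only changes uptake when supply is the limiting term:

```python
def daily_uptake(demand_mm, supply_mm):
    """Daily crop water uptake as the minimum of demand and supply."""
    return min(demand_mm, supply_mm)

def uptake_change(demand_mm, supply_before_mm, supply_after_mm):
    """Change in uptake caused by an adjustment to soil water supply."""
    return (daily_uptake(demand_mm, supply_after_mm)
            - daily_uptake(demand_mm, supply_before_mm))

# Hypothetical values, not taken from the study.
unstressed = uptake_change(5.0, 8.0, 7.0)   # supply stays above demand
stressed = uptake_change(5.0, 3.0, 4.5)     # supply stays below demand
crossing = uptake_change(5.0, 4.0, 6.0)     # supply crosses the demand threshold

print(unstressed, stressed, crossing)
```

When supply already exceeds demand, the adjustment leaves uptake unchanged, which is consistent with the larger improvements reported for water-stressed site years.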
The impact of SM SDA on APSIM drainage estimates can also be beneficial, given certain conditions. As shown in the results, drainage was affected by SM3 and SM4 through two pathways: (1) changes in total soil water with assimilation adjustment and (2) changes in crop water uptake due to changes in crop water stress. The role of each of these pathways varied over the year, such that the presence of a growing crop and root system weakened the sensitivity of drainage estimates to changes in the assimilation layers. To quantify this change in sensitivity, we divided daily model forecasts into two categories: with crop water uptake (June–September) and without crop water uptake. Then, the relationship between changes in SM3 and SM4 and changes in drainage was analyzed separately for each group. There was no significant linear relationship when looking at SM3 changes in either case. However, the linear relationship between changes in SM4 and changes in daily drainage was stronger when no crop was present (
Among the state variables considered in SDA,
The assimilation of RS surface SM observations imposed a far weaker constraint on APSIM state variables compared to the assimilation of the soil sensor observations. For example, the median reduction in SM RMSE ranged from 7 %–27 % across different layers of the soil profile with soil sensor observations, but, with RS observations in RS-SDA, it ranged from roughly 1 %–5 % (Table 3). The weakened constraint with RS-SDA was likely more than an issue of observation inaccuracies. Instead, there is greater evidence to show that changes in SM1 simply had less influence on downstream state variables than changes in SM3 and SM4. This is due, in part, to the increased vertical distance between the surface SM layer (SM1) and other observed soil layers (i.e., SM3–7). The APSIM SoilWat module operates as a cascading water balance model to estimate the movement of water and solutes between and across soil layers (Dokoohaki et al., 2018). Thus, the assimilation adjustment of the SM1 estimate would not be as strongly tied to lower layer estimates when using a top-down approach. Yet, surface SM data assimilation notably changed SM2 estimates, the SM estimates for the layer just below it. This result reflects the findings of Lu et al. (2019), who assimilated RS surface SM observations into a surface energy balance model. They found that SDA improved SM estimates in the second layer to a greater extent than in lower layers when comparing estimates to observations. Since observations were not available for SM2 at the study sites, this hypothesis could not be tested within this work.
The two assimilation protocols (i.e., assimilation of SM1 vs. assimilation of SM3 and SM4) were also markedly different in the quantity of soil water associated with their assimilation adjustments. Where soil layers 3 and 4 corresponded to almost 14 % of the soil profile (20 cm depth), the near-surface soil layer only corresponded to about 3.6 % of the soil profile (5 cm depth). Thus, when considering the top-down effect of SM assimilation on lower layers, each adjustment with RS assimilation had just 25 % of the impact of the previous system given the same adjustment in volumetric soil water content. This roughly 4-fold reduction in potential impact is broadly consistent with the change in RMSE reduction for SM layers highlighted above (i.e., 7 %–27 % to 1 %–5 %). One way to overcome this limitation of surface SM is to leverage the strong covariance between SM1 and SM in nearby layers (i.e., SM2) to directly nudge their values within the analysis time step using, for example, an augmented state vector (e.g., Kivi et al., 2022) or exponential filter approaches (e.g., Albergel et al., 2008).
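As an illustration of the exponential filter option mentioned above, the following is a minimal sketch of the commonly used recursive form, in which a surface SM series is smoothed with a characteristic time length T to approximate deeper-layer dynamics. The value of T, the time axis, and the surface series are hypothetical, and this is not the configuration of any workflow in this study.

```python
import math

def exponential_filter(times_days, surface_sm, t_char_days):
    """Recursive exponential filter applied to a surface SM time series.

    times_days: observation times in days (increasing, may be irregular)
    surface_sm: surface soil moisture at those times
    t_char_days: characteristic time length T (assumed, needs calibration)
    """
    filtered = [surface_sm[0]]   # initialize with the first observation
    gain = 1.0                   # initial gain
    for n in range(1, len(times_days)):
        dt = times_days[n] - times_days[n - 1]
        gain = gain / (gain + math.exp(-dt / t_char_days))
        filtered.append(filtered[-1] + gain * (surface_sm[n] - filtered[-1]))
    return filtered

# Hypothetical wetting event: surface SM steps from 0.20 to 0.40 on day 2.
times = [0, 1, 2, 3, 4]
surface = [0.20, 0.20, 0.40, 0.40, 0.40]
deeper = exponential_filter(times, surface, t_char_days=5.0)
print([round(v, 3) for v in deeper])
```

The filtered series responds to the surface wetting event with a lag, which is the behaviour expected of deeper layers; a larger T gives a slower response.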
RS surface SM data assimilation still demonstrated strong potential for improving APSIM forecasts within this study. First, the assimilation of surface SM improved estimates of crop yield overall when compared to the free model, with a median RMSE reduction of 17.2 %. Past RS SM data assimilation studies had similar success in improving crop yield estimates, and several attributed the improvement to increased surface SM and reduced crop water stress with SM assimilation (e.g., Ines et al., 2013; Chakrabarti et al., 2014). The model performance in this study suggests that water stress likely played an important role here as well. Although direct observations of crop water uptake are not available to test this hypothesis, we suspect RS-SDA accurately increased available soil water at critical growth stages and, thus, increased crop water uptake.
The four different RS SM data products varied quite broadly in spatial
resolution, varying from 30 m to 0.25
When comparing RS data products in this study, it is important to recognize that all data products considered in this work are based, in part, on SMAP radiometer data. SMAP–HB merged SMAP brightness temperature data with the HydroBlocks–RTM model, ESA includes SMAP as one of its 10 passive microwave sensors, and 1 and 3 km rely on SMAP for passive microwave information within their derivation. In the first iteration, ESA contributed most of the information provided by the SMAP radiometer to the model and, therefore, imposed large changes in SM1 estimates. Then, with each additional data product, the overall impact on the analysis distribution weakened, as much of the new information had already been provided to the system. It is also important to note that, because all data products are directly or indirectly based on SMAP, the successive assimilation of these data products can introduce error covariances between the model runs and the observations. This may result in an over- or underestimation of the uncertainty, thereby affecting the performance of the filter. Therefore, further investigation into the impact of including these error covariances between the data products is necessary in order to enhance the accuracy of the EnKF.
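The risk of ignoring shared information between data products can be illustrated with a scalar Kalman update. This is a deliberately simplified stand-in for the EnKF analysis step, with hypothetical values in volumetric percent. It shows the limiting case in which the same observation is assimilated twice as if the two copies were independent.

```python
def kalman_update(mean, var, obs, obs_var):
    """Scalar Kalman analysis step for a directly observed state."""
    gain = var / (var + obs_var)
    return mean + gain * (obs - mean), (1.0 - gain) * var

# Hypothetical prior and observation (vol %, variances in vol %^2).
prior_mean, prior_var = 30.0, 4.0
obs, obs_var = 34.0, 4.0

# Correct treatment of one piece of information: assimilate it once.
mean_once, var_once = kalman_update(prior_mean, prior_var, obs, obs_var)

# Same information assimilated twice as if the errors were uncorrelated.
mean_twice, var_twice = kalman_update(mean_once, var_once, obs, obs_var)

print(var_once, var_twice)
```

The second update shrinks the analysis variance further even though no new information was added, so the filter becomes overconfident. Real data products share only part of their error, so the effect in practice would lie between these two cases.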
The Miyoshi algorithm often estimated higher observation uncertainty (
In this study, we assessed the extent to which soil moisture data assimilation can improve APSIM model forecasts. We used a generalizable and novel data assimilation system to assimilate RS and in situ soil moisture measurements across 19 site years in the US Midwest and evaluated how a direct soil moisture constraint affected downstream model estimates of root zone soil moisture, crop yield, tile flow, and nitrate leaching. Our results highlighted the capacity of soil moisture data assimilation to improve model estimates of crop yield in water-limited conditions, increasing crop water uptake at critical points in the growing season. Soil moisture data assimilation also improved estimates of soil moisture throughout the profile in most cases but did not constrain nitrate leaching or tile drainage well. This indicates a need for better constraint of both the soil water and soil nitrogen cycles in the APSIM model.
This work also lays the groundwork for future regional applications of soil moisture data assimilation. Importantly, our findings reaffirmed soil moisture data assimilation's ability to “localize” gridded weather estimates of precipitation to reflect observed values more accurately. Since cropping systems are highly sensitive to precipitation inputs, this is a strong advantage of soil moisture data assimilation for forecasting applications where coarse-resolution weather drivers are employed. Though RS soil moisture data assimilation could be an effective way to overcome limited availability of in situ data, our work shows that assimilation of RS surface soil moisture is not as powerful as the assimilation of in situ root zone soil moisture values in terms of model constraint. If the former is applied, additional constraints or an augmented state-vector approach would be necessary to achieve higher system performance. When selecting an RS soil moisture data product for data assimilation applications, high temporal resolution due to multi-sensor satellite availability and accurately estimated observation uncertainty are two critical components for optimal system performance. To that same point, combining several data products at different spatial resolutions can help to reduce assimilation intervals within the system. Further investigation is needed to independently test the impact of observation sample size (i.e., number of data products), temporal resolution, spatial resolution, and uncertainty on system performance. Moreover, the data products considered in this study do not represent the full range of RS soil moisture data products that are publicly available. This work should be expanded to evaluate data products derived from other satellites/derivations both individually and in combination with other sources to exhaust all available options.
Site management information as defined across all APSIM simulations in this study.
Prior distributions for model ensembles.
Scatterplots comparing change (i.e., SDA – free model) in mean SM5, SM6, SM7, daily drainage, and daily
Time series of NDVI estimates from the two schemes for each site year with the mean daily estimates demonstrated with line graphs and the 95 % credibility interval demonstrated by the shaded regions. Black points represent the observed values.
Time series of cumulative
Code and observational data used in this study will be provided upon request.
MK was responsible for code development, performing the simulations, and writing the manuscript. NV contributed to revising the manuscript and provided the SMAP–HB dataset. HD was responsible for developing the initial idea, code development, writing, and supervising the study.
The contact author has declared that none of the authors has any competing interests.
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The authors would like to thank all those on the Energy Farm team who made the presented case study possible. In particular, we would like to thank Carl Bernacchi, Bethany Blakely, Michael Masters, Grace Andrews, and Heather Goring-Harford, who made the Energy Farm dataset available and performed the analyses for the nitrate leaching data, and Konrad Taube and Haley Ware, who helped with water collection and water filtering in 2018 and 2019. We also want to thank Caitlin Moore and Evan Dracup, who helped to collect and process much of the other data from the plot. Additionally, we want to acknowledge the funding sources that supported the work of the Energy Farm team. The data used in this study were funded in part by (1) the Leverhulme Centre for Climate Change Mitigation, funded by the Leverhulme Trust through a Research Centre award (RC-2015-029); (2) the Center for Advanced Bioenergy and Bioproducts Innovation (CABBI) at the University of Illinois; and (3) the Global Change and Photosynthesis Research Unit of the USDA Agricultural Research Service.
This paper was edited by Alexander Gruber and reviewed by Warrick Dawes and Svitlana Kokhan.