22 Feb 2023
Status: this preprint is currently under review for the journal HESS.

Data worth analysis within a model-free data assimilation framework for soil moisture flow

Yakun Wang, Xiaolong Hu, Lijun Wang, Jinmin Li, Lin Lin, Kai Huang, and Liangsheng Shi

Abstract. Conventional data-worth (DW) analysis for soil water problems depends on physical dynamic models. The widespread occurrence of model structural errors and the strong nonlinearity of soil water flow may lead to biased or incorrect worth assessments. By introducing a nonparametric data-worth analysis (NP-DWA) framework coupled with the ensemble Kalman filter (EnKF), this real-world case study assesses, prior to data gathering, the worth of potential soil moisture observations for reconstructing fully data-driven soil water flow models. The DW of real-time soil moisture observations after Gaussian process training and Kalman update was quantified with three representative information metrics: the trace, the Shannon entropy difference, and the relative entropy. The sequential NP-DWA framework was examined in a number of cases that varied the variable of interest, spatial location, observation error, and prior data content. Our results indicated that, within the NP-DWA framework, the overall increasing trend of the DW from the sequential augmentation of additional observations was susceptible to interruption by localized surges caused by never-experienced atmospheric conditions (i.e., rainfall events). Fortunately, this performance degradation can be effectively alleviated by enriching the training scenarios or by appropriately amplifying the observational noise under extreme meteorological conditions. Nevertheless, a substantial expansion of the prior data content may cause an unexpected increase in the DW of potential future observations because of the observation noise that it may introduce. Hence, high-quality and representative “small” data may be a better choice than unfiltered “big” data. Compared with observations in the surface layer, which has the strongest temporal variability, the soil water content in the middle layer was robustly superior for constructing model-free soil moisture models. A monitoring strategy with a larger DW tended to have a higher DW assessment accuracy within the proposed NP-DWA framework. We also demonstrated that the DW assessment performance was jointly determined by the “3C”: the capacity of potential observation realizations to “capture” actual observations, the correlation of potential observations with the variables of interest, and the choice of DW indicators. Direct mapping from regular meteorological data to soil water content within the NP-DWA mitigated the adverse effects of nonlinearity-related interference, which facilitated the identification of the soil moisture covariance matrix, especially the cross-covariance.
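The three information metrics named in the abstract can be illustrated with a minimal sketch. It assumes Gaussian prior and posterior distributions and uses the textbook closed-form Kalman update in place of the ensemble-based EnKF and the Gaussian process model of the study; the function names, the three-layer profile, and all numerical values are hypothetical, and the exact metric definitions used in the paper may differ.

```python
import numpy as np

def kalman_update(mean, cov, H, R, y):
    """Standard linear-Gaussian Kalman update (not the ensemble form used in the study)."""
    S = H @ cov @ H.T + R                      # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)           # Kalman gain
    post_mean = mean + K @ (y - H @ mean)
    post_cov = (np.eye(len(mean)) - K @ H) @ cov
    return post_mean, post_cov

def data_worth(prior_mean, prior_cov, post_mean, post_cov):
    """Return (trace reduction, Shannon entropy difference, relative entropy)
    between a Gaussian prior and a Gaussian posterior (assumed definitions)."""
    n = len(prior_mean)
    trace_reduction = np.trace(prior_cov) - np.trace(post_cov)
    _, logdet_prior = np.linalg.slogdet(prior_cov)
    _, logdet_post = np.linalg.slogdet(post_cov)
    # Difference of Gaussian differential entropies: 0.5 * ln(det(prior) / det(post))
    entropy_diff = 0.5 * (logdet_prior - logdet_post)
    # Kullback-Leibler divergence KL(posterior || prior) for Gaussians
    prior_inv = np.linalg.inv(prior_cov)
    d = post_mean - prior_mean
    rel_entropy = 0.5 * (np.trace(prior_inv @ post_cov) + d @ prior_inv @ d
                         - n + logdet_prior - logdet_post)
    return trace_reduction, entropy_diff, rel_entropy

# Hypothetical three-layer soil water content profile (surface, middle, bottom)
prior_mean = np.array([0.20, 0.25, 0.30])
prior_cov = np.array([[4.0e-4, 1.5e-4, 0.5e-4],
                      [1.5e-4, 3.0e-4, 1.0e-4],
                      [0.5e-4, 1.0e-4, 2.0e-4]])
H = np.array([[0.0, 1.0, 0.0]])                # observe the middle layer only
R = np.array([[1.0e-4]])                       # assumed observation error variance
y = np.array([0.27])                           # one potential observation realization

post_mean, post_cov = kalman_update(prior_mean, prior_cov, H, R, y)
print(data_worth(prior_mean, prior_cov, post_mean, post_cov))
```

Under these assumptions, the trace and the entropy difference depend only on the covariances, whereas the relative entropy also responds to the shift in the mean, which is one reason the choice of indicator can change how alternative observations rank.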

Status: open (until 23 Apr 2023)

  • RC1: 'Comment on hess-2023-34', Anonymous Referee #1, 22 Mar 2023


Total article views: 277 (including HTML, PDF, and XML)
  • HTML: 214
  • PDF: 56
  • XML: 7
  • Total: 277
  • BibTeX: 1
  • EndNote: 2
Views and downloads (calculated since 22 Feb 2023)

Viewed (geographical distribution)

Total article views: 268 (including HTML, PDF, and XML), of which 268 have a defined geography and 0 are of unknown origin.
Latest update: 26 Mar 2023
Short summary
To avoid excessive monitoring costs from redundant measurements, this study proposed a nonparametric data-worth analysis framework that assesses, before data gathering, the worth of future soil moisture data for building model-free unsaturated flow models. Results indicated that (1) the method can quantify the data-worth of alternative monitoring schemes and thereby identify the optimal one; and (2) high-quality and representative ‘small’ data could be a better choice than unfiltered ‘big’ data.