Data worth analysis within a model-free data assimilation framework for soil moisture flow
Abstract. Conventional data-worth (DW) analysis for soil water problems depends on physical dynamic models. The widespread occurrence of model structural errors and the strong nonlinearity of soil water flow may lead to biased or wrong worth assessment. By introducing the nonparametric data-worth analysis (NP-DWA) framework coupled with ensemble Kalman filter (EnKF), this real-world case study attempts to assess the worth of potential soil moisture observations regarding the reconstruction of fully data-driven soil water flow models prior to data gathering. The DW of real-time soil moisture observations after Gaussian process training and Kalman update was quantified with three representative information metrics, including the trace, Shannon entropy difference, and relative entropy. The sequential NP-DWA framework was examined by a number of cases in terms of the variable of interest, spatial location, observation error, and prior data content. Our results indicated that the overall increasing trend of the DW from the sequential augmentation of additional observations was susceptible to interruptions by localized surges due to never-experienced atmospheric conditions (i.e., rainfall events) within the NP-DWA framework. Fortunately, this performance degradation can be effectively alleviated by enriching training scenarios or the appropriate amplification of observational noise under extreme meteorological conditions. Nevertheless, a substantial expansion of the prior data content may cause an unexpected increase in DW of future potential observations due to the possible introduction of ensuing observation noises. Hence, high-quality and representative “small” data may be a better choice than unfiltered “big” data. Compared with the observations in the surface layer with the strongest time-variability, the soil water content in the middle layer robustly exhibited remarkable superiority in the construction of model-free soil moisture models. 
An alternative monitoring strategy with a larger data-worth was prone to a higher DW assessment accuracy within the proposed NP-DWA framework. We also demonstrated that the DW assessment performance was jointly determined by ‘3C’, i.e., capacity of potential observation realizations to “capture” actual observations, correlation of potential observations with the variables of interest, and choice of DW indicators. Direct mapping from regular meteorological data to soil water content within the NP-DWA mitigated the adverse effects of nonlinearity-related interference, which thus facilitated the identification of the soil moisture covariance matrix, especially the cross-covariance.
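To make the three information metrics named in the abstract concrete, the following is a minimal sketch, assuming Gaussian prior and posterior distributions and a linear observation operator. It is not the authors' implementation: the function names `kalman_update` and `data_worth`, the three-layer example values, and the textbook Gaussian forms of the metrics are all illustrative assumptions, and the manuscript's exact definitions may differ.

```python
import numpy as np


def kalman_update(mean, cov, H, R, y):
    """Linear-Gaussian Kalman analysis step (assumed form, not the paper's code)."""
    S = H @ cov @ H.T + R                      # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)           # Kalman gain
    post_mean = mean + K @ (y - H @ mean)
    post_cov = (np.eye(len(mean)) - K @ H) @ cov
    return post_mean, post_cov


def data_worth(prior_mean, prior_cov, post_mean, post_cov):
    """Three Gaussian data-worth indicators (textbook forms, assumed here)."""
    n = len(prior_mean)
    # Trace: reduction in total variance.
    trace_dw = np.trace(prior_cov) - np.trace(post_cov)
    # Shannon entropy difference: H(prior) - H(posterior).
    _, logdet_prior = np.linalg.slogdet(prior_cov)
    _, logdet_post = np.linalg.slogdet(post_cov)
    sed = 0.5 * (logdet_prior - logdet_post)
    # Relative entropy: KL divergence of the posterior from the prior.
    diff = post_mean - prior_mean
    prior_inv = np.linalg.inv(prior_cov)
    re = 0.5 * (
        np.trace(prior_inv @ post_cov)
        + diff @ prior_inv @ diff
        - n
        + logdet_prior
        - logdet_post
    )
    return {"trace": trace_dw, "sed": sed, "re": re}


# Hypothetical three-layer soil moisture state (surface, middle, deep).
prior_mean = np.array([0.20, 0.25, 0.30])
prior_cov = 1e-3 * np.array([[4.0, 1.5, 0.5],
                             [1.5, 3.0, 1.0],
                             [0.5, 1.0, 2.0]])
H = np.array([[0.0, 1.0, 0.0]])   # observe the middle layer only
R = np.array([[1e-4]])            # assumed observation error variance
y = np.array([0.27])              # one hypothetical observation

post_mean, post_cov = kalman_update(prior_mean, prior_cov, H, R, y)
print(data_worth(prior_mean, prior_cov, post_mean, post_cov))
```

Under these assumptions, the trace and the Shannon entropy difference depend only on the covariances, whereas the relative entropy also depends on the shift in the mean and therefore on the observed value itself.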
Yakun Wang et al.
Status: open (until 23 Apr 2023)
- RC1: 'Comment on hess-2023-34', Anonymous Referee #1, 22 Mar 2023
General comments: This study proposes a comprehensive data-driven framework for selecting the optimal observing operations (data-worth analysis) and updating the predictions for soil moisture dynamics. The fully data-driven approach provides a complement to physics-based models, especially for complex real-world scenarios. While the quality of the manuscript is good, there are still some issues that require clarification.
1. A major concern relates to the conclusions drawn from applying Gaussian processes (GP) and EnKF assimilation. While efficient and simple to implement, these methods have inherent limitations, such as excessively smooth predictions (GP) and optimality only for linear Gaussian problems (EnKF). Because soil moisture dynamics do not fully satisfy these assumptions, the proposed method may experience difficulties, such as the localized surges mentioned in the manuscript. Therefore, some conclusions, such as "high-quality and small data may be better than unfiltered big data" and "the soil water content in the middle layer exhibits remarkable superiority in comparison to the surface with its highest-level variability", may be case-specific rather than generalizable. It is important to consider other data-driven and assimilation methods, such as deep neural networks, particle filtering, and MCMC, which could lead to different outcomes. I would like to see some clarification regarding this issue.
2. The methodology section should be presented more clearly. Specifically, the problem setup for moisture prediction and an explicit list of the contents of vectors X and y should be provided before Section 2.1. This will help the reader understand the proposed data-driven framework.
3. Some techniques, e.g., restart and iteration, have been proposed to improve performance in nonlinear problems. How would these techniques perform in NP-DWA?
4. L31: "An alternative monitoring strategy with a larger data-worth was prone to a higher DW assessment accuracy within the proposed NP-DWA framework". This sentence is meaningless and should be removed.
5. Please provide the dimensionality for all the involved vectors and matrices.