the Creative Commons Attribution 4.0 License.
A comprehensive study of deep learning for soil moisture prediction
Yanling Wang
Liangsheng Shi
Yaan Hu
Xiaolong Hu
Wenxiang Song
Lijun Wang
Abstract. Soil moisture plays a crucial role in the hydrological cycle, but predicting it accurately is difficult because soil water transport is nonlinear and boundary conditions are variable. Deep learning has emerged as a promising approach for simulating soil moisture dynamics. In this study, we explore ten network structures to uncover how they use data and to maximize the potential of deep learning for soil moisture prediction. They comprise three basic feature extractors and seven hybrid structures, six of which are applied to soil moisture prediction for the first time. We systematically compare the predictive abilities and computational costs of the models across different soil textures and depths. Furthermore, we exploit the interpretability of the models to gain insights into their workings and to advance our understanding of deep learning in soil moisture dynamics. Our results demonstrate that the temporal modeling capability of Long Short-Term Memory (LSTM) is well suited to soil moisture forecasting. In addition, the improved accuracy achieved by the feature attention LSTM (FA-LSTM) and the generative adversarial network-based LSTM (GAN-LSTM), together with the Shapley additive explanations (SHAP) analysis, reveals the effectiveness of attention mechanisms and the benefits of adversarial training in feature extraction. These findings provide effective network design principles. The Shapley values also reveal that different models leverage the data in different ways. The t-Distributed Stochastic Neighbor Embedding (t-SNE) visualization illustrates differences in encoded features across models. In summary, our comprehensive study provides insights into soil moisture prediction and highlights the importance of appropriate model design for specific soil moisture prediction tasks. We also hope this work serves as a reference for deep learning studies of other hydrological problems.
The code for the 3 machine learning and 10 deep learning models is open source.
Yanling Wang et al.
Status: open (until 21 Oct 2023)
RC1: 'Comment on hess-2023-177', Anonymous Referee #1, 02 Sep 2023
The study conducted a comparison of ten different network structures to assess their predictive abilities and computational costs across various soil textures and depths. The results indicate that Long Short-Term Memory (LSTM), feature attention LSTM (FA-LSTM), and generative adversarial network-based LSTM (GAN-LSTM) are effective in soil moisture forecasting. The study also provides insights into the interpretability of the models and emphasizes the importance of appropriate model design for specific soil moisture prediction tasks. Therefore, this study can serve as a valuable reference for the application of deep learning models in soil water dynamics. Overall, the manuscript is well-organized and easy to follow.
However, there are a few minor issues that the authors should consider. Firstly, it would be beneficial to provide a more detailed description of the representativeness of the sites to avoid potential one-sided conclusions. Additionally, conducting a sensitivity analysis for the input factors would provide further justification for the input screening of the deep learning models.
Citation: https://doi.org/10.5194/hess-2023-177-RC1
CC1: 'Reply on RC1', Yanling Wang, 10 Sep 2023
We sincerely thank the reviewer for the insightful and detailed comments, which have greatly improved the quality of the manuscript. Regarding the reviewer's concern about the data description, we have reorganized Section 2 and added detailed land cover and meteorological information for each station. Moreover, we have conducted a Pearson correlation analysis to screen the input variables.
Table 1 summarizes the ten selected sites, sorted from high to low soil permeability. The sites were chosen to illustrate the models' generalization ability, and they encompass ten different soil textures and five distinct land cover types. In addition to this basic site information, Table 2 provides climate statistics for the selected locations: the minimum, maximum, mean, and standard deviation of air temperature and precipitation. Furthermore, the input data correlation analysis in Figure R1 also demonstrates the variations between the stations.
Table 1. Summary of the main characteristics of the ten sites.
Site | Sand (%) | Silt (%) | Clay (%) | Land cover | Period | Lat. | Lon. |
---|---|---|---|---|---|---|---|
Monahans-6-ENE | 83 | 6 | 11 | Shrubland | 2010-2022 | 31.62 | -102.81 |
Necedah-5-WNW | 83 | 11 | 6 | Grassland | 2009-2022 | 44.06 | -90.17 |
Falkenberg | 73 | 21 | 6 | Cropland, rainfed | 2003-2020 | 52.17 | 14.12 |
AAMU-jtg | 53 | 22 | 25 | Grassland | 2010-2022 | 34.78 | -86.55 |
Cullman-NAHRC | 49 | 27 | 24 | Mosaic cropland | 2006-2022 | 34.20 | -86.80 |
Cape-Charles-5-ENE | 49 | 27 | 24 | Herbaceous cover | 2011-2022 | 37.29 | -75.93 |
LittleRiver | 47 | 30 | 23 | Grassland | 2005-2020 | 31.50 | -83.55 |
Spickard | 35 | 41 | 24 | Grassland | 2010-2022 | 40.25 | -93.72 |
Weslaco | 34 | 45 | 21 | Cropland, rainfed | 2017-2021 | 26.16 | -97.96 |
UpperBethlehem | 32 | 38 | 30 | Herbaceous cover | 2008-2010 | 17.72 | -64.80 |
Table 2. Statistics of precipitation (P) and air temperature (TA) at the ten sites, with the training, validation, and test periods (listed once per site).
Site | Variable | Min | Max | Mean | Std | Training set | Validation set | Test set |
---|---|---|---|---|---|---|---|---|
Monahans-6-ENE | P | 0 | 80.6 | 0.85 | 4.60 | 2010.04.21-2017.08.25 | 2017.08.25-2020.02.05 | 2020.02.05-2022.07.19 |
Monahans-6-ENE | TA | -12.78 | 36.53 | 19.18 | 8.86 | | | |
Necedah-5-WNW | P | 0 | 127.6 | 2.48 | 7.23 | 2009.10.13-2017.08.27 | 2017.08.27-2020.04.11 | 2020.04.11-2022.11.26 |
Necedah-5-WNW | TA | -28.87 | 30.47 | 7.92 | 11.69 | | | |
Falkenberg | P | 0 | 35.34 | 0.73 | 1.95 | 2003.01.17-2013.07.07 | 2013.07.07-2017.01.01 | 2017.01.01-2020.06.30 |
Falkenberg | TA | -18.19 | 29.45 | 9.69 | 7.82 | | | |
AAMU-jtg | P | 0 | 175.26 | 2.44 | 9.42 | 2010.02.06-2017.10.07 | 2017.10.07-2020.04.27 | 2020.04.27-2022.11.18 |
AAMU-jtg | TA | -10.83 | 31.27 | 16.69 | 8.24 | | | |
Cullman-NAHRC | P | 0 | 177.28 | 2.18 | 7.73 | 2006.05.18-2016.04.19 | 2016.04.19-2019.08.10 | 2019.08.10-2022.11.30 |
Cullman-NAHRC | TA | -10.07 | 30.61 | 16.00 | 8.28 | | | |
Cape-Charles-5-ENE | P | 0 | 159.10 | 2.94 | 9.19 | 2011.06.15-2018.04.13 | 2018.04.13-2020.07.22 | 2020.07.22-2022.11.01 |
Cape-Charles-5-ENE | TA | -10.47 | 32.11 | 15.67 | 8.53 | | | |
LittleRiver | P | 0 | 154.68 | 2.95 | 9.62 | 2005.10.18-2014.04.26 | 2014.04.26-2017.02.26 | 2017.02.26-2020.01.01 |
LittleRiver | TA | -4.24 | 31.99 | 19.77 | 7.08 | | | |
Spickard | P | 0 | 152.91 | 2.43 | 8.59 | 2010.10.08-2018.01.18 | 2018.01.18-2020.06.22 | 2020.06.22-2022.11.26 |
Spickard | TA | -22.13 | 32.31 | 11.64 | 11.17 | | | |
Weslaco | P | 0 | 294.89 | 1.65 | 11.66 | 2017.01.01-2019.08.07 | 2019.08.07-2020.06.18 | 2020.06.18-2021.05.01 |
Weslaco | TA | -1.41 | 32.46 | 23.46 | 6.07 | | | |
UpperBethlehem | P | 0 | 156.20 | 2.78 | 10.12 | 2008.09.15-2009.09.05 | 2009.09.05-2010.01.01 | 2010.01.01-2010.05.01 |
UpperBethlehem | TA | 21.64 | 28.78 | 25.93 | 1.46 | | | |
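As an illustration of how the statistics and periods in Table 2 could be produced, the following is a minimal sketch rather than the authors' code. The function names are hypothetical. The 60/20/20 split ratio is an assumption inferred from the lengths of the listed periods, not a value stated in the reply, and the use of the population standard deviation is likewise an assumption.

```python
from statistics import mean, pstdev


def summarize(values):
    """Return (min, max, mean, std) for one daily series.

    Assumption: population standard deviation; the reply does not say
    whether the population or sample formula was used.
    """
    return min(values), max(values), mean(values), pstdev(values)


def chronological_split(records, train_frac=0.6, val_frac=0.2):
    """Split time-ordered records into train/validation/test blocks.

    The records are never shuffled, so the validation and test blocks
    always lie later in time than the training block.
    Assumption: the 0.6/0.2/0.2 fractions are inferred, not stated.
    """
    n = len(records)
    train_end = int(n * train_frac)
    val_end = train_end + int(n * val_frac)
    return records[:train_end], records[train_end:val_end], records[val_end:]


if __name__ == "__main__":
    # Hypothetical daily precipitation values, for illustration only.
    precipitation = [0.0, 0.0, 4.2, 0.0, 12.5, 0.3, 0.0, 0.0, 1.1, 0.0]
    print(summarize(precipitation))
    train, val, test = chronological_split(precipitation)
    print(len(train), len(val), len(test))
```

Splitting by time rather than at random keeps future observations out of the training block, which matters for a forecasting task.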
When screening input factors, we selected the meteorological inputs based on the variables required for the precipitation and evapotranspiration calculations. In addition, soil temperature data and soil moisture data from the previous day are incorporated to represent the soil condition. Figure R1 displays the Pearson correlation analysis results for the input factors at the Cape-Charles and UpperBethlehem sites. Notably, the correlation coefficients between soil moisture and the input data vary greatly with both station and depth. For instance, the correlation coefficient between longwave radiation (LW) and soil moisture is low at UpperBethlehem but significant at Cape-Charles, highlighting the influence of site-specific differences. Although using only highly correlated factors as inputs appears to be a logical choice, such a selection would differ from site to site and depth to depth, so a uniform input set cannot be obtained this way. Moreover, whether a model can learn by itself to screen the significant influencing factors is a crucial aspect to explore when evaluating and comparing model performance. Therefore, we have chosen to include all eight variables as inputs. Figure R2 shows the autocorrelation analysis conducted at 5 soil depths. The autocorrelation coefficients of soil water content at all depths decrease as the lag (in days) increases, and the most significant change is observed in the surface layer. As a result, we have opted to use a 4-day delay for our inputs.
Figure R1. Pearson correlation among the observed variables at depths of 0.05 m and 1.00 m at the Cape-Charles (a, b) and UpperBethlehem (c, d) sites.
Figure R2. Autocorrelation of soil water content at different lags (in days) at Cape-Charles.
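The two screening analyses described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation, and all function names are hypothetical. In particular, `lagged_windows` assumes that "a 4-day delay" means the previous four days of inputs are used to predict the current day's soil moisture, which the reply does not spell out.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return float("nan")  # correlation is undefined for a constant series
    return cov / (norm_x * norm_y)


def autocorrelation(series, lag):
    """Correlation between a series and itself shifted by `lag` days."""
    if lag < 1 or len(series) - lag < 2:
        raise ValueError("lag must be >= 1 and leave at least 2 pairs")
    return pearson(series[:-lag], series[lag:])


def lagged_windows(features, target, n_lags=4):
    """Pair the previous `n_lags` days of features with each day's target.

    Assumption: this is one reading of the "4-day delay" input.
    """
    return [
        (features[t - n_lags:t], target[t])
        for t in range(n_lags, len(target))
    ]


if __name__ == "__main__":
    # Hypothetical soil water content values, for illustration only.
    swc = [0.21, 0.22, 0.30, 0.28, 0.26, 0.25, 0.24, 0.31, 0.29, 0.27]
    for lag in range(1, 5):
        print(lag, round(autocorrelation(swc, lag), 3))
```

Computing the autocorrelation for each depth and lag gives the kind of curves summarized in Figure R2, and the lag at which the coefficient has dropped noticeably is one reasonable guide for choosing the input window length.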
AC1: 'Reply on RC1', Yanling Wang, 11 Sep 2023