Downscaling satellite-derived soil moisture in the Three North region using ensemble machine learning and multiple-source knowledge integration

Liu, Kai; Zhang, Hongyan; Bo, Yong; Li, Dehui; Li, Long; Li, Hang; Wang, Shudong; Li, Xueke

doi:https://doi.org/10.5194/hess-2024-129

Preprints

08 May 2024

Status: this discussion paper is a preprint. It has been under review for the journal Hydrology and Earth System Sciences (HESS). The manuscript was not accepted for further review after discussion.

Abstract. Soil moisture plays a crucial role in hydrological and ecological systems. While remote sensing has advanced large-scale soil moisture monitoring, current satellite products often face spatial resolution limitations. This study presents a reliable framework for downscaling satellite-derived soil moisture, leveraging ensemble machine learning and multiple knowledge sources. Our approach efficiently converges outputs from diverse machine learning algorithms through a Bayesian model, harnessing spatiotemporal domains and point-wise data. Covering approximately five million square kilometres in the Three Northern region of China, our model generates 1-km daily soil moisture maps, accurately reflecting soil water content patterns and showing spatial consistency with outputs from two credible numerical models. Validation against in situ measurements from three ground networks confirms the accuracy of the downscaled dataset. Comparative analysis demonstrates the superiority of the Bayesian-based method over four individual machine learning methods. The high-resolution dataset produced proves effective in capturing drought dynamics, particularly extreme drought patterns. The robustness of our framework is further affirmed through uncertainty analysis, employing leave-one-out and progressive sample reduction approaches. In summary, our ensemble machine learning-based framework offers an efficient solution for acquiring accurate and high-resolution soil moisture data across large regions, with implications for water resource management and drought monitoring.

Received: 30 Apr 2024 – Discussion started: 08 May 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2681 KB)

Supplement (784 KB)

Status: closed

RC1:
'Comment on hess-2024-129', Anonymous Referee #1, 20 Jun 2024

The major contribution of this paper is the use of Bayesian Model Averaging (BMA) to combine the outputs from several different empirical ("machine learning") techniques for soil moisture downscaling. The authors test this methodological innovation by comparing to a large dataset of in-situ soil moisture sensors scattered across northern China. I have several comments that I hope the authors will address.
First, I believe that statistical derivation of BMA assumes that the models are independent of each other. It seems like the models developed here are likely not independent because they have been developed using the same inputs and the same training data. Have the authors tested whether this assumption applies to their models? If they are dependent, what is the impact on the results?
Second, all the downscaling methods considered provide very little improvement in the soil moisture estimates. A key goal of downscaling is to include fine scale spatial variability that is not present in the coarse resolution input. However, when I examine the histograms in Figure 6, I see no increase in the variability of soil moisture when the downscaling methods are applied. Some of the methods have less variability than the coarse resolution input. Are these methods successfully introducing any variability in the patterns? Also, the accuracy of the BMA method is only slightly better than the coarse resolution input. The exact improvement is difficult to see because Table 4 does not include the performance of the coarse resolution input nor the overall performance across all the datasets used. Those should be added. The authors seem satisfied with the improvement in their discussion and conclusions, but it seems like the improvements do not warrant the huge processing involved. The authors consider relatively few variables. Could better performance be achieved by using model inputs?
Third, little consideration is given as to whether the in situ dataset adequately captures 1-km spatial variations in soil moisture (which is the stated goal of the downscaling method). The measurement support is likely very small and the spacing is likely much larger than 1 km. Even if the downscaling models reproduce this dataset exactly, have we really developed an accurate 1-km resolution soil moisture estimate? Can the authors provide some support that a given in situ soil moisture observation is representative of its 1-km grid cell? Also, can the authors show that the collection of 1-km grid cells that have in situ observations captures the range of conditions that occur within the region? I believe some support along these lines would greatly strengthen the paper.
I would suggest removing the Noah results because they really don't contribute to testing the innovation that is presented.
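The independence question raised in the first point can be probed empirically. The following is a minimal sketch, not the authors' procedure: it computes pairwise Pearson correlations between the residuals of ensemble members against a common reference. The names `residual_correlations`, `model_a` and `model_b`, and all numbers, are hypothetical and chosen only for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residual_correlations(observed, predictions):
    """Correlate the residuals (prediction - observation) of every model pair.

    predictions maps a model name to its list of predictions.
    """
    residuals = {
        name: [p - o for p, o in zip(pred, observed)]
        for name, pred in predictions.items()
    }
    names = sorted(residuals)
    return {
        (a, b): pearson(residuals[a], residuals[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical soil moisture values (m3/m3), for illustration only.
observed = [0.10, 0.20, 0.15, 0.30, 0.25]
predictions = {
    "model_a": [0.12, 0.23, 0.16, 0.33, 0.27],
    "model_b": [0.13, 0.25, 0.17, 0.36, 0.29],
}
corr = residual_correlations(observed, predictions)
```

Residual correlations close to 1 would indicate that the members make largely the same errors, which is the situation the comment asks the authors to examine.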

Citation: https://doi.org/10.5194/hess-2024-129-RC1
- AC1: 'Reply on RC1', Kai Liu, 18 Nov 2024
  
  Thank you for taking the time to review our manuscript and providing helpful comments and suggestions. We have prepared a separate pdf file in which we address all your concerns on a point-by-point basis. It is attached as a supplement.
  
  Citation: https://doi.org/10.5194/hess-2024-129-AC1
RC2:
'Comment on hess-2024-129', Maud Formanek, 21 Oct 2024
Summary
This study uses Bayesian model averaging to model soil moisture at a 1km resolution in the Three North region of China by combining the results from 4 individual machine learning methods. Their model uses 5 datasets of varying resolution (LST: 1km/daily, NDVI: 1km/16d, surface albedo: 0.05°/daily, elevation: 90m and precipitation: 0.1°/daily) as explanatory variables for soil moisture. Their model is trained on the 0.25° ESA CCI COMBINED soil moisture product by resampling the high-resolution predictor variables to the same scale and then applied at the finer 1km resolution using the original predictor datasets. The main finding of the paper is that the Pearson R correlation coefficient and the RMSE against in-situ measurements from 3 different networks improve slightly in their new high resolution dataset.
General Comments

The use of Bayesian model averaging shows an innovative use of machine learning to train models.

Misleading title. The authors use the term ‘downscaling’ to describe their method and the purpose of this study. In the remote sensing community, downscaling usually refers to using a coarse-grained dataset of some environmental variable (like soil moisture) along with auxiliary datasets available at the target resolution as predictor variables in a model to predict the same environmental variable at a higher (target) resolution. The target variable of the training and the validation is typically also used at the target resolution. What the authors describe in the paper would be better described as model calibration.

There is no clear training-validation split in the modelling. This should always be employed when using machine learning models.
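One possible form of such a split is sketched below. It holds out whole spatial blocks of coarse pixels rather than random pixels, on the assumption (not stated in the review) that neighbouring pixels are correlated and would otherwise leak information into the validation set. The function name `block_split` and its parameters are hypothetical.

```python
import random


def block_split(cells, block_size=4, val_fraction=0.25, seed=0):
    """Split (row, col) pixel indices into training and validation sets.

    Pixels are grouped into square blocks of block_size x block_size, and
    entire blocks are assigned to the validation set.
    """
    def block_of(cell):
        return (cell[0] // block_size, cell[1] // block_size)

    blocks = sorted({block_of(cell) for cell in cells})
    rng = random.Random(seed)
    rng.shuffle(blocks)
    n_val = max(1, round(val_fraction * len(blocks)))
    val_blocks = set(blocks[:n_val])

    train = [cell for cell in cells if block_of(cell) not in val_blocks]
    val = [cell for cell in cells if block_of(cell) in val_blocks]
    return train, val


# An 8 x 8 grid of coarse pixels, purely illustrative.
cells = [(r, c) for r in range(8) for c in range(8)]
train_cells, val_cells = block_split(cells)
```

A plain random split would also satisfy the comment as written; blocking is only one design choice for spatial data.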

The authors report a narrowing of the distribution of soil moisture values in the ‘downscaled’ dataset compared to the original CCI SM, which indicates a loss of information. A higher-resolution dataset should have a wider distribution of values than a low-resolution one, if we assume that the low-resolution data is a (weighted) average of the high-resolution data contained in its boundaries. This follows from the central limit theorem.
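One way to make this variance argument explicit is the law of total variance, under the stated assumption that each coarse value is the average of the fine values inside it. The notation is introduced here for illustration and does not come from the manuscript: X is a 1 km soil moisture value and B is the index of the coarse pixel that contains it, so that E[X | B] plays the role of the coarse-pixel value.

```latex
\operatorname{Var}(X)
  = \underbrace{\mathbb{E}\big[\operatorname{Var}(X \mid B)\big]}_{\text{within-pixel variability}}
  + \underbrace{\operatorname{Var}\big(\mathbb{E}[X \mid B]\big)}_{\text{variance of coarse values}}
  \;\ge\; \operatorname{Var}\big(\mathbb{E}[X \mid B]\big)
```

Equality holds only when there is no variability within the coarse pixels.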

The authors emphasize that one of the major advantages of their methodology is the inclusion of prior knowledge from in-situ data into the Bayesian modelling framework. However, I could not find a description of the priors they use anywhere in the paper. This needs to be included if such a strong statement is made.

It is unclear from the manuscript how the weights of the individual models in the Bayesian model averaging algorithm are derived.

The time-series values from the BMA are either consistently higher or lower than all of the underlying model predictions (Figure 9). A weighted average must lie somewhere between its constituent values! This suggests that the authors are making an error in their calculations.
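The bound invoked here holds for any weighted average with non-negative weights. The short sketch below checks it numerically; the member values and weights are hypothetical and are not taken from Figure 9.

```python
def weighted_mean(values, weights):
    """Weighted average with non-negative weights, normalised to sum to 1."""
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def within_member_range(values, combined, tol=1e-9):
    """True if the combined value lies between the smallest and largest member."""
    return min(values) - tol <= combined <= max(values) + tol


# Hypothetical predictions of four ensemble members at one time step.
members = [0.18, 0.21, 0.24, 0.20]
weights = [0.4, 0.3, 0.2, 0.1]
combined = weighted_mean(members, weights)
inside = within_member_range(members, combined)
```

Applying `within_member_range` at every time step of a series would flag the steps at which an ensemble value falls outside the envelope of its members.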

The manuscript contains misleading citations.

Specific Comments

1. Introduction

The introduction lacks a clear demonstration as to why we need higher resolution soil moisture datasets (applications, hypothesis to test, etc). It would benefit from a more detailed discussion of the most influential studies that have used downscaling and what their weaknesses are. Some strong statements lack citations and some citations are misleading (see detailed comments below).

Lines 49-51: “While these products are valuable for certain applications (Molero et al., 2016), the spatial resolution of these products—largely tens of kilometers—limits the ability to capture the spatial heterogeneity of soil moisture (Njoku and Entekhabi, 1996; Schmugge, 1998).” I am certain there are more recent studies looking into the spatial heterogeneity of soil moisture.

Lines 52-54: “Soil moisture downscaling, an effective technique for improving spatial resolution, has received substantial attention (Zhang et al., 2022). Statistical approaches and land surface models (Famiglietti et al., 2008; Grayson and Western, 1998) have been widely used, but these methods typically require large amounts of parametric data with ground data.” Famiglietti et al., 2008, does not perform any downscaling, but only provides a quantification of SM variability across scales. This citation would be better suited to the paragraph preceding this one. The last part of this sentence is also unclear. What is meant by ‘parametric data with ground data’?

Lines 54-56: “Various fusion methods integrating multi-source satellite remote sensing data have been developed, falling into categorized like active-passive microwave and optical-microwave data integration.” This needs citations and it is also unclear why combining active and passive microwave sensors would increase the resolution of a dataset. It typically only increases coverage and reduces uncertainty.

Lines 56-57: “All these mentioned models encounter challenges related to model structure constraints, data quality, scale disparities, and geographic limitations (Peng et al., 2017; Werbylo and Niemann, 2014).” Werbylo and Niemann, 2014, compares two in-situ sampling approaches for use in downscaling, but doesn’t discuss challenges in downscaling.

Lines 76-77: “Existing ensemble machine learning often overlooks the incorporation of prior knowledge, a crucial regularization mechanism that prevents overfitting and enhances model generalization.” Priors can only prevent overfitting if the overfitted solution is unlikely in prior space. A large sample size always overcomes any prior. Prior knowledge helps with small sample sizes. This needs a citation.

2. Study area and materials

Figure 1: Mismatch between text and figure. How do the three station types in the plot (Meteorological, Crop and CERN stations) relate to the ones mentioned in section 2.2.5 (in-situ measurements, NZW, QXZ, CERN)?

Lines 123-125: “We use the combined active-passive ESA CCI products from 2003 to 2010, obtained from the ESA data archive (https://www.esa-soilmoisture-cci.org/)” The ESA CCI SM website clearly states that “If using the COMBINED product, the following is also compulsory in addition to the above: Preimesberger, W., Scanlon, T., Su, C. -H., Gruber, A. and Dorigo, W. (2021). Homogenization of Structural Breaks in the Global ESA CCI Soil Moisture Multisatellite Climate Data Record, in IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 4, pp. 2845-2862, April 2021, doi: 10.1109/TGRS.2020.3012896.”

Line 127: “The integration of MODIS products within satellite-derived soil moisture downscaling has been extensively employed” This needs a citation.

Line 157: “The CERN dataset comprises 34 stations, covering a period of approximately five days from 2005 to 2014.” There are only 5 measurements in 9 years? Is this correct?

3. Methods

3.1 Feasibility of chosen explanatory factors: The authors only use one (random forest) out of their 4 machine learning models for their feature importance analysis. Why this one and why not all? The results might be different for the different models. Furthermore, the authors should test for collinearity/correlation between explanatory variables to avoid overfitting their models.
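A simple screening for the collinearity mentioned here is a pairwise correlation check between predictors, sketched below. The threshold of 0.8, the function name `flag_collinear` and the sample values are assumptions made for illustration; a variance inflation factor analysis would be a more thorough alternative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    flagged = []
    for a, b in combinations(sorted(predictors), 2):
        r = pearson(predictors[a], predictors[b])
        if abs(r) >= threshold:
            flagged.append((a, b, r))
    return flagged


# Hypothetical predictor samples, for illustration only.
predictors = {
    "lst": [290.0, 295.0, 300.0, 305.0, 310.0],
    "ndvi": [0.60, 0.52, 0.41, 0.33, 0.20],
    "precip": [2.0, 0.0, 5.0, 1.0, 3.0],
}
pairs = flag_collinear(predictors)
```

With these made-up numbers only the `lst`/`ndvi` pair is flagged, because the two series are almost perfectly anti-correlated.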

Lines 208-215: The linear regression analysis doesn’t add any scientific value to the paper. I would remove this paragraph along with Figure 3b.

3.2 Machine learning methods: it is unnecessary to explain these 4 very commonly used machine learning models.

Line 253, Equation 4: The formula here is misleading, as x should be the result of a nonlinear transformation of the explanatory variables, not “the value of each dimension in the training set”.

Lines 297-301: “In theory, calculating p(M_i |D) of a model involves computing the likelihood function for each model, multiplying it by the prior probability of each model, and dividing by the marginal likelihood. However, this method is rarely employed in practice due to the complexity of computing the likelihood function and prior distribution, especially for complex models with high-dimensional parameter spaces. Instead, iterative estimation techniques such as Markov Chain Monte Carlo methods are commonly used. In our study, we utilized Markov Chain Monte Carlo Cube (MC3) for this purpose.” Given that BMA is the main innovation in their study, the authors should explain in more detail how they calculated the individual model probabilities. Citations are needed here too.
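For reference, the calculation described in the quoted passage (likelihood times prior, divided by the marginal likelihood) corresponds to the standard textbook form of Bayesian model averaging given below. The symbols K (the number of candidate models, four in this study) and y (the predicted soil moisture) are notation assumed here; the manuscript's own notation may differ.

```latex
p(M_i \mid D) = \frac{p(D \mid M_i)\, p(M_i)}{\sum_{j=1}^{K} p(D \mid M_j)\, p(M_j)},
\qquad
p(y \mid D) = \sum_{i=1}^{K} p(y \mid M_i, D)\, p(M_i \mid D)
```

The posterior model probabilities p(M_i | D) are the weights of the individual models, so a description of how they were estimated would also specify how the ensemble weights were obtained.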

Lines 324-325: “Through a sensitivity analysis conducted with an independent dataset, maximum values for these parameters are chosen for the period spanning 2003 to 2010.” Which independent dataset?

4. Results and Discussions

The maps from Figure 5 are of too limited quality to really assess whether their method leads to improved resolution of spatial patterns in soil moisture.

Lines 388-390: “It is evident that the downscaled data produced by the BMA method exhibit more pronounced differences compared to the original data, particularly in terms of histogram distributions shifting towards the peak. This implies that the downscaled method effectively captures the disparities between the 25km and 1km products.” Downscaled datasets should have broader distributions than the original data, as the original data should represent an average over the HR pixels contained within them. A narrowing of the distribution indicates a loss of information, rather than a gain. I would like to see the average variability of the 1km SM within the 25km pixel.
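The diagnostic requested at the end of this comment could be computed along the following lines. This is a sketch under simplifying assumptions: the fine grid is a plain list of rows, blocks are square, and there are no missing values. The function name `within_block_std` and the toy 4 x 4 grid are hypothetical; for the products discussed here the aggregation factor would be roughly 25.

```python
import statistics


def within_block_std(fine, factor):
    """Mean, over coarse blocks, of the standard deviation of fine values.

    fine is a list of rows; each coarse block covers factor x factor fine cells.
    """
    n_rows, n_cols = len(fine), len(fine[0])
    stds = []
    for r0 in range(0, n_rows, factor):
        for c0 in range(0, n_cols, factor):
            block = [
                fine[r][c]
                for r in range(r0, min(r0 + factor, n_rows))
                for c in range(c0, min(c0 + factor, n_cols))
            ]
            stds.append(statistics.pstdev(block))
    return statistics.fmean(stds)


# Toy fine-resolution soil moisture field (m3/m3), for illustration only.
fine = [
    [0.10, 0.14, 0.30, 0.30],
    [0.12, 0.16, 0.30, 0.30],
    [0.20, 0.20, 0.25, 0.35],
    [0.20, 0.20, 0.25, 0.35],
]
mean_std = within_block_std(fine, factor=2)
```

A value near zero would mean that the 1km field adds almost no variability inside the coarse pixels.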

Lines 397-398: “While most of the downscaling results exhibit lower values compared to the original ESA CCI values during most months, this variance is not of substantial magnitude.” This is not a variance, but a bias.

Lines 398-399: “This pattern can be attributed to the inherent characteristics of the BMA ensemble approach, which combines multiple machine learning outcomes to prevent excessively high or low values.” This might explain the lower variance, but not the bias.

Lines 419-420: “The absence of in situ data in the western desert-dominated region potentially weak the model training, exerting a negative effect on the resultant model accuracy.” Until this point, we don’t know how the in-situ data is used in training. I assume it enters the model probabilities in the BMA, but this is never explained.

Figure 7: The colour scales are confusing; high MAE should correspond to low R (and thus have the same colours).

Lines 431-439: The authors should include confidence intervals for these metrics. Furthermore, I wonder how they compare to their coarse-grained modelled dataset? I assume they would be very similar, which would suggest that the downscaling has little effect, but that the results rather stem from the models smoothing the data (at any scale).
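One common way to obtain such confidence intervals is a percentile bootstrap over paired observations and predictions, sketched below for MAE. The sketch treats samples as independent, which ignores temporal autocorrelation in station time series, and all names and numbers are hypothetical.

```python
import random


def mae(obs, pred):
    """Mean absolute error between paired observations and predictions."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def bootstrap_ci(obs, pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a paired metric."""
    rng = random.Random(seed)
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        # Resample station-day pairs with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical in situ observations and downscaled estimates (m3/m3).
obs = [0.12, 0.18, 0.25, 0.30, 0.22, 0.15, 0.28, 0.20]
pred = [0.14, 0.17, 0.28, 0.26, 0.23, 0.18, 0.27, 0.24]
point = mae(obs, pred)
ci_low, ci_high = bootstrap_ci(obs, pred, mae)
```

The same `bootstrap_ci` function accepts any metric with the signature `metric(obs, pred)`, so a correlation coefficient could be passed in the same way.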

Lines 440-450: I don't see how this paragraph is relevant to the paper. Accurately predicting soil moisture during Monsoon season is not a question of downscaling. The authors also never mention a particular focus on Monsoon prediction in the introduction or elsewhere.

In Figure 9, the time-series values from the BMA are either consistently higher or lower than all of the underlying model predictions. A weighted average must lie somewhere between its constituent values! This suggests that the authors are making an error in their calculations.

Lines 488-489: “As illustrated in Fig. 10, the R and MAE distributions of the ERA5 data within the study area and the Noah data within the Loess Plateau are utilized.” Why is the Noah data not used for the whole study area too? Unless a reason for this is given, it looks like cherry-picking.

Lines 489-491: “Results reveal that the BMA ensemble outcomes exhibit reasonable performance in terms of higher R values and lower MAE values when compared to both the ERA5 and Noah datasets.” Higher R and lower MAE compared to what? The original CCI SM dataset? Please specify.

Section 4.6 Uncertainty analysis: this paragraph does not constitute uncertainty analysis, but a feature importance analysis.

Table 5: The authors should not mix R and R², but rather pick one and transform the other. They should report the overall R, rather than the range over in-situ networks. This is misleading.

Figures S6 and S7 (comparing the model performance with and without clustering and the spatiotemporal searching window) should be incorporated into the main manuscript, as they highlight the superiority of their novel approach. Figures 11 and 12 could move to the supporting information as they don’t add much value to the paper.
Citation: https://doi.org/10.5194/hess-2024-129-RC2
- AC2: 'Reply on RC2', Kai Liu, 18 Nov 2024
  
  Thank you for taking the time to review our manuscript and providing helpful comments and suggestions. We have prepared a separate pdf file in which we address all your concerns on a point-by-point basis. It is attached as a supplement.
  
  Citation: https://doi.org/10.5194/hess-2024-129-AC2

Status: closed

RC1:
'Comment on hess-2024-129', Anonymous Referee #1, 20 Jun 2024

The major contribution of this paper is the use of Bayesian Model Averaging (BMA) to combine the outputs from several different empirical ("machine learning") techniques for soil moisture downscaling. The authors test this methodological innovation by comparing to a large dataset of in-situ soil moisture sensors scattered across northern China. I have several comments that I hope the authors will address.
First, I believe that statistical derivation of BMA assumes that the models are independent of each other. It seems like the models developed here are likely not independent because they have been developed using the same inputs and the same training data. Have the authors tested whether this assumption applies to their models? If they are dependent, what is the impact on the results?
Second, all the downscaling methods considered provide very little improvement in the soil moisture estimates. A key goal of downscaling is to include fine scale spatial variability that is not present in the coarse resolution input. However, when I examine the histograms in Figure 6, I see no increase in the variability of soil moisture when the downscaling methods are applied. Some of the methods have less variability than the coarse resolution input. Are these methods successfully introducing any variability in the patterns? Also, the accuracy of the BMA method is only slightly better than the coarse resolution input. The exact improvement is difficult to see because Table 4 does not include the performance of the coarse resolution input nor the overall performance across all the datasets used. Those should be added). The authors seem satisfied with the improvement in their discussion and conclusions, but it seems like the improvements do not warrant the huge processing involved. The authors consider relatively few variables. Could better performance be achieved by using model inputs?
Third, little consideration is given as to whether the in situ dataset adequately captures 1-km spatial variations in soil moisture (which is the stated goal of the downscaling method). The measurement support is likely very small and the spacing is likely much larger than 1-km. Even if the downscaling models reproduce this dataset exactly, have we really developed an accurate 1-km resolution soil moisture estimate? Can the authors provide some support that that a given in situ soil moisture observation is representative of its 1-km grid cell? Also, can the authors show that the collection of 1-km grid cells that have in situ observations capture the range of conditions that occur within the region? I believe some support along these lines would greatly strengthen the paper.
I would suggest removing the Noah results because they really don't contribute to testing the innovation that is presented.

Citation: https://doi.org/10.5194/hess-2024-129-RC1
- AC1: 'Reply on RC1', kai Liu, 18 Nov 2024
  
  Thank you for taking the time to review our manuscript and providing helpful comments and suggestions. We have prepared a separate pdf file in which we address all your concerns on a point-by-point basis. It is attached as a supplement.
  
  Citation: https://doi.org/10.5194/hess-2024-129-AC1
RC2:
'Comment on hess-2024-129', Maud Formanek, 21 Oct 2024
Summary
This study uses Bayesian model averaging to model soil moisture at a 1km resolution in the Three North region of China by combining the results from 4 individual machine learning methods. Their model uses 5 datasets of varying resolution (LST: 1km/daily, NDVI: 1km/16d, surface albedo: 0.05°/daily, elevation: 90m and precipitation: 0.1°/daily) as explanatory variables for soil moisture. Their model is trained on the 0.25° ESA CCI COMBINED soil moisture product by resampling the high-resolution predictor variables to the same scale and then applied to a lower 1km resolution using the original predictor datasets. The main finding of the paper is that the pearson R correlation coefficient and the RMSE against in-situ measurements from 3 different networks improve slightly in their new high resolution dataset.
General Comments

The use of Bayesian model averaging shows an innovative use of machine learning to train models

Misleading title. The authors use the term ‘downscaling’ to describe their method and the purpose of this study. In the remote sensing community, downscaling usually refers to using a coarse-grained dataset of some environmental variable (like soil moisture) along with auxiliary datasets available at the target resolution as predictor variables in a model to predict the same environmental variable at a higher (target) resolution. The target variable of the training and the validation is typically also used at the target resolution. What the authors describe in the paper would be better described as model calibration.

There is no clear training-validation split in the modelling. This should always be employed when using machine learning models.

The authors report a narrowing of the distribution of soil moisture values in the ‘downscaled’ dataset compared to the original CCI SM, which indicates a loss of information. A higher-resolution dataset should have a wider distribution of values than a low-resolution one, if we assume that the low-resolution data is a (weighted) average of the high-resolution data contained in its boundaries. This follows from the central limit theorem.

The authors emphasize that one of the major advantages of their methodology is the inclusion of prior knowledge from in-situ data into the Bayesian modelling framework. However, I could not find a description of the priors they use anywhere in the paper. This needs to be included if such a strong statement is made.

It is unclear from the manuscript how the weights of the individual models in the Bayesian model averaging algorithm are derived.

The time-series values from the BMA are either consistently higher or lower than all of the underlying model predictions (Figure 9). A weighted average must lie somewhere between its constituent values! This suggests that the authors are making an error in their calculations.

The manuscript contains misleading citations.

Specific Comments

1. Introduction

The introduction lacks a clear demonstration as to why we need higher resolution soil moisture datasets (applications, hypothesis to test, etc). It would benefit from a more detailed discussion of the most influential studies that have used downscaling and what their weaknesses are. Some strong statements lack citations and some citations are misleading (see detailed comments below).

Lines 49-51: “While these products are valuable for certain applications (Molero et al., 2016), the spatial resolution of these products—largely tens of kilometers—limits the ability to capture the spatial heterogeneity of soil moisture (Njoku and Entekhabi, 1996; Schmugge, 1998).” I am certain there are more recent studies looking into the spatial heterogeneity of soil moisture.

Lines 52-54: “Soil moisture downscaling, an effective technique for improving spatial resolution, has received substantial attention (Zhang et al., 2022). Statistical approaches and land surface models (Famiglietti et al., 2008; Grayson and Western, 1998) have been widely used, but these methods typically require large amounts of parametric data with ground data.” Famiglietti et al., 2008, does not perform any downscaling, but only provides a quantification of SM variability across scales. This citation would be better suited to the paragraph preceding this one. The last part of this sentence is also unclear. What is meant by ‘parametric data with ground data’?

Lines 54-56: “Various fusion methods integrating multi-source satellite remote sensing data have been developed, falling into categorized like active-passive microwave and optical-microwave data integration.” This needs citations and it is also unclear why combining active and passive microwave sensors would increase the resolution of a dataset. It typically only increases coverage and reduces uncertainty.

Lines 56-57: “All these mentioned models encounter challenges related to model structure constraints, data quality, scale disparities, and geographic limitations (Peng et al., 2017; Werbylo and Niemann, 2014).” Werbylo and Niemann, 2014, compares two in-situ sampling approaches for use in downscaling, but doesn’t discuss challenges in downscaling.

Lines 76-77: “Existing ensemble machine learning often overlooks the incorporation of prior knowledge, a crucial regularization mechanism that prevents overfitting and enhances model generalization.” Priors can only prevent overfitting, if the overfitted solution is unlikely in prior space. A large sample size always overcomes any prior. Prior knowledge helps with small sample sizes. This needs a citation.

2. Study area and materials

Figure 1: Mismatch between text and figure. How do the three station types in the plot (Meteorological, Crop and CERN stations) relate to the ones mentioned in section 2.2.5 (in-situ measurements, NZW, QXZ, CERN)?

Lines 123-125: “We use the combined active-passive ESA CCI products from 2003 to 2010, obtained from the ESA data archive (https://www.esa-soilmoisture-cci.org/)” The ESA CCI SM website clearly states that “If using the COMBINED product, the following is also compulsory in addition to the above: Preimesberger, W., Scanlon, T., Su, C. -H., Gruber, A. and Dorigo, W. (2021). Homogenization of Structural Breaks in the Global ESA CCI Soil Moisture Multisatellite Climate Data Record, in IEEE Transactions on Geoscience and Remote Sensing, vol. 59, no. 4, pp. 2845-2862, April 2021, doi: 10.1109/TGRS.2020.3012896.”

Line 127: “The integration of MODIS products within satellite-derived soil moisture downscaling has been extensively employed” This needs a citation.

Line 157: “The CERN dataset comprises 34 stations, covering a period of approximately five days from 2005 to 2014.” There are only 5 measurements in 9 years? Is this correct?

3. Methods

3.1 Feasibility of chosen explanatory factors: The authors only use one (random forest) out of their 4 machine learning models for their feature importance analysis. Why this one and why not all? The results might be different for the different models. Furthermore, the authors should test for collinearity/correlation between explanatory variables to avoid overfitting their models.

Lines 208-215: The linear regression analysis doesn’t add any scientific value to the paper. I would remove this paragraph along with Figure 3b.

3.2 Machine learning methods: it is unnecessary to explain these 4 very commonly used machine learning models.

Line 253, Equation 4: The formula here is misleading, as x should be the result of a nonlinear transformation of the explanatory variables, not the “the value of each dimension in the training set”

Lines 297-301: “In theory, calculating p(M_i |D) of a model involves computing the likelihood function for each model, multiplying it by the prior probability of each model, and dividing by the marginal likelihood. However, this method is rarely employed in practice due to the complexity of computing the likelihood function and prior distribution, especially for complex models with high-dimensional parameter spaces. Instead, iterative estimation techniques such as Markov Chain Monte Carlo methods are commonly used. In our study, we utilized Markov Chain Monte Carlo Cube (MC3) for this purpose.” Given that BMA is the main innovation in their study, the authors should explain in more detail how they calculated the individual model probabilities. Citations are needed here too.

Lines 324-325: “Through a sensitivity analysis conducted with an independent dataset, maximum values for these parameters are chosen for the period spanning 2003 to 2010.” Which independent dataset?

4. Results and Discussions

The maps from Figure 5 are of too limited quality to really assess whether their method leads to improved resolution of spatial patterns in soil moisture.

Lines 388-390: “It is evident that the downscaled data produced by the BMA method exhibit more pronounced differences compared to the original data, particularly in terms of histogram distributions shifting towards the peak. This implies that the downscaled method effectively captures the disparities between the 25km and 1km products.” Downscaled datasets should have broader distributions than the original data, as the original data should represent an average over the HR pixels contained within them. A narrowing of the distribution indicates a loss of information, rather than a gain. I would like to see the average variability of the 1km SM within the 25km pixel.”

Lines 397-398: “While most of the downscaling results exhibit lower values compared to the original ESA CCI values during most months, this variance is not of substantial magnitude.” This is not a variance, but a bias.

Lines 398-399: “This pattern can be attributed to the inherent characteristics of the BMA ensemble approach, which combines multiple machine learning outcomes to prevent excessively high or low values.” This might explain the lower variance, but not the bias.

Lines 419-420: “The absence of in situ data in the western desert-dominated region potentially weak the model training, exerting a negative effect on the resultant model accuracy.” Until this point, we don’t know how the in-situ data is used in training. I assume it enters the model probabilities in the BMA, but this is never explained.

Figure 7: The colour scales are confusing; high MAE should correspond to low R (and thus share the same colours).

Lines 431-439: The authors should include confidence intervals for these metrics. Furthermore, I wonder how they compare to their coarse-grained modelled dataset? I assume they would be very similar, which would suggest that the downscaling has little effect, but that the results rather stem from the models smoothing the data (at any scale).
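One common way to obtain such intervals is a percentile bootstrap over paired observations and predictions. The sketch below is illustrative only: `bootstrap_ci` and its arguments are hypothetical names, and it resamples pairs independently, which ignores the temporal autocorrelation of soil moisture series and will tend to give intervals that are too narrow. A block bootstrap would be the more defensible choice for daily data.

```python
import numpy as np

def bootstrap_ci(obs, pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals for Pearson R and MAE.

    obs, pred : 1-D arrays of paired in situ and downscaled values
    n_boot    : number of bootstrap resamples
    alpha     : 1 - confidence level (0.05 gives a 95 % interval)
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    rng = np.random.default_rng(seed)
    n = obs.size
    r_vals = np.empty(n_boot)
    mae_vals = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample pairs with replacement
        r_vals[b] = np.corrcoef(obs[idx], pred[idx])[0, 1]
        mae_vals[b] = np.mean(np.abs(obs[idx] - pred[idx]))
    lo, hi = 100 * alpha / 2, 100 * (1 - alpha / 2)
    return {
        "R": tuple(np.percentile(r_vals, [lo, hi])),
        "MAE": tuple(np.percentile(mae_vals, [lo, hi])),
    }
```

Applying the same function to the coarse-resolution product at the same stations would show whether the intervals for the two datasets overlap.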

Lines 440-450: I don't see how this paragraph is relevant to the paper. Accurately predicting soil moisture during the monsoon season is not a question of downscaling. The authors also never mention a particular focus on monsoon prediction in the introduction or elsewhere.

In Figure 9, the time-series values from the BMA are either consistently higher or lower than all of the underlying model predictions. A weighted average must lie somewhere between its constituent values! This suggests that the authors are making an error in their calculations.
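The bound invoked here is easy to test on the plotted series. The sketch below assumes the ensemble is a convex combination of its members (nonnegative weights that sum to one) and that no separate bias correction is applied to the members before averaging; if one is, the check should be run on the corrected members. The function and array names are hypothetical.

```python
import numpy as np

def outside_member_range(members, ensemble, tol=1e-9):
    """Flag time steps where the ensemble lies outside the member envelope.

    members  : (n_models, n_times) individual model predictions
    ensemble : (n_times,) combined prediction
    tol      : tolerance for floating-point round-off
    """
    members = np.asarray(members, dtype=float)
    ensemble = np.asarray(ensemble, dtype=float)
    lower = members.min(axis=0) - tol
    upper = members.max(axis=0) + tol
    return (ensemble < lower) | (ensemble > upper)

# Illustrative values only: a convex combination never leaves the envelope.
members = np.array([[0.10, 0.20, 0.30],
                    [0.15, 0.25, 0.20],
                    [0.12, 0.22, 0.28]])
weights = np.array([0.5, 0.3, 0.2])  # nonnegative, sum to one
ensemble = weights @ members
print(outside_member_range(members, ensemble))
```

Any flagged time step would indicate that the plotted BMA series is not the weighted average of the plotted member series.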

Lines 488-489: “As illustrated in Fig. 10, the R and MAE distributions of the ERA5 data within the study area and the Noah data within the Loess Plateau are utilized.” Why is the Noah data not used for the whole study area too? Unless a reason for this is given, it looks like cherry-picking.

Lines 489-491: “Results reveal that the BMA ensemble outcomes exhibit reasonable performance in terms of higher R values and lower MAE values when compared to both the ERA5 and Noah datasets.” Higher R and lower MAE compared to what? The original CCI SM dataset? Please specify.

Section 4.6 Uncertainty analysis: this paragraph does not constitute uncertainty analysis, but a feature importance analysis.

Table 5: The authors should not mix R and R², but rather pick one and transform the other. They should report the overall R, rather than the range over in-situ networks. This is misleading.

Figures S6 and S7 (comparing the model performance with and without clustering and the spatiotemporal searching window) should be incorporated into the main manuscript, as they highlight the superiority of their novel approach. Figures 11 and 12 could move to the supporting information as they don’t add much value to the paper.
Citation: https://doi.org/10.5194/hess-2024-129-RC2
- AC2: 'Reply on RC2', Kai Liu, 18 Nov 2024
  
  Thank you for taking the time to review our manuscript and providing helpful comments and suggestions. We have prepared a separate pdf file in which we address all your concerns on a point-by-point basis. It is attached as a supplement.
  
  Citation: https://doi.org/10.5194/hess-2024-129-AC2

Kai Liu, Hongyan Zhang, Yong Bo, Dehui Li, Long Li, Hang Li, Shudong Wang, and Xueke Li

Supplement

https://doi.org/10.5194/hess-2024-129-supplement

Viewed

Total article views: 970 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
604	261	105	970	61	50	62

Views and downloads (calculated since 08 May 2024)

Month	HTML	PDF	XML	Total
May 2024	188	57	10	255
Jun 2024	68	16	5	89
Jul 2024	24	15	5	44
Aug 2024	32	11	0	43
Sep 2024	14	7	1	22
Oct 2024	40	30	1	71
Nov 2024	27	20	2	49
Dec 2024	19	10	1	30
Jan 2025	18	7	8	33
Feb 2025	23	10	9	42
Mar 2025	20	11	2	33
Apr 2025	19	19	3	41
May 2025	27	10	47	84
Jun 2025	22	22	8	52
Jul 2025	23	10	0	33
Aug 2025	40	6	3	49

Viewed (geographical distribution)

Total article views: 936 (including HTML, PDF, and XML), thereof 936 with geography defined and 0 with unknown origin.

Cited

Latest update: 23 Aug 2025

Short summary

Our framework brings together remote sensing, machine learning, and numerical modeling to enhance soil moisture records. We merge outputs from various machine learning algorithms to improve model reliability. The approach captures drought dynamics well, which makes it valuable in arid and semi-arid regions worldwide, such as northern China and the north-central United States, where drought susceptibility is high.



Cited

2 citations as recorded by crossref.