Mapping soil moisture across the UK: assimilating cosmic-ray neutron sensors, remotely sensed indices, rainfall radar and catchment water balance data in a Bayesian hierarchical model

Levy, Peter E.; the COSMOS-UK team

doi:https://doi.org/10.5194/hess-28-4819-2024

Articles | Volume 28, issue 21

© Author(s) 2024. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 06 Nov 2024

Final revised paper (published on 06 Nov 2024)
Preprint (discussion started on 12 Sep 2023)

Interactive discussion

Status: closed

RC1:
'Comment on egusphere-2023-2041', Anonymous Referee #1, 27 Nov 2023

The authors have done a job on high-resolution soil moisture modeling at the UK scale. The paper is well structured, but a major revision is needed before publication. My main issues include:

1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.

2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.

3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?

4. The information presented in Fig. 3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.

Citation: https://doi.org/10.5194/egusphere-2023-2041-RC1
- AC1: 'Reply on RC1', Peter E. Levy, 08 Feb 2024
  
  We thank the referee for the time taken. Their comments are shown in italics; our response is beneath in normal font.
  My main issues include:
  1. To add a flowchart that systematically shows the various parts of the study and the roles of the various data.
  
  - A good idea - we will add this in the revision.
  
  2. To add a description of the matching of COSMOS sites to model grids. It is not clear at this point how to match COSMOS data at nearly 100m resolution with models at 2km resolution.
  - This is straightforward because the COSMOS sites are simply matched to the 2-km square they are located in. We can state this in the revision, and discuss other options (e.g. using data from the surrounding grid cells to interpolate to the COSMOS site location).
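As an illustration of matching a site to the grid square it falls in, a minimal Python sketch follows. It assumes site coordinates are eastings and northings in metres on a projected grid whose origin coincides with the model grid origin; the function name `grid_cell`, the site names and the coordinates are hypothetical and are not taken from the manuscript.

```python
import math

CELL_SIZE_M = 2000.0  # 2-km model grid, as stated in the reply


def grid_cell(easting_m, northing_m, cell_size_m=CELL_SIZE_M):
    """Return the (column, row) index of the grid square containing a point.

    Assumes the grid origin is at (0, 0) in the same projected coordinates.
    """
    col = math.floor(easting_m / cell_size_m)
    row = math.floor(northing_m / cell_size_m)
    return (col, row)


# Hypothetical site locations (easting, northing) in metres.
sites = {
    "site_a": (451230.0, 289750.0),
    "site_b": (452100.0, 289999.0),
}

# Each site is assigned to exactly one 2-km square.
cells = {name: grid_cell(e, n) for name, (e, n) in sites.items()}
```

The alternative mentioned in the reply, interpolating from surrounding cells, would replace the floor operation with some distance-weighted average over neighbouring squares.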
  
  3. As the authors said, they used decades of stream flow data. Have these watersheds changed over the last few decades? In particular, are there any hydraulic structures or water extraction projects conducted during this period? How would these decades of river flow data affect the results of this study if they are unsteady?
  - Where these do occur, it would indeed make a step change of unknown size in the parameters we are estimating. The NRFA data include meta-data on any known man-made changes of this kind, and we have tried to remove data prior to these changes where they have occurred. However, of the 1200+ catchments, this affects relatively few, most of which have been identified and removed, so we do not think this is a major problem with the analysis. We can add text to this effect in the revision to the manuscript.
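The filtering described in this reply might be sketched as follows, assuming each catchment's metadata can be reduced to a single date of the most recent known man-made change (or none). The record layout, names and values are hypothetical; the actual NRFA metadata format is not reproduced here.

```python
from datetime import date


def drop_before_change(records, change_date):
    """Keep only (day, flow) records on or after a known change date.

    If no change is recorded (change_date is None), all records are kept.
    """
    if change_date is None:
        return list(records)
    return [(day, flow) for day, flow in records if day >= change_date]


# Hypothetical daily flow records for one catchment: (date, flow in m3/s).
flows = [
    (date(1990, 1, 1), 3.2),
    (date(1995, 6, 1), 2.9),
    (date(2001, 3, 15), 1.4),
    (date(2010, 9, 30), 1.6),
]

# Hypothetical metadata: a structure was built on 1 January 2000.
usable = drop_before_change(flows, date(2000, 1, 1))
unchanged = drop_before_change(flows, None)
```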
  
  4. The information presented in Fig.3 is not clear, please revise it. Please add the corresponding rainfall. Please show the soil moisture of one or two months in different seasons.
  
  - Adding rainfall to the existing figure is straightforward. Showing some contrasting months is also easy, but will require a separate figure. We can do both in the revisions.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2041-AC1
RC2:
'Comment on egusphere-2023-2041', Anonymous Referee #2, 10 Dec 2023
The present manuscript aims at predicting soil moisture for the whole UK using a new hydrological model approach based on statistical considerations and data from discharge gauges, remote-sensing products, and cosmic-ray neutron sites. The authors explain the mathematical background of their model in detail and extensively discuss parts of the used data, their results, and the model limitations. To me the introduction of the mathematical approach reads as interesting, but it seems to combine a lot of new concepts and ideas, such as an EMA filter, a slope m, a mixed effects model, complex kriging algorithms, Bayesian statistics, etc. It is not clear to me to what extent this is all new or already established. It is also not clear to me how these ideas are backed by previous research. If the approach is completely new, I would relate this manuscript more to a journal for hydrological model development. The key is the invention of an (apparently) new model approach, while the use of the highly advertised COSMOS data here turned out to be just a very minor aspect of the study. Being not a hydrological modeler, I cannot evaluate the choices the authors made on the way, but I feel that comparisons to existing models are widely missing. Once the model development is accepted by the hydrological modeler community, a second paper could integrate new data sets, such as COSMOS-UK, to study its performance. Hence, I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
# Major concerns
As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.

The authors present their "simple" model with a number of unclear assumptions (Lines 58-70). E.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly names any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.

A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and that depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to what layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. While in the present paper, I cannot find any hint on how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.

The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.

I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.

The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles, physical parameters, and still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.

# Minor concerns
The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1., and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.

The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses towards the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).

Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speaks of 15 to 80 cm of sensing depth, why do you mention max. 30 cm depth here? The answer is the wet soil in UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.

# Specific comments
## Abstract:
The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.

The study was further motivated with the fact that remote sensing data have issues to provide absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story. 

Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.

The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?

## Manuscript
Line 26: Consider mentioning also the useful integration depth of this measurement technique. 

Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!

Line 36: replace "are" by "and" (...influenced)

Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.

Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
Citation: https://doi.org/10.5194/egusphere-2023-2041-RC2
- AC2: 'Reply on RC2', Peter E. Levy, 08 Feb 2024
  
  We thank the referee for the time taken and attention to detail. Their comments are shown in italics; our response is beneath in normal font.
  I'd recommend major revision to better focus on the model development, comparisons to existing models, and to address the remaining concerns.
  # Major concerns
  
  1. As a key motivation for inventing a completely new hydrological model, I am missing an extensive introduction of existing hydrological models, their methods and capabilities to predict spatial SM in the UK, where and why they fail, and what will be done differently in this study to solve these issues. Has nobody before operated a hydrological model in the UK? What is their resolution? Has nobody before integrated discharge data? Or satellite data? Or CRNS data? There is plenty of literature here that needs to be discussed before it becomes clear whether you actually invented a completely new approach or took or amended parts of existing ones. And whether this choice is adequate compared to the performance of existing models.
  
  - We accept this point, and can add some text to the introduction on existing soil moisture products.
  2. The authors present their "simple" model with a number of unclear assumptions (Lines 58-70). E.g., treating soil moisture dynamics as a pulse-decay curve with exponential shape. I have strong doubts that this is a valid assumption for soil hydrological processes, neglecting porosity, capillary forces, van Genuchten models, vegetation influence, etc. If the authors are really convinced about their assumptions here, the reader would at least expect scientific argumentation of why these assumptions hold, e.g., using insights from existing literature. The whole section hardly names any hydrological paper to strengthen the choice of assumptions, which would be OK for the first hydro model invented in 1950, but not in 2023.
  
  - We find this a strange comment. The assumptions are explicit in these lines and in the equations, as well as in the referee's comment itself. We are not "neglecting porosity, capillary forces ..." but demonstrating that they do not need to be represented explicitly: at a given site, the dynamics can be summarised very simply as exponential decay, and thereby linearised via the EMA filter. We cite three hydrological papers which have used the same approach successfully. We could add a section which demonstrates how this follows from first principles, but we thought this would be over-kill. We could add this in supplementary information perhaps.
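To make the link between exponential decay and an EMA filter concrete, a minimal Python sketch follows. It is not the authors' code: the smoothing parameter `alpha`, the intercept and slope values, and the rainfall series are all hypothetical, and the linear mapping from the filtered rainfall to soil moisture is only an illustration of the general idea described in the reply.

```python
def ema_filter(precip, alpha, initial=0.0):
    """Exponential moving average: s_t = alpha * p_t + (1 - alpha) * s_{t-1}.

    After a single rain pulse the state shrinks by a factor (1 - alpha)
    each step, i.e. it decays exponentially in time.
    """
    state = initial
    filtered = []
    for p in precip:
        state = alpha * p + (1.0 - alpha) * state
        filtered.append(state)
    return filtered


# Hypothetical daily rainfall (mm): one pulse followed by dry days.
precip = [10.0, 0.0, 0.0, 0.0]
alpha = 0.5  # hypothetical smoothing parameter

filtered = ema_filter(precip, alpha)

# Soil moisture as a linear function of the filtered rainfall.
# Intercept and slope are hypothetical illustration values.
intercept, slope = 0.2, 0.02
theta = [intercept + slope * s for s in filtered]

# Ratio of successive filtered values on dry days is constant (1 - alpha).
decay_ratios = [filtered[i + 1] / filtered[i] for i in range(len(filtered) - 1)]
```

Because the filtered series enters linearly, the remaining parameters can be estimated with ordinary linear-model machinery once `alpha` is fixed.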
  3. A major challenge when comparing soil moisture from hydrological models and COSMOS data is the vertical soil moisture profile. COSMOS averages soil moisture between 0 and 80 cm, with an exponential weight which is higher for shallower layers and that depends (unfortunately) on the soil moisture profile itself. It changes over time. And it is not trivial to what layer of the hydrological model these measurements should be compared to, and how. Many other papers have addressed this challenge already. While in the present paper, I cannot find any hint on how exactly the authors compared observed and predicted soil moisture layer-wise. Please elaborate.
  
  - We explicitly state that we are modelling the COSMOS observations of soil moisture, which can be interpreted loosely as near-surface soil moisture. At no point do we say that there are any "layers of the hydrological model", and the equations are explicit, so I'm not clear where the confusion arises. As the referee says, the depth to which CRNS are sensitive varies somewhat with soil moisture itself, but the measurements are always strongly weighted towards the surface soil moisture. We can make this point explicitly in the revision: the observations (and thus predictions) are subject to this varying-depth effect, and there is no simple solution to this. One could attempt an inverse modelling scheme to infer a depth profile of soil moisture, but this would be very poorly constrained by the available observations.
  4. The agreement between observed and predicted soil moisture does not look convincing to me (Fig. 3 and 4). There are obvious biases and unmatched dynamics still visible. Performance metrics like KGE or R² are missing to assess the quality of the prediction. The RMSE alone could miss important differences in dynamics.
  
  - R² for every model variant is listed in Table 1, along with AIC as the more useful measure of comparative goodness-of-fit. Sure, the agreement is not perfect, but the point is that the simple linear model does better than the previous satellite estimates and the more complex models cited.
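For readers unfamiliar with the metrics under discussion, the following Python sketch computes RMSE, R² (as the squared Pearson correlation) and the Kling-Gupta efficiency in its standard form, KGE = 1 - sqrt((r - 1)² + (α - 1)² + (β - 1)²), where α is the ratio of simulated to observed standard deviations and β the ratio of means. The soil moisture values are hypothetical and serve only to show that the metrics respond differently to a constant bias.

```python
import math
from statistics import mean, pstdev


def pearson_r(obs, sim):
    """Pearson correlation between observed and simulated series."""
    mo, ms = mean(obs), mean(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    return cov / (pstdev(obs) * pstdev(sim))


def rmse(obs, sim):
    """Root mean squared error."""
    return math.sqrt(mean([(o - s) ** 2 for o, s in zip(obs, sim)]))


def kge(obs, sim):
    """Kling-Gupta efficiency (1 is a perfect match)."""
    r = pearson_r(obs, sim)
    alpha = pstdev(sim) / pstdev(obs)  # variability ratio
    beta = mean(sim) / mean(obs)       # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical volumetric soil moisture: the simulation has the right
# dynamics but a constant offset of 0.1.
obs = [0.2, 0.3, 0.4, 0.5]
sim_biased = [o + 0.1 for o in obs]

r2_biased = pearson_r(obs, sim_biased) ** 2  # close to 1: dynamics match
rmse_biased = rmse(obs, sim_biased)          # close to 0.1: offset is visible
kge_biased = kge(obs, sim_biased)            # below 1: penalised via the bias term
```

In this toy case R² is insensitive to the offset while RMSE and KGE both register it, which is one reason several metrics are often reported together.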
  5. I wonder whether the performance of the model has been tested on uncalibrated sites. A usual approach to test spatial extrapolation or regionalization models is to train them on a few sites and test them on other sites. Please add such an analysis such that the reader can assess the reliability of your high-resolution model at sites other than the COSMOS sites.
  
  - We are not averse to adding cross-validation in principle, but it doesn't achieve anything additional. The point of the hierarchical approach is that it treats the site-to-site variability explicitly, and estimates the global parameters having accounted for this. So in principle, we can already say how well we expect the model to do at a new site, since we have estimated the variance Ψ.
  
  One real advantage of this approach is that we can propagate this uncertainty that we know will arise at each new site into the predictions. Cross-validation is a more computationally intensive way to quantify that same site-to-site uncertainty, but does not provide an easy means of propagating that uncertainty into predictions. The strength of AIC is that, in theory, it provides a measure of out-of-sample prediction, so indicates which model should give the best prediction at sites outwith the calibration set.
  
  We propose to add some text making the above point to the revision, explaining how this method compares to cross-validation.
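The idea of propagating site-to-site variance into predictions at a new site can be illustrated with a small Monte Carlo sketch in Python. It assumes a scalar random-intercept variance `psi` standing in for Ψ, a residual standard deviation `sigma`, and fixed global coefficients; all numerical values are hypothetical, and a full Bayesian treatment would also draw the global parameters from their posterior rather than fixing them.

```python
import random
from statistics import mean, pstdev


def predict_new_site(x, intercept, slope, psi, sigma, n_draws=10000, seed=1):
    """Draw predictions for an unobserved site.

    Each draw adds a site effect ~ Normal(0, psi) and residual noise
    ~ Normal(0, sigma^2) to the global linear predictor.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        site_effect = rng.gauss(0.0, psi ** 0.5)
        noise = rng.gauss(0.0, sigma)
        draws.append(intercept + slope * x + site_effect + noise)
    return draws


# Hypothetical values for illustration only.
intercept, slope = 0.25, 0.02  # global (population-level) coefficients
psi = 0.0025                   # between-site variance of the intercept
sigma = 0.03                   # residual standard deviation
x = 5.0                        # covariate value, e.g. filtered rainfall

draws = predict_new_site(x, intercept, slope, psi, sigma)
pred_mean = mean(draws)
pred_sd = pstdev(draws)

# Expected spread if both variance components are propagated.
expected_sd = (psi + sigma ** 2) ** 0.5
```

The predictive spread at a new site reflects both components, whereas a prediction that ignored `psi` would be too narrow.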
  6. The major selling point of the new model seems to be computational speed (Line 381). However, there are other hydrological models which are also based on simple principles, physical parameters, and still extremely fast. One of many examples could be the mHM model (Samaniego et al. 2010), proven to be one of the best hydro models globally. A major difference is that they regionalize the calculation of soil porosity, while your model takes a given map for granted. It would be important to highlight the differences to this and other existing models in terms of methodology, speed, and quality of results.
  
  - We can add some comparison with other modelling approaches to the introduction and/or discussion. One obvious difference with the MHM is the degree of complexity, since it is a system of ODEs with at least 62 parameters to be estimated, rather than a single linear equation with six parameters (Eqn 4). As an aside, the MHM paper referred to appears to do something similar to the method we describe here, albeit using very different terminology (e.g. "regionalisation").
  # Minor concerns
  
  1. The structure of the introduction is unconventional and confusing. It appears that the introduction has not ended before section 1.1, but the subsequent description of the hydro model used also seems to be part of the introduction. After that, the aims of the study are outlined two pages later. This is highly confusing and should be changed. Section 1.1., and maybe parts of 1.2, should move to the methods section. Please elaborate on the structure and outline of the study at the end of the introduction. I was not able to identify a clear hypothesis, other than making "the most accurate estimate of mapped soil moisture as possible", which is both vague and nonscientific language.
  
  - By contrast, referee 1 says "the paper is well structured". We explain the problem, then introduce our approach to modelling soil moisture in time (1.1) and in space (1.2), and give explicit aims (1.3). The aims only make sense in terms of the problem we are trying to solve (making accurate maps of soil moisture) and our approach to solving it (integrating disparate data sources in a hierarchical linear model), so inevitably appear later. We are not testing any hypothesis here because we are not doing an experiment. There is nothing "vague and nonscientific" about our stated aims.
  2. The introduction seems to be a bit biased, as no issues of the CRNS technique have been addressed, while many issues of remote sensing products are prominently mentioned. Especially since the argumentation focuses towards the unwanted influence of vegetation water and soil properties, it is necessary to indicate that CRNS has very similar issues, as it does not work reliably in highly vegetated, highly porous, or highly organic soils (Bogena et al. 2013, Rasche et al. 2021, etc).
  
  - We accept this point. We will add some text to give better balance as the referee suggests.
  3. Section 2.1.1.: A proper and unbiased introduction of the COSMOS technique, which is, as was advertised, key to this study, requires more description of the pros and cons. In that sense, the description is actually incomplete. Neutrons are not only sensitive to soil moisture, but to any hydrogen pool in organic matter, vegetation, snow, etc. This is highly relevant information to assess the performance and quality of your results. Also the fact that COSMOS data is calibrated on actual soil moisture is very relevant, because neutrons are a relative quantity just as the remote sensing data you criticize. Furthermore, Köhli et al. speaks of 15 to 80 cm of sensing depth, why do you mention max. 30 cm depth here? The answer is the wet soil in UK, which brings us back to the fact that limitations of COSMOS have not been properly explained here. Please elaborate on the quality of the CRNS data and provide related citations.
  
  - Same point as #2 above. We will add some text to give better balance as the referee suggests.
  # Specific comments
  
  ## Abstract:
  1. The abstract is not logical or at least unclear. You motivate your study by the fact that remote-sensing data, soil hydrological data and vegetation introduce uncertainty. Then you present a solution which involves a remote-sensing product and soil properties. The reader would expect a brief argumentation why this solution solves the previously mentioned issues while it again makes use of them.
  
  - The point we failed to make was that our method reduces uncertainty by integrating multiple data sources, all of which have weaknesses, but together act as a better constraint on the true soil moisture. We will add text to this effect.
  2. The study was further motivated with the fact that remote sensing data have issues to provide absolute soil moisture. The solution presented, however, seems to be good at explaining variation only, with no mention of absolute SM predictions anymore (at least in the abstract). If you raise an issue in the beginning, the reader would expect a reference to it at the end of the story. 
  
  - We accept this point and will clarify this in the revision.
  3. Please use scientific and more concrete language when describing the models used. A "simple model", as the major outcome of your study, is not an adequate description. Can you name it? Is it a statistical or bucket model? Help the reader to categorize the key model of your study among the many existing model variants in hydrology. Similarly, please name or briefly elaborate on "a process-based model" which you mentioned using as a benchmark.
  
  - We will substitute with "linear model", since this is widely understood.
  4. The last sentence does not make sense to me. If there is negligible computation time and assimilation of realtime data, why does it lag one week behind?
  
  - The referee has misread the sentence. We do not say "assimilation of realtime data". We say "predictions are updated daily, lagging approximately one week behind real time"; it takes about a week for the weather and satellite data to become available. Computation time is <5 seconds for the whole domain, once the input data are available.
  ## Manuscript
  
  Line 26: Consider mentioning also the useful integration depth of this measurement technique. 
  
  - We will add text to this effect.
  Line 33: Can you assign the individual citations to each problem separately, instead of lumping them all at the end of the sentence? Thanks!
  
  - I think all problems apply to all, but will double-check and edit as necessary.
  Line 36: replace "are" by "and" (...influenced)
  
  - No, the "and" is on the next line. "are" is correct here.
  Line 295: "there is no clear pattern to it". Please rephrase. The interpretation of the pattern is scientific research. Just because no reason for the variations has been identified so far, it does not mean that there is no reason or no underlying pattern at all.
  
  - We do not say there is "no reason for the variation", we merely say "there is no clear pattern to it", meaning we cannot interpret it in terms of the information available to us.
  Code availability: it is highly recommended to publish the model code, e.g. in a git repository, as it is common standard for other hydrological models.
  
  - We can publish the code as suggested, but the model itself is only a single line of R code. Most of the code is data wrangling to change between formats and data structures for the inputs, so very task-specific and not very interesting, but happy to make public on GitHub. Unfortunately the meteorological data used is not open-access, so we can't provide a live working version, though we can provide the outputs in this way.
  
  Citation: https://doi.org/10.5194/egusphere-2023-2041-AC2

Peer review completion

AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload

ED: Publish subject to revisions (further review by editor and referees) (13 Feb 2024) by Gerrit H. de Rooij

Dear authors,

The reviews and your replies are such that I believe that a revised version of the paper can be suitable for publication. I therefore request you to provide a revised version of the paper. I made a few notes when I was studying the discussion that I reproduce below, in the hope that they will be of benefit when you revise the paper.

Sincerely yours,

Gerrit de Rooij
Editor

Referee 1

In your reply to the second comment you state that ‘the COSMOS sites are simply matched to the 2-km square they are located in’, but you do not explain how the matching was performed. Did you simply equate the values for which matching was required?

Referee 2

This referee is the more critical of the two. From the discussion I have the impression that you (the authors) and the referee approach the subject from very different viewpoints. At times this leads to differences of opinion that I consider part of the scientific debate (and therefore not a ground for rejection), and at other times to misunderstandings.

In the former case, the discussion with the referee can be incorporated in the paper by devoting some space in the Introduction to the literature that represents alternative approaches. The referee alludes to this by suggesting a review of the literature on hydrological modelling on the relevant scales (main comments 1 and 6). I would like to add that the paper in its current form is somewhat slanted towards the remote sensing aspects and could be more even-handed by devoting attention to the hydrological aspects of the study. This will take some effort but is quite doable, in my assessment. This will help you to better define what the added value of your model is, vis-a-vis the suite of available models. You already do so somewhat tentatively in the paper, and more pointedly in your reply to Referee 2. I therefore suspect you very well know what the contribution of your work is, you only need to make sure that the reader knows as well.

The misunderstandings can help you to clarify the paper, especially for those readers who have backgrounds and research interests that are different from yours.

Main comment 2. I am not sure how well you can derive the exponential decay from first principles of soil physics, but I do not think it is necessary – you have several papers to back up the approach.

Main comment 3. In the discussion of this point, the contrasting vantage points of the authors on one hand and the referee on the other are very apparent. I believe you can use the discussion here to clarify the paper for the more hydrologically inclined readers, and also to select additional literature to discuss in the introduction to make the paper more balanced. It may also prove worthwhile to briefly point out the strengths and weaknesses of either viewpoint, which can then result in a line of argument that supports the choice for the type of modelling approach you are advocating in the paper. You already initiated the development of such an argument in your reply to the comment.

Main comment 5. Your reply to this comment is quite interesting. Please work it into the paper in one way or another.

AR by Peter E. Levy on behalf of the Authors (26 Jul 2024) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (29 Jul 2024) by Gerrit H. de Rooij

RR by Anonymous Referee #1 (29 Aug 2024)

ED: Publish subject to technical corrections (30 Aug 2024) by Gerrit H. de Rooij

AR by Peter E. Levy on behalf of the Authors (10 Sep 2024) Manuscript

Short summary

Having accurate up-to-date maps of soil moisture is important for many purposes. However, current modelled and remotely sensed maps are rather coarse and not very accurate. Here, we demonstrate a simple but accurate approach that is closely linked to direct measurements of soil moisture at a network of sites across the UK, to the water balance (precipitation minus drainage and evaporation) measured at a large number of catchments (1212) and to remotely sensed satellite estimates.
