Determination of vadose zone and saturated zone nitrate lag times using long-term groundwater monitoring data and statistical machine learning

Wells, Martin J.; Gilmore, Troy E.; Nelson, Natalie; Mittelstet, Aaron; Böhlke, John K.

doi:https://doi.org/10.5194/hess-25-811-2021

Articles | Volume 25, issue 2

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Research article | 19 Feb 2021

Download

Final revised paper (published on 19 Feb 2021)
Supplement to the final revised paper
Preprint (discussion started on 05 May 2020)
Supplement to the preprint

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

RC1: 'Comments', Anonymous Referee #1, 05 Jun 2020
- AC1: 'Authors' Final Response', Troy E. Gilmore, 09 Aug 2020
RC2: 'Referee comments', Scott Gardner, 20 Jun 2020
- AC1: 'Authors' Final Response', Troy E. Gilmore, 09 Aug 2020
EC1: 'Start uploading preliminary responses', Nunzio Romano, 06 Jul 2020
RC3: 'Interactive comment on “Determination of vadose and saturated-zone nitrate lag times using long-term groundwater monitoring data and statistical machine learning” by Martin J. Wells et al.', Sophie Ehrhardt, 14 Jul 2020
- AC1: 'Authors' Final Response', Troy E. Gilmore, 09 Aug 2020
SC1: 'USGS Co-Author Disclaimer Note', Martin Wells, 16 Jul 2020
SC2: 'Initial Response to Reviewers 1 and 2', Martin Wells, 16 Jul 2020
SC3: 'Final Response to Reviewers', Martin Wells, 24 Jul 2020

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

ED: Reconsider after major revisions (further review by editor and referees) (22 Aug 2020) by Nunzio Romano

AR by Troy E. Gilmore on behalf of the Authors (09 Oct 2020) Author's response Manuscript

ED: Referee Nomination & Report Request started (14 Oct 2020) by Nunzio Romano

RR by Scott Gardner (14 Nov 2020)

RR by Pia Ebeling (26 Nov 2020)

Suggestions for revision or reasons for rejection

This study uses an interesting, innovative approach to estimate transport rates and lag times of NO3 in the unsaturated and saturated zones from groundwater NO3 data using random forests. However, the manuscript could be better streamlined around its main messages; the methods chapter in particular is quite lengthy, repetitive and hard to understand (it is longer than the results and discussion). Unfortunately, this also hampers a clear understanding of parts of the results and discussion chapter. Some concrete suggestions are given in the detailed comments, though they are not complete, and the methods section needs to be revised. The framework is promising and could replace expensive isotope sampling; I therefore suggest publication after the requested changes are included.

Major concerns
Why do you use only vertical velocities? For the unsaturated zone, this concept is common, but I am uncertain why it is used for the saturated zone as well. I did not understand the assumption of shallow groundwater in this context (L. 253). Please clarify this. Why would it be important to know travel times in the vertical dimension without a horizontal component? How can the flow be dominantly vertical if the aquifer bottom is considered a no-flow boundary? I think this limitation should also be addressed in the discussion.
Why do you use the importance of the total travel time to infer transport rates? This argument is not clear to me.

Abstract:
L 17: I do not understand this sentence, though I think I know what you mean. Consider reformulation.
Introduction:
L. 38-39 I do not understand this sentence. What do you mean by “responses … can be complicated by uncertainties”? Do you mean predictions of responses?
L 48 Are lag times the same as groundwater ages for you?
L 60-61. Why is the screen depth a proxy for lag times? Wouldn’t this only be the case for homogeneous settings and only for vertical movement?
Methods (74-290):
Section 2.1 reads very long-winded and not all detail is needed later in the discussion. Please, consider condensing this section.
L 114 What are nested wells? Wells with more than one screen?
Sections 2.2 and 2.3: In my opinion, these two sections read as quite unfocused, and I think they could be condensed and merged into one section. E.g. L. 183-187: This seems unnecessary to explain, as it is the principle of random forest, which can easily be referenced from the literature. Similarly, L. 187-196 can be shortened into one sentence saying which approach was used to evaluate variable importance. Actually, 2.2 reads as a summary of what will be explained in 2.3-2.5, which is not necessary. I think you can easily remove it.
L 169: This sentence seems redundant
L 170 Why did you use nested resampling? I do not see that you discuss those results later. If you used it for tuning, please specify this including the tuning method and parameters.
L. 174 I suggest the authors be clearer here, as permutation importance and PDPs are used not to evaluate model performance but to interpret the model results.
L 174-182 One sentence is enough to say that you use NSE to evaluate model performance.
L 198 This sentence is vague and not informative. I would even say it is actually incorrect. Within one PDP, only one variable (x-axis) is varied, while mean values are used for the others. Please reformulate or delete. In my opinion, this paragraph can also be shortened.
L 204-208 Can be merged into one sentence. It is not relevant to know that data were imported and clipped, just that you selected the “wells within the study area with a corresponding depth …” would be enough.
L 233: I am sorry, I did not understand how you use the dynamic descriptors. “annual median [NO3] was assigned a lagged dynamic value”, what exactly does that mean? And are the dynamic variables also variable in space?
L 267-269 This sentence is unnecessary as it stands. You already state beforehand what range of recharge rates you used. Or maybe the second part could be supported with references: what are those realistic mean values, and why are they realistic?
L 264-280 Could the calculated parameter ranges of the two paragraphs not be easily presented in a table? It would increase the understanding and not read that lengthy. The text could be strongly shortened then.
L 285-288 Why did you make this assumption? You could also have used various features derived from the dynamic predictors, including mean values over several years. Considering the uncertainty in travel time estimates, I think this approach is not well supported. However, it is still slightly unclear to me how you used the dynamic values and linked them to the annual median [NO3]. Did you assign one past value to each well and total lag time (for one transport rate combination)? If so, why do you think this is possible if we also have horizontal groundwater movement, and the annual median [NO3] depends not only on vertical travel times at the well location but also on vertical travel times somewhere else plus the horizontal travel times? Please clarify. Maybe a conceptual figure would help to understand how you link dynamic descriptor values to certain wells and the corresponding annual median [NO3].
L 289-293 This part is jumping back to RF application. I think the method section should be restructured with the concept of travel times being presented first and then the RF application used to predict the travel times.
Results and Discussion (L. 297- 425)
L 299 Which is the “initial model”? I cannot follow. This should be stated more clearly in the methods.
L 325-335 This paragraph belongs to the methods. Why do you use the variable importance of total travel times to determine the “optimal” transport rates? I struggle to understand why this approach is promising.
L. 338 Do you think these ratios are transferable to other locations? I would expect this to be site specific. This is not clearly stated here, as the statement sounds rather general.
L 345 “second analysis”? What was the paragraph before then? This needs to be specified in the methods.
L 346 How uncertain do you see those values considering the equifinality within the Vs/Vu ratios shown in Fig. 6? Is it meaningful to select one model in this case? Are there more ways to constrain the equifinality problem except data of recharge rates?
L 355. “mean recharge value derived from groundwater ages in intermediate wells (1.22 m/yr, n = 13).” Please provide the reference here.
3.3 Section: What data or prior knowledge would someone need to apply your new approach to estimate lag times? Would you trust to apply the method without having groundwater ages to validate? Do you think this method also works in areas with higher denitrification impact where NO3 is less conservatively transported? Could the approach be hampered if denitrification potential or other subsurface conditions are heterogeneous (e.g. linked to hot spots such as pyrite lenses or hydraulic conductivity)?
Conclusion:
L 440 What do you mean by “comparisons of data-driven analyses with complementary datasets”?
Figures:
L 686: “Vs/Vu”

ED: Publish subject to minor revisions (review by editor) (28 Nov 2020) by Nunzio Romano

AR by Troy E. Gilmore on behalf of the Authors (09 Dec 2020) Author's response Manuscript

ED: Publish as is (27 Dec 2020) by Nunzio Romano

AR by Troy E. Gilmore on behalf of the Authors (04 Jan 2021) Manuscript

Short summary

Groundwater in many agricultural areas contains high levels of nitrate, which is a concern for drinking water supplies. The rate at which nitrate moves through the subsurface is a critical piece of information for predicting how quickly groundwater nitrate levels may improve after agricultural producers change their approach to managing crop water and fertilizers. In this study, we explored a new statistical modeling approach to determine rates at which nitrate moves into and through an aquifer.
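
As a rough illustration of the lag-time idea discussed on this page, the sketch below computes a purely vertical travel time from an unsaturated-zone rate (Vu) and a saturated-zone rate (Vs), then looks up a lagged value of a time-varying predictor for a given sampling year. This is a simplified reading under stated assumptions, not the authors' implementation: the function names, the additive travel-time formula, the rounding of the lag to whole years, and all numbers are hypothetical.

```python
def total_lag_years(vadose_thickness_m, saturated_depth_m, v_u, v_s):
    """Vertical travel time (years) from land surface to the well screen.

    Assumes (hypothetically) that the lag is the sum of the vadose-zone and
    saturated-zone travel times, each equal to distance / transport rate.
    """
    if v_u <= 0 or v_s <= 0:
        raise ValueError("transport rates must be positive")
    return vadose_thickness_m / v_u + saturated_depth_m / v_s


def lagged_value(series_by_year, sample_year, lag_years):
    """Return the predictor value from the year the sampled water was recharged.

    Returns None when the record does not reach back far enough.
    """
    source_year = sample_year - round(lag_years)
    return series_by_year.get(source_year)


# Hypothetical well: 20 m of vadose zone, screen 10 m below the water table.
lag = total_lag_years(20.0, 10.0, v_u=1.0, v_s=2.0)  # 20/1 + 10/2 years
fertilizer = {1990: 100.0, 1995: 120.0}              # made-up predictor record
print(lag, lagged_value(fertilizer, 2015, lag))
```

In this toy setup, a sample collected in 2015 would be paired with the predictor value from 1990. Horizontal flow, which the referee raises as a concern, is deliberately not represented.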