This study uses an interesting, innovative approach to estimate transport rates and lag times of NO3 in the unsaturated and saturated zones from groundwater NO3 data using random forests. However, the manuscript could be better streamlined around its main messages; in particular, the methods chapter is lengthy, repetitive and hard to understand (it is longer than the results and discussion). This unfortunately also hampers a clear understanding of parts of the results and discussion chapter. Some concrete suggestions are given in the detailed comments, though they are not exhaustive, and the methods section needs to be revised. The framework is promising and could replace expensive isotope sampling; therefore, I suggest publication after the requested changes have been made.
Why do you use only vertical velocities? For the unsaturated zone, this concept is common, but I am uncertain why it is used for the saturated zone as well. I did not understand the assumption of shallow groundwater in this context (L. 253). Please clarify this. Why would it be important to know travel times in the vertical dimension without the horizontal component? How can the flow be dominantly vertical if the aquifer bottom is considered a no-flow boundary? I think this limitation should also be addressed in the discussion.
Why do you use the importance of the total travel time to infer transport rates? This argument is not clear to me.
L 17: I do not understand this sentence, though I think I know what you mean. Consider reformulation.
L. 38-39 I do not understand this sentence. What do you mean by “responses … can be complicated by uncertainties”? Do you mean predictions of responses?
L 48 Are lag times the same as groundwater ages for you?
L 60-61. Why is the screen depth a proxy for lag times? Wouldn’t this only be the case for homogeneous settings and only for vertical movement?
Section 2.1 is long-winded, and not all of the detail is needed later in the discussion. Please consider condensing this section.
L 114 What are nested wells? Wells with more than one screen?
Sections 2.2 and 2.3: In my opinion, these two sections read as quite unfocused, and I think they could be condensed and merged into one section. E.g. L. 183-187: this seems unnecessary to explain, as it is the basic principle of random forests, which can easily be referenced from the literature. Similarly, L. 187-196 can be shortened into one sentence stating which approach was used to evaluate variable importance. In fact, Section 2.2 reads as a summary of what is explained in Sections 2.3-2.5, which is not necessary. I think you can easily remove it.
L 169: This sentence seems redundant.
L 170 Why did you use nested resampling? I do not see that you discuss those results later. If you used it for tuning, please specify this including the tuning method and parameters.
L. 174 I suggest the authors be clearer here: permutation importance and PDPs are not used to evaluate model performance but to interpret the model results.
L 174-182 One sentence is enough to say that you use NSE to evaluate model performance.
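For reference, the NSE is a single formula, which is why one sentence with a citation should suffice. A minimal sketch, assuming NumPy; the function name `nse` is only illustrative and not taken from the manuscript:

```python
import numpy as np

def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 minus the ratio of the squared model
    error to the variance of the observations around their mean."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

A value of 1 indicates a perfect fit, and a value of 0 means the model performs no better than predicting the mean of the observations.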
L 198 This sentence is vague and not informative. I would even say it is incorrect. Within one PDP, only one variable (on the x-axis) is varied, while the effect of the other variables is averaged out. Please reformulate or delete. In my opinion, this paragraph can also be shortened.
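To illustrate what a one-dimensional PDP computes, here is a minimal hand-rolled sketch, assuming NumPy and a hypothetical toy model (neither the function names nor the numbers come from the manuscript). The variable of interest is fixed at each grid value for all samples, and the predictions are then averaged over the observed values of the remaining variables:

```python
import numpy as np

def partial_dependence_1d(predict, X, feature, grid):
    """Average prediction when column `feature` is forced to each grid value."""
    X = np.asarray(X, dtype=float)
    pd_values = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value               # vary only the variable of interest
        pd_values.append(predict(X_mod).mean())  # average over all other variables
    return np.array(pd_values)

def toy_model(X):
    # Hypothetical stand-in for a fitted model: nonlinear in the second variable.
    return X[:, 0] * X[:, 1] ** 2

X_demo = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])
pd_curve = partial_dependence_1d(toy_model, X_demo, feature=0, grid=[0.0, 1.0, 2.0])

# For comparison: plugging in the mean of the second variable instead.
plug_in_at_mean = toy_model(np.array([[1.0, X_demo[:, 1].mean()]]))[0]
```

In this toy case, averaging the predictions and plugging in the mean of the other variable give different values at the same grid point, which is why a precise description in the text matters.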
L 204-208 Can be merged into one sentence. It is not relevant to know that data were imported and clipped, just that you selected the “wells within the study area with a corresponding depth …” would be enough.
L 233: I am sorry, I did not understand how you use the dynamic descriptors. “annual median [NO3] was assigned a lagged dynamic value”, what exactly does that mean? And are the dynamic variables also variable in space?
L 267-269 This sentence is unnecessary as it stands. You have already stated which range of recharge rates you used. Alternatively, the second part could be supported with references: what are those realistic mean values, and why are they realistic?
L 264-280 Could the calculated parameter ranges in these two paragraphs be presented in a table? This would aid understanding and allow the text to be shortened considerably.
L 285-288 Why did you make this assumption? You could also have used various features derived from the dynamic predictors, including mean values over several years. Considering the uncertainty in the travel time estimates, I think this approach is not well supported. Moreover, it is still unclear to me how you used the dynamic values and linked them to the annual median [NO3]. Did you assign one past value to each well and total lag time (for one transport rate combination)? If so, why do you think this is possible, given that there is also horizontal groundwater movement, so that the annual median [NO3] depends not only on vertical travel times at the well location but also on vertical travel times elsewhere plus the horizontal travel times? Please clarify. A conceptual figure might help to explain how you link dynamic descriptor values to specific wells and the corresponding annual median [NO3].
L 289-293 This part jumps back to the RF application. I think the methods section should be restructured so that the travel time concept is presented first, followed by the RF application used to predict the travel times.
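To make my reading of the lag assignment explicit, here is a minimal sketch of what I understand the procedure to be. All function names, variable names and numbers are hypothetical and are not taken from the manuscript; the authors should correct this if their procedure differs:

```python
def lagged_year(sample_year, unsat_thickness_m, sat_depth_m, v_u, v_s):
    """Year whose dynamic predictor value is paired with a sample, assuming
    purely vertical transport (my interpretation, not the authors' code)."""
    # Total lag = unsaturated-zone travel time + saturated-zone travel time.
    lag_years = unsat_thickness_m / v_u + sat_depth_m / v_s
    return sample_year - int(round(lag_years)), lag_years

# Made-up example: one well, one (Vu, Vs) combination in m/yr.
year, lag = lagged_year(2015, unsat_thickness_m=10.0, sat_depth_m=6.0, v_u=2.0, v_s=3.0)

# Hypothetical annual series of a dynamic predictor, keyed by year.
dynamic_series = {2007: 140.0, 2008: 150.0, 2009: 155.0}
lagged_value = dynamic_series[year]
```

If this reading is correct, each annual median [NO3] is paired with exactly one past value per transport rate combination, which is the step I ask the authors to justify.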
Results and Discussion (L. 297- 425)
L 299 Which is the “initial model”? I cannot follow. This should be stated more clearly in the methods.
L 325-335 This paragraph belongs to the methods. Why do you use the variable importance of total travel times to determine the “optimal” transport rates? I struggle to understand why this approach is promising.
L. 338 Do you think these ratios are transferable to other locations? I would expect this to be site specific. This is not clearly stated here, as the statement sounds rather general.
L 345 “second analysis”? What was the paragraph before then? This needs to be specified in the methods.
L 346 How uncertain do you consider these values, given the equifinality within the Vs/Vu ratios shown in Fig. 6? Is it meaningful to select one model in this case? Are there other ways to constrain the equifinality problem besides recharge rate data?
L 355. “mean recharge value derived from groundwater ages in intermediate wells (1.22 m/yr, n = 13).” Please provide the reference here.
Section 3.3: What data or prior knowledge would someone need in order to apply your new approach to estimate lag times? Would you trust the method without groundwater ages for validation? Do you think this method also works in areas with a stronger denitrification impact, where NO3 is transported less conservatively? Could the approach be hampered if the denitrification potential or other subsurface conditions are heterogeneous (e.g. linked to hot spots such as pyrite lenses, or to hydraulic conductivity)?
L 440 What do you mean by “comparisons of data-driven analyses with complementary datasets”?