1. Recommend changing the manuscript title
Suggested title:
“Regionalization with hierarchical hydrologic similarity and ex-situ data in the context of groundwater recharge estimation at ungauged watersheds”
“In the context of”: would help emphasize that the method could also be applied to many other processes or study subjects.
“Groundwater recharge”: you estimate LNR, not mean annual groundwater recharge.
2. Replace “mean annual groundwater recharge” with “a recharge signature, which will be introduced in section ?” or something similar.
Page 4, Line 30 (Introduction): you say you estimate mean annual groundwater recharge, but you actually estimate LNR.
3. Change the colour thresholds in the colourbar of Figure 3.
The maps don’t show much spatial variability in recharge; this could be highlighted more by changing the thresholds.
It would also be more intuitive if the colours got darker with increasing recharge rates.
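As an illustration of the threshold point (a minimal sketch only; the recharge values below are hypothetical and not taken from the manuscript), equal-count (quantile) class breaks can be compared with equal-width breaks. With right-skewed recharge values, equal-width classes lump most watersheds into the lowest class, whereas quantile breaks spread them across the colourbar. Quantile breaks are only one option; fixed, physically meaningful thresholds may suit the figure better. Either set of breaks could then be paired with a sequential colormap that darkens with increasing recharge, for example matplotlib’s `BoundaryNorm` with a colormap such as `Blues`.

```python
import bisect
import statistics


def quantile_breaks(values, n_classes=5):
    """Interior class breaks that put roughly equal numbers of values in each class."""
    # statistics.quantiles returns the n_classes - 1 cut points between classes
    return statistics.quantiles(values, n=n_classes, method="inclusive")


def equal_width_breaks(values, n_classes=5):
    """Interior class breaks that split the value range into equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + i * width for i in range(1, n_classes)]


def classify(values, breaks):
    """Class index 0..len(breaks) for each value; a higher index means higher recharge."""
    return [bisect.bisect_right(breaks, v) for v in values]


# Hypothetical, right-skewed recharge values (mm/yr), for illustration only
recharge = [12, 14, 15, 15, 16, 17, 18, 20, 22, 25, 30, 45, 80, 150, 300]

print(classify(recharge, equal_width_breaks(recharge)))  # most values fall in class 0
print(classify(recharge, quantile_breaks(recharge)))     # values spread over all classes
```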
4. Write “section”, not “sect.”
Page 17, Line 9
Page 20, Results, Line 5.
Page 21, Line 18
5. Use clearer labels for the splitting criteria in Figure 9, or explain the splitting criteria shown in your trees in a legend.
Page 25: labels such as PPT02 and SLP_deg won’t be easily understood by readers who aren’t reading the paper thoroughly, and labels such as BGEOL_147 or NLCD01 will likely not be understood at all.
6. Highlight that future users of this method, or of other data mining methods, should be selective about which characteristics they use.
Page 26, Lines 11-14: You make a very nice point about how one variable, organic matter content, may hold much of the same information as AWC. Such variables can end up competing against each other for importance, when in reality they are likely describing the same thing. Therefore, it is probably better to use only the variable that best identifies the processes you are interested in.
Similarly, sand and clay fractions are likely to be strongly negatively correlated, so the ratio of sand to clay could be used instead of separate sand and clay fractions. I’m not suggesting this for this paper, as your main message is the methodology. However, I think it is important that future users of the methodology be warned against simply throwing all the available information into a machine learning method and watching what comes out. We should be more thoughtful about which variables may be important for the processes we are interested in.
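A simple pre-screening step along these lines could be sketched as follows. This is an illustration only: the attribute names, values and the |r| ≥ 0.8 threshold are hypothetical choices, and Pearson correlation only captures linear association. The sketch flags strongly correlated pairs of catchment characteristics so that one of them can be dropped or, as with sand and clay, combined into a single ratio.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(features, threshold=0.8):
    """List (name_a, name_b, r) for every pair of predictors with |r| >= threshold."""
    pairs = []
    for a, b in itertools.combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical catchment attributes for five watersheds (illustrative values only)
features = {
    "sand": [0.70, 0.55, 0.40, 0.30, 0.20],
    "clay": [0.10, 0.20, 0.30, 0.40, 0.50],
    "slope": [2.0, 8.0, 3.0, 6.0, 4.0],
}

print(correlated_pairs(features))  # flags only the clay/sand pair (r close to -1)

# One option: replace the correlated pair with a single sand-to-clay ratio
sand, clay = features.pop("sand"), features.pop("clay")
features["sand_clay_ratio"] = [s / c for s, c in zip(sand, clay)]
```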
Below are some media articles highlighting the opinions of a researcher at Rice University in Houston on the use of machine learning in science.
7. Sentences need rephrasing
Page 28, Lines 17-18: “difficult to make physical interpretation our of the results in figure 10” sounds a bit strange.
Page 33, bottom line: “In a difference case” should be “In a different case”.
8. There should be a sub-section for each of the limitations mentioned in Section 5.3, and the sub-sections should appear in the same order as the limitations are listed in the text.
Page 30, Lines 21-22: you state three limitations, but there are only sub-sections 5.3.1 and 5.3.2.
9. Sub-section numbering needs correcting
5.3.1 The scale of target response.
5.3.1 The MRB-based… (should be 5.3.2)
5.0.1 Limited temporal data coverage (should this be 5.3.3 or 5.4?)
5.0.2 Non-comprehensive list… (should this be 5.3.4, 5.4 or 5.5?)
10. The storyline is now much clearer, and it is easier to follow your key messages and contributions.
Additions to the introduction and discussion have really helped to create a clear story and ensure your main messages are understood by the reader.
The methodology and case study sections are now much easier to follow, and I like the examples you give to explain some of your terminology (P11, lines 16-22).