The reviewer thanks the authors for addressing many of the points raised during the review. There are, however, still some open points which need to be addressed.
The authors' reliance on citation of literature appears to be selective, raising concerns that they may be attempting to back up their arguments through simple reference rather than substantive engagement with the cited works. This practice, characterized by citation on a keyword basis without thorough consideration of contextual nuances, risks oversimplifying complex scientific concepts and extrapolating findings beyond their appropriate scope, which may impact the validity of their arguments. Consequently, this approach undermines the integrity of the scientific discourse by potentially perpetuating misconceptions and failing to address or acknowledge inherent limitations or uncertainties within the literature.
An illustrative example of this tendency is evident in their persistence regarding the potential influence of speculative factors such as cow activity on model offsets, without adequately substantiating these claims or conducting investigations to validate such hypotheses. Additionally, their insistence on employing a variety of statistical performance measures to evaluate a systematically offset model appears to be a misguided attempt to obfuscate inherent shortcomings rather than directly addressing or mitigating them. This approach not only distracts from the clarity and focus of their analysis but also raises doubts about the overall approach of the model and the transparency of the authors in reporting their findings.
Detailed comments to the reply of the authors:
"however, our experiments demonstrated that varying soil depths from 3 to 6 layers did not have a substantial impact on the simulated neutron count results in our model setting" (...)
The reviewer had raised suspicions about the coarse layer structure, since hydrologically the topsoil dynamics happen on a much smaller scale and that could have implications for the CRNS signal. The reviewer suspected a scale mismatch between a layering that might be appropriate for hydrological purposes and CRNS, which is sensitive to dynamics on a smaller scale. To be precise: in the way the authors have structured their simulations, they reduce most of the signal dynamics measured by the CRP to two layers. The question of the reviewer may have been misleading, as the authors' reference to literature which used a similar layering for other reasons is not directly in favor of their argument. The additional material presented by the authors does not support the challenged conclusion. It shows that there are significant deviations between the simulations using 3, 5 or 6 layers. As the residuals are smaller in the three-layer case, one can assume that the authors have chosen the coarser representation in order to yield better results. Typically, a more finely layered representation would lead to converging results; if significant deviations or alternating fit qualities appear instead, this hints at further systematics with respect to that model parameter. As the vertical weighting function was presented incorrectly in the original manuscript, this was reason enough for the reviewer to suspect a systematic error based on the choice of layers. The only conclusion which can be drawn from the material the authors present here is that the results are (still) biased by the choice of layers. If the authors chose to model their system less granularly in order to yield a better fit quality, the unknown system bias might yield systematically wrong representations. Given that using fewer soil layers reduces the residuals specifically in situations where the deviations are surprisingly large, the suspicion is that the authors might simply be introducing new errors to compensate for other model errors. It could also be that the authors have - involuntarily, judging by their arguments - chosen a reduction of the layering scheme that happens to be representative for CRNS. That, however, would need to be analyzed separately.
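To illustrate the kind of convergence check the reviewer has in mind, below is a minimal sketch (hypothetical soil moisture profile and a simplified exponential depth sensitivity, not the authors' model and not the full published weighting scheme): the CRNS-weighted soil moisture should converge as the layering is refined, and if it does not, the layer choice itself is a source of systematic error.

```python
import numpy as np

def weighted_sm(layer_bounds_cm, layer_sm, d63=20.0):
    """CRNS-weighted mean soil moisture for a given layering.

    layer_bounds_cm : layer interfaces from the surface downwards, e.g. [0, 5, 30, 180]
    layer_sm        : volumetric soil moisture per layer (m3/m3)
    d63             : e-folding depth of a simplified exponential sensitivity (cm)
    """
    weights = []
    for top, bottom in zip(layer_bounds_cm[:-1], layer_bounds_cm[1:]):
        # integral of exp(-z/d63) over the layer gives the layer weight
        weights.append(np.exp(-top / d63) - np.exp(-bottom / d63))
    weights = np.array(weights)
    return np.sum(weights * np.asarray(layer_sm)) / np.sum(weights)

# Hypothetical wet-over-dry profile: 0.35 m3/m3 in the top 10 cm, 0.15 below.
def profile(z_top, z_bottom):
    mid = 0.5 * (z_top + z_bottom)
    return 0.35 if mid < 10.0 else 0.15

for bounds in ([0, 30, 180], [0, 5, 30, 180], [0, 2, 5, 10, 30, 60, 180]):
    sm = [profile(t, b) for t, b in zip(bounds[:-1], bounds[1:])]
    print(len(sm), "layers ->", round(weighted_sm(bounds, sm), 3))
```

In this illustrative setting the 2- and 3-layer discretizations differ markedly from the converged fine-layer value, which is exactly the behaviour that would make the simulated neutron counts depend on the layering choice rather than on the hydrology.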
"Thank you for bringing up t at the other N-SM conversion functions (...)"
Thank you for providing additional context regarding the selection of N-SM conversion functions in your study. However, the reviewer would like to address a couple of points:
The statement regarding the UCF method from Franz et al. (2013) having low experimental performance in the past, as demonstrated by McJannet et al. (2014) and Baatz et al. (2014), may not accurately represent the broader literature where similar issues have been raised for the N0 method as well. Therefore, it would be more appropriate to acknowledge the mixed findings in the literature rather than categorically dismissing the UCF method based on selected studies.
Regarding the recent UTS method from Köhli et al. (2021), it is important to recognize that while it may require further validation, it still represents a noteworthy advancement in the field of N-SM conversion functions. Dismissing it solely on the basis of its publication date and perceived complexity may overlook potential benefits it could offer, especially if it proves to outperform existing methods in certain scenarios.
Overall, while the reviewer appreciates the thorough explanation provided for the choice of COSMIC and Desilets methods, it is essential to maintain a balanced and nuanced perspective on the various N-SM conversion approaches available in the literature.
"We are using the five most established measures in hydrology to evaluate the model performance (...)"
Thank you for providing insight into your approach to model evaluation and the rationale behind using multiple statistical measures. Employing a variety of evaluation metrics can indeed provide a more comprehensive assessment of model performance, capturing different aspects such as bias, dynamics, and temporal errors. However, it does not inherently address the systematic uncertainties introduced by modeling choices: all of these measures remain influenced by the underlying assumptions and parameterizations of the hydrological model itself. Simply presenting results that are consistent across multiple measures therefore does not guarantee robustness in the face of the evident model uncertainties. In addressing the concerns raised by the reviewer, it would be beneficial for the authors to provide a more transparent discussion of the modeling assumptions, limitations, and potential sources of uncertainty. This would help contextualize the interpretation of the evaluation results and provide a clearer understanding of the model's performance and its implications for CRNS measurements in combination with hydrological modeling.
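To illustrate why agreement across several measures does not rule out a systematic offset, consider a minimal sketch with synthetic data (not the authors' simulations): a purely constant offset leaves the correlation at 1 while bias, RMSE and NSE clearly flag the error, so the apparent consistency of a metric depends on which aspect of the error it is sensitive to.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "observed" neutron counts: seasonal cycle plus noise.
obs = 2000 + 100 * np.sin(np.linspace(0, 6 * np.pi, 365)) + rng.normal(0, 10, 365)
sim = obs + 80  # synthetic "simulation": correct dynamics, constant offset of 80 counts

bias = np.mean(sim - obs)
rmse = np.sqrt(np.mean((sim - obs) ** 2))
corr = np.corrcoef(sim, obs)[0, 1]
nse = 1 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

print(f"bias={bias:.1f}  RMSE={rmse:.1f}  r={corr:.3f}  NSE={nse:.2f}")
# r remains 1.000 despite the offset; only the offset-sensitive measures reveal it.
```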
"As was demonstrated in a large number of accepted literature, COSMIC is an analytical-based model incorporating key physics-based processes important for CRNS applications in conjunction with models (...)"
The authors might mistake an "analytical model" for a "physics-based model". While it is true that COSMIC is an analytical forward operator which mimics physical processes, this does not mean that it actually contains representations of physical processes. The authors here are probably subject to a 'non-sequitur' error: correlation does not imply causality. COSMIC eventually uses an exponential function with empirically determined parameters. These parameters unfortunately do not correspond to the physical quantities they are supposed to stand for. The COSMIC approach is similar to the UCF approach by Franz et al. (2013). Desilets et al. earlier used a hyperbola to describe the N-SM relation, and Köhli et al. (2021) showed that the most realistic representation of the measured CRP intensity is a combination of a hyperbola and an exponential function (a schematic comparison of these functional forms is given below). Even if an exponential function can represent the N-SM relation to some extent, it does not follow that COSMIC, with its exponential function, represents physical processes in any way. To be clear on that point: COSMIC does not use any physically correct attenuation lengths in its analytical description; in fact, if one were required to use such values instead of the empirical parameter adaptations, the equation would not work. Furthermore:
COSMIC is described as focusing solely on the influence of locally and directly transported neutrons to the detector, which suggests it may not adequately represent the complete physics of the CRNS method. This limitation could undermine its claim to be a physics-based model if it neglects important physical processes. COSMIC also makes several assumptions, such as the assumption that neutrons in the soil are produced only by other high-energy neutrons, which may not accurately reflect the true physics of neutron interactions in the environment. This suggests a potential oversimplification or misunderstanding of the underlying physical principles. Additionally, researchers have raised criticism regarding the accuracy of mathematical formulations and calculations within COSMIC, including errors in integrating equations in cylindrical coordinates and inaccuracies in mathematical expressions. These issues suggest a lack of rigor and precision in the model's implementation, which is crucial for any model claiming to be physics-based.
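For clarity, a schematic comparison of the functional forms mentioned above (illustrative notation only; the hyperbola parameters are the commonly cited empirical fits, and the UTS form is simplified, omitting its additional humidity-dependent terms):

$\theta(N) = \dfrac{a_0}{N/N_0 - a_1} - a_2$  (Desilets-type hyperbola; commonly used values $a_0 \approx 0.0808$, $a_1 \approx 0.372$, $a_2 \approx 0.115$)

$N(\theta) \propto \exp(-\theta/\Lambda)$  (exponential dependence with an empirically fitted $\Lambda$, the form effectively underlying COSMIC/UCF-type operators)

$N(\theta) \approx c_1\,\dfrac{p_1 + p_2\,\theta}{p_1 + \theta} + c_2\,e^{-p_3\,\theta}$  (schematic UTS-type form: hyperbola plus exponential, cf. Köhli et al. 2021)

None of the empirical parameters in these expressions are physically correct attenuation lengths or cross sections, which is precisely the reviewer's point.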
As the authors in a later statement pave the way for an entirely opportunistic choice ("Hence, we consider the 1D model assumptions adequate for the target application."), the reviewer asks the authors to acknowledge this in their manuscript.
"Thank you for this remark. The site-specific nature of the N0 parameter is a well-recognized aspect within the Cosmic-Ray Neutron Sensor (CRNS) community (...)"
That response unfortunately contains several logical shortcomings:
- The authors cite various studies to support the assertion that N0 is site-specific. However, the mere mention of previous research, without providing specific evidence or logical reasoning, does not sufficiently substantiate the claim. Additionally, the observation of non-identical N0 values across different sensors, as noted in the referenced studies, does not inherently establish the site-specificity of N0, especially as the authors also cite studies which have shown the opposite, namely consistent N0 values.
- The authors suggest that because the mHM model does not explicitly incorporate site-specific influences on N0, the value of N0 is inferred solely through the calibration procedure. This oversimplified inference overlooks potential complexities involved in determining the site-specific nature of N0 and assumes that the model's omission of certain factors implies their negligible impact on the N0 determination.
As the authors state in their response, other factors beyond site characteristics may influence the calibration process. This neither means that the N0 parameter should be site-specific nor that it is theoretically site-specific.
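To make the calibration point concrete, below is a minimal sketch (standard Desilets-type conversion, hypothetical input values) of how N0 is typically inferred: it is simply the value that makes the conversion curve pass through the calibration data, so any unmodelled influence present during the calibration period is absorbed into it, whether or not one labels it "site-specific".

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical calibration data: corrected neutron counts and reference
# volumetric soil moisture (e.g. from gravimetric sampling).
N_obs = np.array([2400., 2300., 2150., 2050.])   # counts per hour (illustrative)
theta_ref = np.array([0.12, 0.17, 0.24, 0.30])   # m3/m3 (illustrative)
rho_bd = 1.4                                      # soil bulk density, g/cm3
a0, a1, a2 = 0.0808, 0.372, 0.115                 # commonly used Desilets constants

def theta_from_N(N, N0):
    """Desilets-type hyperbola; gravimetric result scaled to volumetric via bulk density."""
    return (a0 / (N / N0 - a1) - a2) * rho_bd

def cost(N0):
    """Sum of squared errors against the reference soil moisture."""
    return np.sum((theta_from_N(N_obs, N0) - theta_ref) ** 2)

# N0 is whatever value minimizes the misfit; everything not represented in the
# conversion (biomass, roads, ponds, detector effects, ...) ends up inside it.
res = minimize_scalar(cost, bounds=(2500., 5000.), method="bounded")
print(f"calibrated N0 = {res.x:.0f}")
```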
"We are here referring to equation 8 as mentioned in the text. As was explained, it resembles an integral of the
vertical neutron transport, geometrically projected to the vertical axis."
The authors probably mean "projected integral". The term "geometric integral" is already used in mathematics for a different type of calculation procedure.
"Evidence for the influence of crowding cows at this site"
As the authors themselves state in their reply that there is no evidence that this is a relevant influence factor, the reviewer asks the authors to remove such distracting assumptions.
"Here the consistency in simulating neutron count variability means that mHM has the capability of capturing the general trend and pattern of the simulated data (...)”
Thank you for the clarification provided regarding the interpretation of the statement on the consistency in simulating neutron count variability. However, the explanation does not fully address the concern raised by the reviewer regarding the potentially misleading nature of highlighting only the top 1% of model runs in combination with potential systematic biases. While the reviewer acknowledges that the top 1% of model runs may in some cases demonstrate stable performance in terms of overall trend and pattern capture, or that this may extend to a larger subset of model runs, the reviewer cannot follow the statistical deviations and variations the authors present. With each subset of model runs showing an inconsistent variability, it is not easy to follow the arguments brought up by the authors, for example regarding seasonality. As brought up earlier, by selectively highlighting only the best-performing model runs, there is a risk of overlooking potential weaknesses or limitations in the model performance across a wider range of conditions and scenarios.