I appreciate the authors' effort to improve the presentation of the material, but the paper still feels to me like an overwhelming amount of test results that are only partially digested and made sense of. The authors obviously disagree with my opinion - we will have to agree to disagree on this one. I still think that too much of the information necessary to understand the paper is given in the supplementary material, where it is difficult to identify amid the enormous amount of information the authors share with the reader.
I think the authors have only partially replied to some of my comments, although the new manuscript does address many of my concerns and improves the presentation. I would still recommend that the authors do one more round of proof-reading: there are still many grammatical mistakes.
Here I will make some further comments; some are a re-iteration of comments which I feel the authors did not do full justice to. I believe these concerns relate to the very core of the analysis and question the validity and robustness of the findings in the manuscript.
When I said the authors do not fit IDF curves, I meant that, as far as I can tell, no attempt is made in the estimation procedure to ensure that design events for longer durations are larger than or equal to design events for shorter durations. I would expect the intensity of the 100-year 30-minute event to be <= the intensity of the 100-year 15-minute event. In the plots shown by the authors (for example in Figure 4) this is mostly the case (but not for the Stratford WWTP station when looking at the 15- and 30-minute boxplots), yet the modelling does not enforce it. Enforcing the consistency of the IDF curves is one of the challenges of jointly modelling rainfall of different durations, which in my opinion the authors do not tackle (and that is fine, as long as it is acknowledged). This lack of shape enforcement is also the cause of the odd kinks seen in Figures 8 and S15, which result from fitting separate distributions to the series of different durations.
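To make the point concrete, here is a minimal sketch (in Python, with purely hypothetical intensity values, not the manuscript's data) of the consistency property I am referring to:

```python
# Minimal sketch of the IDF consistency check described above.
# The intensities below are hypothetical, not taken from the manuscript.

# Estimated 100-year intensities (mm/h) by duration (minutes),
# each fitted independently per duration, as in the manuscript.
quantiles_100yr = {15: 120.0, 30: 125.0, 60: 70.0}  # 30-min > 15-min: inconsistent

def is_idf_consistent(intensity_by_duration):
    """Intensity must be non-increasing as duration grows."""
    durations = sorted(intensity_by_duration)
    intensities = [intensity_by_duration[d] for d in durations]
    return all(a >= b for a, b in zip(intensities, intensities[1:]))

print(is_idf_consistent(quantiles_100yr))  # False: 30-min intensity exceeds 15-min
```

A joint model would build this monotonicity constraint into the estimation itself, rather than hoping separate per-duration fits happen to satisfy it.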
The authors disagree with my take on the issue on Page 2 of the "Authors' replies" document, where they state they have developed IDF curves.
In their reply to my comment on Page 3 of the "Authors' replies" document I feel the authors have not really answered my comment: they merely re-iterate that they use many tests, as was previously done in the literature, but do not acknowledge the possible issues connected with multiple comparisons. The moment one begins to perform many statistical tests is exactly when multiple comparisons become a problem. It is fine to run several tests on your data, but you are bound to have some false positives. Simply adding more tests to the mix will not necessarily give a clearer view of possible changes in the series, as there will inevitably be some spurious results due to randomness.
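The arithmetic behind this concern is simple; the sketch below (with an illustrative test count, not the manuscript's exact number) shows how many rejections one expects by chance alone, and one conservative remedy:

```python
# Sketch of the multiple-comparisons point: with many tests at level
# alpha, some rejections are expected under the null by chance alone.
# The test count below is illustrative (e.g. stations x durations x tests).

n_tests = 200
alpha = 0.05

expected_false_positives = n_tests * alpha
# Probability of at least one false positive if all nulls are true
# and the tests are independent:
prob_any = 1 - (1 - alpha) ** n_tests

print(expected_false_positives)   # 10.0
print(round(prob_any, 4))         # 1.0 (to 4 d.p.): essentially certain

# A simple (conservative) remedy is a Bonferroni-adjusted level:
alpha_bonferroni = alpha / n_tests
print(alpha_bonferroni)           # 0.00025
```

With hundreds of tests, some "significant" results are guaranteed even if nothing has changed, which is why the number of tests alone cannot strengthen the evidence.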
In their reply to my further comment 7 on Page 8 of the "Authors' replies" document I feel the authors have not addressed my point, but have just given some good reasons for using non-parametric tests when detecting changes in a series. Again, I am asking what the point is of performing all of these non-parametric tests if the information derived from them is not used in any way in the parametric modelling, which is the basis for all the comparisons against the present-day approach. If the Pettitt test identifies a change point, shouldn't we also test for the presence of step changes in the parametric modelling? If we do not use the non-parametric tests in some way, why bother performing them?
My perception is that the conclusions mostly rely on the non-stationary GEV fits with linear trends and the comparisons of the non-stationary DSI and EC curves. So again, why use a Pettitt test if they themselves claim in the reply to my comment: "In practice, when real change point is unknown, often Mann-Whitney test, in general, does not work well and the Pettitt method can yield plausible change point location along with its statistical significance. However, the significance of the Pettitt test can be obtained using an approximated limiting distribution. Therefore, above tests were needed in the current setting." Why perform a Pettitt test and then not try to use a parametric model to express the potential step changes in the series?
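To illustrate what I mean by feeding the Pettitt result into the parametric modelling, here is a minimal sketch (simulated data, hypothetical change point; scipy's genextreme is just one possible implementation choice, not the authors' method) comparing a stationary GEV fit against a fit with a step change in location:

```python
# Sketch: compare a stationary GEV fit against a fit allowing a step
# change at a (hypothetical) Pettitt-detected change point. Data are
# simulated, purely illustrative.
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(42)
tau = 40  # hypothetical change-point index from the Pettitt test
series = np.concatenate([
    genextreme.rvs(c=-0.1, loc=50, scale=10, size=tau, random_state=rng),
    genextreme.rvs(c=-0.1, loc=65, scale=10, size=40, random_state=rng),
])

def gev_aic(data, n_params=3):
    shape, loc, scale = genextreme.fit(data)
    loglik = genextreme.logpdf(data, shape, loc, scale).sum()
    return 2 * n_params - 2 * loglik

aic_no_step = gev_aic(series)

# Step-change model (here crudely: fully separate fits, 6 parameters).
loglik_step = sum(
    genextreme.logpdf(seg, *genextreme.fit(seg)).sum()
    for seg in (series[:tau], series[tau:])
)
aic_step = 2 * 6 - 2 * loglik_step

print(aic_step < aic_no_step)  # True if the step-change model is preferred
```

A model comparison of this kind would let the non-parametric change-point result actually inform the parametric analysis, instead of sitting alongside it unused.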
Overall I feel the authors do a lot of analysis which they end up not using in the final modelling: for example, they include the Bayes factor but only rely on it when it gives evidence in favour of non-stationary models, falling back on the AICc when the BF does not do what they want it to do. It is redundant for the reader to be given results which end up not being used to draw conclusions.
Regarding the AICc: the authors now make it clearer that they use a very specific form of the AIC, which is not the standard one. I welcome this further clarification, but would drop the reference to the Akaike 1974 paper, which introduces the traditional AIC. It is not clear to me why the authors decided to use this version of the AIC rather than the traditional one; they do not really give a reason for their choice. I would also imagine that, had they used the standard version of the AIC, the outcomes of the AIC and the BF might have been slightly more in agreement, since they both use some form of the likelihood.
Comment 11 (Page 9 of the "Authors' replies" document): I thank the authors for the explanation, although, if I understand correctly, the 100-year event as it is calculated now would be the biggest event in the 100 years starting from the year in which recording began. So, say \mu_1 is positive and \sigma is constant: the 100-year event then coincides with the median of the posterior distribution of the 100-year event at time t_100. This implies that the authors are extrapolating the effects detected in the current series into the future, and it also does not take into account that some decades would pass before that maximum value of the 100-year event is reached. There is quite a bit of research on how to update the design-event concept for changing extremes (for example Rootzén and Katz), which is a further challenge for engineers after trends have been detected in hydrometric series.
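The extrapolation issue can be made explicit with a small numerical sketch (all parameter values below are hypothetical, chosen only for illustration): with \mu(t) = \mu_0 + \mu_1 t and constant \sigma, evaluating the return level at t = 100 assumes the detected trend persists for a century beyond the record.

```python
# Sketch of the extrapolation issue: the "100-year event" evaluated at
# t = 100 inherits mu_1 * 100 of pure trend extrapolation. All parameter
# values are hypothetical.
import math

mu0, mu1 = 50.0, 0.2   # location: intercept and linear trend (per year)
sigma, xi = 10.0, 0.1  # constant scale and shape
T = 100                # return period in years

def gev_return_level(mu, sigma, xi, T):
    """T-year return level of a GEV(mu, sigma, xi), for xi != 0."""
    y = -math.log(1 - 1 / T)
    return mu + sigma / xi * (y ** (-xi) - 1)

rl_present = gev_return_level(mu0, sigma, xi, T)                 # at t = 0
rl_extrapolated = gev_return_level(mu0 + mu1 * T, sigma, xi, T)  # at t = 100
print(round(rl_extrapolated - rl_present, 2))  # 20.0: entirely from the trend
```

Concepts such as the design life level of Rootzén and Katz were developed precisely to handle this time-dependence of risk over the design horizon, rather than quoting a single extrapolated return level.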
In Figure 3 it is still not very clear to me how the authors decide when to show a coloured triangle and when to include a cross. Several change-point tests are performed, so does the triangle get a colour if any of the change-point tests is significant (and what if they indicate change points in different years)? Similarly, which non-stationarity test warrants a cross appearing in the triangle? This needs to be mentioned either in the caption or in the main text.
SI 2.2, after eq. 2.6: the term "power" has a specific meaning in statistics, which is actually almost the opposite of the p-value. Use the term "significance".
SI 3.1, after eq. 3.3: "Where" should have a lower-case w. Further (and more importantly), from the text it seems that the authors are confused about what is a posterior and what is a likelihood in a Bayesian context: p(\omega|y) and p(\lambda|y,x) are posterior distributions, while p(y|\omega) and p(y|\lambda,x) are likelihood functions. In Bayesian inference one uses the posterior (not the likelihood) to infer the parameters. The sentence as it stands makes no sense.
I would move the sentence "(often referred to as the burn-in period)" to after the description of what the burn-in period is, i.e. to the next sentence.
The authors never specify the priors used in the analysis as far as I can tell: this is a very relevant piece of information which is missing.
SI 3.2.1: it is confusing to have the parameters denoted by $\theta$, rather than $\omega$ and $\lambda$ as in the previous section.
Further small comments
Typo in the title: "Require", not "requires".
Line 1 of the abstract: no need for the word "increased", since "risen" is already used.
Page 12, Line 30: missing a "(decrease)"; drop the "vice-versa". Alternatively, drop the "(decrease)" in Line 29.
References
Rootzén, H. and Katz, R.W., 2013. Design life level: quantifying risk in a changing climate. Water Resources Research, 49(9), pp. 5964-5972.