the Creative Commons Attribution 4.0 License.
Non-asymptotic distributions of water extremes: Superlative or superfluous?
Abstract. Non-asymptotic (𝒩𝒜) probability distributions of block maxima (BM) have been proposed as an alternative to asymptotic distributions of BM derived by classic extreme value theory (EVT). Their advantage should be the inclusion of moderate quantiles as well as extremes in the inference procedures. This would increase the amount of used information and reduce the uncertainty characterizing the inference based on short samples of BM or peaks over high threshold. In this study, we show that 𝒩𝒜 distributions of BM suffer from two main drawbacks that make them of little usefulness for practical applications. Firstly, unlike classic EVT distributions, 𝒩𝒜 models of BM imply the preliminary definition of their conditional parent distributions, which explicitly appears in their expression. However, when such conditional parent distributions are known or estimated also the unconditional parent distribution is readily available, and the corresponding 𝒩𝒜 distribution of BM is no longer needed, as it is just an approximation of the upper tail of the parent. Secondly, when declustering procedures are used to remove autocorrelation characterizing hydro-climatic records, 𝒩𝒜 distributions of BM devised for independent data are strongly biased even if the original process exhibits low/moderate autocorrelation. On the other hand, 𝒩𝒜 distributions of BM accounting for autocorrelation are less biased but still of little practical usefulness. Such conclusions are supported by theoretical arguments, Monte Carlo simulations, and re-analysis of sea level data.
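The first drawback named in the abstract can be illustrated with the simplest textbook case, which is only an assumption here and not the model analysed in the paper: if a block contains $n$ independent observations from a parent CDF $F$, the block maximum has CDF $F(x)^n$, so a BM quantile at probability $p$ is just the parent quantile at probability $p^{1/n}$. The exponential parent, the block size of 365, and all function names below are hypothetical choices made for this sketch.

```python
import math


def parent_cdf(x, rate=1.0):
    """CDF of the assumed exponential parent distribution."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def parent_quantile(p, rate=1.0):
    """Inverse CDF of the assumed exponential parent."""
    return -math.log(1.0 - p) / rate


def bm_cdf(x, n, rate=1.0):
    """CDF of the maximum of n independent parent draws: F(x) ** n."""
    return parent_cdf(x, rate) ** n


def bm_quantile(p, n, rate=1.0):
    """BM quantile, read directly off the parent at probability p ** (1 / n)."""
    return parent_quantile(p ** (1.0 / n), rate)


n = 365   # hypothetical block size (daily values, annual maxima)
p = 0.99  # non-exceedance probability of the block maximum

x = bm_quantile(p, n)
# The BM quantile maps back to p under F(x) ** n, and corresponds to a very
# high probability level p ** (1 / n) on the parent distribution.
print(x, bm_cdf(x, n), p ** (1.0 / n))
```

In this independent case, once the parent is specified the BM distribution adds no information of its own; dependence within blocks, which the abstract's second point concerns, is deliberately left out of the sketch.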
Status: closed
- CC1: 'Comment on hess-2023-234', Sarah Han, 27 Oct 2023
I think that this manuscript is very dispersive. I suggest inserting (in the first part) at least one flow chart and one figure, to make all the steps easier to follow for a general reader.
This is the difference between a very good scientific paper and a common one; in the latter case there is a risk that only the authors and a very small set of readers can fully understand the work!
The font size in Figures 2 and 3 seems very small! Good dissemination of results also requires suitable figures. I suggest enlarging them and (mainly for Figure 3) splitting the content into two figures, for better visualization (and understanding) of the plots.
Figures 4, 5 and 6: is the scale of the vertical axes logarithmic? I suppose so, but it would be better if the author specified it in the text (or in the captions), so that the results are presented clearly to all readers in the scientific community.
Sections 5 and 6: my opinion about the “dispersion” is also confirmed by the presence of mathematical formulas in the second part of this manuscript. The whole methodology description should be placed in the first part of a scientific paper, while the second part should be dedicated only to the discussion of the results.
Overall, this manuscript seems to suffer from two issues:
- a presentation of the methodology that is not clear to all readers;
- self-referentiality: in the reference list I counted twenty papers by Serinaldi, and this does not seem so elegant in the scientific community…
Sincerely
Citation: https://doi.org/10.5194/hess-2023-234-CC1
- AC3: 'Reply on CC1', Francesco Serinaldi, 14 Jun 2024
- RC1: 'Comment on hess-2023-234', Anonymous Referee #1, 18 Mar 2024
A classic choice in the statistical modelling of extremes is between (a) constructing a detailed model of an entire process, from which its extremal properties can be estimated, either analytically or more usually by numerical methods, or (b) direct modelling of the extremes themselves. If adequate reliable data are available and the investigator has sufficient time, then approach (a) allows information from other sources (such as physical models) to be included at the modelling stage and has the benefit that estimates of all quantities, including extremes, stem from a single overall model and therefore are consistent. However, this approach is demanding of data and of time, and makes the implicit assumption that the details of the underlying process are relevant to the extremes. Approach (b) is less demanding of data and avoids detailed modelling by applying the classical theorems of extreme-value theory (EVT) to block maxima or threshold exceedances for the phenomenon of interest. Although originally developed for independent and identically distributed observations, these theorems have been shown to be robust to plausible types of dependence in the underlying data, and have been widely and generally successfully applied in environmental settings. They can be regarded as semiparametric models, in the sense that they do not depend heavily on the underlying process. A major concern is that they rely on limiting approximations (the GEV and GPD) that may fit data at observed levels satisfactorily but extrapolate poorly to unobserved levels. Such models provide a simple and direct empirical approach to modelling extremes of the underlying phenomenon but it may be a struggle to incorporate physical constraints or other background knowledge into them.
The paper under discussion can be viewed as a critique of a particular type (a) approach, namely metastatistical extreme-value (MEV) modelling, from the viewpoint of a classical type (b) approach, namely the fitting of GEV and GPD models to block maxima and threshold exceedances. There are two main criticisms:
- that papers proposing MEV have done so by application to and illustration on ‘real data’, in which the true data-generating mechanism is unknown, which implies that it is impossible to compare the behaviour of different approaches under ideal conditions (when the target of inference is known);
- that in any case the comparisons are incorrect, because of confusion over the target of inference (see Figure 7). Here the point is more subtle, but it is summarised in equation (16) of the paper. The point here is that if one is estimating a quantile function $Q_\theta(p)=F^{-1}(p;\theta)$ that depends on an unknown parameter $\theta$ and one will estimate $\theta$ from a single sample using an estimator $\hat\theta$, then the estimator of $Q_\theta(p)$ is $Q_{\hat\theta}(p)$, whose properties should be assessed over repeated sampling using independent replicates $Q_{\hat\theta_1}(p), \ldots, Q_{\hat\theta_S}(p)$ based on $S$ samples leading to estimates $\hat\theta_1,\ldots, \hat\theta_S$. The average of these estimates would be $S^{-1} \sum_{s=1}^S Q_{\hat\theta_s}(p)$, i.e., the right-hand side of (16), rather than $Q_{\bar{\hat\theta}}(p)$ (the left-hand side of (16)), where $\bar{\hat\theta}$ is the average of the parameter estimates for the $S$ samples. The paper under discussion illustrates the difference via the left- and right-hand panels of Figure 7. (Though the discussion at lines 532-535 leaves it unclear how the ‘median GEV/Gumbel’ curves are computed: the median for each $p$, giving a result that would not correspond to any single quantile function, or what? And if the median, why not the mean?)
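The distinction in the second criticism can be sketched numerically. The example below is a minimal illustration under assumptions that are not in the paper: an exponential model with rate $\theta$, for which $Q_\theta(p) = -\ln(1-p)/\theta$, the estimator $\hat\theta = 1/\bar{x}$, and hypothetical values for the sample size and the number of replicates $S$. Because $Q_\theta(p)$ is nonlinear in $\theta$, averaging the $S$ quantile estimates differs from evaluating the quantile function at the averaged parameter estimate.

```python
import math
import random
from statistics import fmean


def exp_quantile(p, rate):
    """Quantile function Q_theta(p) of an exponential model with rate theta."""
    return -math.log(1.0 - p) / rate


def compare_averages(p=0.99, true_rate=1.0, n=20, S=2000, seed=1):
    """Return (mean of Q at each estimate, Q at the mean estimate)."""
    rng = random.Random(seed)
    rate_hats = []
    for _ in range(S):
        # One synthetic sample of size n and its rate estimate 1 / sample mean.
        sample = [rng.expovariate(true_rate) for _ in range(n)]
        rate_hats.append(1.0 / fmean(sample))
    # Average of the S estimated quantiles (right-hand side in the text).
    mean_of_quantiles = fmean([exp_quantile(p, r) for r in rate_hats])
    # Quantile evaluated at the averaged parameter estimate (left-hand side).
    quantile_at_mean = exp_quantile(p, fmean(rate_hats))
    return mean_of_quantiles, quantile_at_mean


mean_of_quantiles, quantile_at_mean = compare_averages()
print(mean_of_quantiles, quantile_at_mean, exp_quantile(0.99, 1.0))
```

In this particular setting the average of the quantile estimates is never smaller than the quantile at the averaged rate, a consequence of the inequality between arithmetic and harmonic means; for a model whose quantile function is linear in its parameters (for example the Gumbel in its location and scale) the two quantities would coincide.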
Both of these criticisms seem to me to be correct, and they should in my view embarrass the reviewers of the original MEV papers and the journals that published them.
I found the paper to be quite poorly written, to the point of unclarity in numerous places, including lines 405 (‘the spreader …’?), 429 (‘…, or better, …’?) or 524 (what is a predictive quantile function of a predictive quantile function?), and with many minor errors. Examples of the latter are that (i) the Beirlant et al. book cited at line 18 was published in 2004, not 2006, and (ii) stating on line 476 that the distribution of an order statistic is beta is incorrect: the beta distribution represents variables on a finite interval, and clearly this does not apply to order statistics from, say, a Gaussian sample (did the authors mean that the distribution of an order statistic can be represented _using_ that of a beta random variable?), (iii) equation (17), the left-hand side of which is a function of $z$, while the right-hand side is a number (as the expectation of $Z$ is a constant), and (iv) at line 461, where results from a simulation study are ‘eventually used to build confidence intervals’; but in a simulation study the truth is known, so confidence intervals are not needed, and since a confidence interval is based on a single sample, we have to guess that the authors mean that their $S$ return level estimates will be used to compute quantiles of a distribution. The paper is full of inaccuracies of this sort, so the reader is continually wondering ‘is that correct?’ and concluding ‘not quite’; this does not give confidence in the main results.
It is the role of the authors to produce a well-crafted article, not that of a reviewer, so I will not give more examples (it would take many pages to list them all), but generally I found the writing to be unclear, long-winded, and in need of a careful review by a native English-speaker (see, e.g., line 514). Reducing the paper radically by revising and trimming the text throughout would improve it. I would also target much of the general text in the earlier sections for cuts, since it is mostly not germane to the criticism of the MEV work. A 15-page paper in the current format would make the main points more clearly and should be more readable.
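Point (ii) above, on order statistics, rests on a standard result: for a continuous parent CDF $F$, the $k$-th smallest of $n$ independent uniform variables follows a Beta$(k, n-k+1)$ distribution, so the $k$-th order statistic of the parent can be written as $F^{-1}(B)$ with $B \sim \mathrm{Beta}(k, n-k+1)$, without being beta distributed itself. The sketch below checks this for a standard Gaussian parent; the values of $k$, $n$, the number of replicates, and the function names are assumptions chosen only for illustration.

```python
import random
from statistics import NormalDist, fmean


def order_stat_direct(k, n, reps, rng):
    """Mean of the k-th smallest of n standard normal draws, found by sorting."""
    values = []
    for _ in range(reps):
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        values.append(sample[k - 1])
    return fmean(values)


def order_stat_via_beta(k, n, reps, rng):
    """Mean of F^{-1}(B) with B ~ Beta(k, n - k + 1) and F the standard normal CDF."""
    inv_cdf = NormalDist().inv_cdf
    return fmean([inv_cdf(rng.betavariate(k, n - k + 1)) for _ in range(reps)])


rng = random.Random(0)
direct = order_stat_direct(8, 10, 20000, rng)
via_beta = order_stat_via_beta(8, 10, 20000, rng)
# The two Monte Carlo means should agree up to sampling noise, although the
# order statistic itself takes values on the whole real line.
print(direct, via_beta)
```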
Citation: https://doi.org/10.5194/hess-2023-234-RC1
- AC1: 'Reply on RC1', Francesco Serinaldi, 14 Jun 2024
- RC2: 'Comment on hess-2023-234', Francesco Marra, 20 May 2024
- AC2: 'Reply on RC2', Francesco Serinaldi, 14 Jun 2024
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
793 | 196 | 44 | 1,033 | 25 | 24 |