Extreme Value Statistics of Scalable Data Exemplified by Neutron Porosities in Deep Boreholes

The manuscript reports on a statistical analysis of neutron porosity data using the framework developed by the authors. This framework models the porosity increments as the product of a truncated fractional Brownian motion with lag-dependent variance and a random variable, which here is modeled either by an alpha-stable or a lognormal random variable. The variogram of the fractional Brownian motion is modeled as a truncated power variogram. Sections 3-5 are concerned with the estimation of the parameters of the increment models and the determination of sample structure functions. Sections 6 and 7 provide an analysis of the frequency distribution of peaks over threshold of the porosity increments and their structure functions. The paper provides an interesting statistical analysis that sheds light on spatial porosity patterns, which may give insight into the spatial distribution of hydraulic conductivity.

its generalization is not immediate and our work is not aimed at achieving it for extremes. Future developments can benefit from our findings to study the theoretical grounds underpinning the scaling behavior of extreme values of environmental variables of the type we analyze. We clarify this point in our revised Conclusions.
I have no doubt that this manuscript contains enough interesting material to be published in HESS. However, I found it hard to read and understand. This is partly because I come from the statistical and nonlinear physics community, and I am not familiar with the terminology employed in hydrology and earth sciences. But it is also because the manuscript needs an extensive revision. While the title focuses only on extreme value statistics, the manuscript actually pursues three different goals:
-First, to show that the neutron porosity data do indeed possess the properties of scale-mixtures of tfBm or tfGn.
-Second, to provide further support for the authors' unified theoretical framework, which captures all common manifestations of scale-dependent statistics without having to associate this behavior with multifractals.
-Third, the goal acknowledged in the title of the paper, namely to explore the behavior of POTs in this kind of sample.
On the other hand, the organization of the manuscript does not contribute much to its readability. In order to support these criticisms, and guide the authors in the necessary revision of their manuscript, I will comment on the contents of the different sections.
The Reviewer has summarized with clarity the key points of the manuscript. We follow his/her advice and have reorganized the manuscript to enhance readability while keeping it a self-consistent contribution. We also modified the title, to reflect the broad objectives of the work.
Title: It refers only to one aspect of the paper contents (extreme value statistics). It should be made more general, to be representative of the actual contents.
We have modified the title of the manuscript to de-emphasize the aspects associated purely with extreme values and bring it in line with the overall content of the work. It now reads: "Scalable statistics of correlated random data and extremes applied to deep borehole porosities."
Abstract: It does not correspond to what is expected from the abstract of a scientific paper (namely, a concise and complete summary of the research reported in the manuscript). It should be written again, to provide the following information: the objectives of the research, the procedure followed (what was done), the results obtained, and the conclusions reached.
We will rephrase and restructure the Abstract in the revised manuscript to follow the suggestion of the Reviewer.
Sec. 1. Introduction: I recommend splitting the first paragraph into three different paragraphs, on (i) extreme value statistics, (ii) the key question of scale dependence, and (iii) the interest of neutron porosity data (for clarity).
We thank the Reviewer for this suggestion. Given our revised focus, we will restructure this part by starting with (a) the key question of scale dependence, then progressing to (b) extreme values, and finally (c) the Neutron Porosity data.
The paragraph in p. 11640 about research on spatial correlations between large values of transmissivity in subsurface hydrology, and related issues, is distracting. I found this discussion somewhat unrelated to the subject of the present paper. Please, make an effort to link this information to the research reported.
This part is supposed to frame our research in the broader context of previous work related to extreme values of hydrogeological attributes, such as transmissivity, in groundwater hydrology. We will rephrase this part to blend it appropriately with the other segments of the text.
The description provided in the last paragraph of the Introduction, starting in p. 11642, could be linked to the actual organization of the paper, now missing. E.g. "... tendency of increments to have symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag, as shown in Sec. __; power-law scaling of sample structure functions (...), presented in Sec. __; etc."
We will rephrase this part according to the Reviewer's suggestion.
Sec. 2. Source and frequency distributions of neutron porosity data: The authors explain that the data is part of a broader set, previously analyzed within a multifractal formalism by Dashtian et al. (2011). So a comparison of the results reported there and those of the present paper (based on the authors' novel method) would be desirable.
Dashtian et al. (2011) analyzed the data by considering three multifractal methods based on spectral density, multifractal random walk and multifractal detrended fluctuation analysis. They concluded that the data exhibited multifractal characteristics. With reference to Neutron Porosity data, these authors observed long-range correlations, associated with values of the Hurst coefficients in the range 0.80-1.0. They argue that multifractality in the data evidenced by their analysis is due to (a) the broad probability distribution of the data, and (b) the occurrence of long-range correlations. They argue further that the latter is associated with depositions occurring over time, while they state that multifractality is consistent with the type of between-layer variability observed for a property.
We show that the Neutron Porosity data can be interpreted without resorting to multifractal concepts. Like multifractal analyses such as those pursued by Dashtian et al. (2011), we too base our work in part on sample structure functions of order q of absolute increments. Both approaches provide estimates of a Hurst coefficient, H, and account for nonlinear dependence of the power-law scaling exponent, ξ(q), on q. We, however, do so without invoking multifractality.
Values of H we estimate in the first scaling regime (identified in our manuscript) are consistent with those given by Dashtian et al. (2011); in both cases, these values indicate persistence and long-range correlations. Persistence of spatially distributed values of a hydrogeological property is typical of a single hydrogeological unit (or layer). We associate this with intra-layer variability. Our estimates of H at large lags are low, indicating antipersistence of data correlated over long distances; this is typical of alternating layers formed by generally diverse geomaterials. Our interpretation is thus consistent with depositional events similar to those invoked by Dashtian et al. (2011), without implying multifractality.
As to nonlinear variation of ξ(q) with q, typically interpreted in the literature as a symptom of multifractality, we have shown theoretically and computationally elsewhere (please see our reference list) that it is typical of sampling a truncated mono-fractal. Not only is this latter interpretation of nonlinear scaling simpler than the multifractal interpretation, but it also explains phenomena (such as breakdown in power-law scaling of structure functions at small and large lags, and extended self-similarity) which multifractal theory does not explain.
We will emphasize these points in the revised manuscript to enhance the impact of our results.
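For readers less familiar with the quantities discussed in this reply, the following is a minimal sketch of how a sample structure function of order q and its power-law scaling exponent (written here as xi(q)) might be computed from a regularly sampled log. The function names and the plain least-squares fit in log-log space are illustrative assumptions, not the estimation procedure used in the manuscript.

```python
import math


def structure_function(values, lag, q):
    """Sample structure function of order q: mean of |y[i+lag] - y[i]|**q."""
    increments = [abs(values[i + lag] - values[i]) for i in range(len(values) - lag)]
    return sum(d ** q for d in increments) / len(increments)


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def scaling_exponent(values, lags, q):
    """Estimate xi(q) as the log-log slope of S_q(lag) versus lag (hypothetical helper)."""
    s_q = [structure_function(values, lag, q) for lag in lags]
    return loglog_slope(lags, s_q)
```

For a deterministic linear ramp the increments at a given lag are all equal, so the estimated exponent equals q exactly. With real data, a plot of xi(q) against q that bends away from a straight line is the nonlinear scaling discussed above, and a crossover between regimes would show up as different slopes when the fit is restricted to small or large lags.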
The paragraph in line 17 of p. 11643, on ML fits, could be made a separate paragraph, for readability.
We agree and will do so in the revised manuscript.
Is there a physical basis for the different models fitted (Gaussian, α-stable sub-Gaussian, and NLN sub-Gaussian)? Please, discuss.
The choice of the type of subordinator model does not have a clear physical basis in the literature.Rather, the choice is observation-based and is chiefly grounded on the ability of a given model to interpret key features displayed by the empirical densities of the data.
Lévy- (or α-) stable probability distributions are frequently employed in the literature due to their ability to interpret heavy tails displayed by empirical distributions of data. While convenient in this sense, this model has the drawback of being associated with densities with diverging moments of order larger than α, notably the variance. Its use in the literature is documented in the reference list we have included in the manuscript.
The use of a lognormal subordinator provides us with the ability to represent tailing behaviors to a certain extent and has the additional benefit of being associated with densities with finite moments, most notably the variance.
Both models have been used in diverse contexts, including the analysis of environmental, financial, and hydrological data.
Our theory is compatible with diverse types of subordinators and in this work we compare on rigorous grounds, through ML model calibration, (a) the ability of both types of subordinators to capture the critical features displayed by our data and (b) their impact on the identification of the parameters of the underlying Gaussian field.
We will add a short discussion about these points in the revised manuscript.
Sec. 3. Frequency distributions of neutron porosity increments: The same question applies to the paragraph in line 13 of p. 11644. Is there a physical argument for fitting α-stable and NLN models to the empirical frequency distributions of increments?
See our answer to the comment above.
Sec. 4. Statistical scaling of neutron porosity increments:
-In line 17 of p. 11645 the authors report 'a break in power-law regime'; I would find the expression 'a crossover between two different power-law regimes' more appropriate. The authors interpret it as one regime representing variability within (w) sedimentary layers, and the other variability between (b) them. What are those layers? Is this interpretation based on some other information available, which for instance could match the characteristic vertical separation at which the crossover occurs? Results of the data analysis are followed here by interpretation. I think that taking the interpretation of the results to a (new) Discussion section, just before the Conclusions, would benefit the manuscript.
With reference to the first point raised by the Reviewer, we will employ the suggested terminology in our revised manuscript and describe the feature highlighted by the Sample Structure Functions as a "cross-over between two diverse power-law regimes".
The occurrence of layers of diverse geomaterials is related to depositional processes which develop over time and take place in any sedimentary basin of the kind we analyze here. The occurrence of layering in the fields we analyzed is also documented by Dashtian et al. (2011) on the basis of the complete set of well logs. When a geological system is conceptualized in a probabilistic framework, the spatial distribution of attributes (such as porosity) is linked to physically occurring layers of diverse materials through the concepts of correlation scale and persistence or antipersistence. These define layering in a statistical sense. This concept is commonly employed in the field of stochastic groundwater hydrology and it is precisely what we are considering in our work. We will clarify this point in the revised manuscript.
With reference to the way data and results are presented and discussed, we understand the point of the Reviewer and are grateful for this suggestion. We think that this aspect is also a matter of personal style in conveying information, and we prefer to comment on and discuss results in the order we present them. We chose the order of presentation so that each result follows from the previous one, to ensure clarity of exposition. The style we employ in this work is consistent with our previous works and we would prefer to keep it this way. We think that, after implementation of all other suggestions provided by the Reviewer, the readability of the manuscript will be enhanced so that this type of restructuring will no longer be required. This is also consistent with the comments of Reviewer No. 2, who finds the structure of the manuscript to be sound. In any case, we will leave the final decision on this particular matter to the Editor.
-The contents of lines 21-26 of p. 11645, about a similar dual-scaling phenomenon reported by Siena et al. (2014), would also fit better in a Discussion section.
Please, see our reply to the comment above.
-Researchers working on multifractality in the statistics of increments (e.g. in fully developed turbulence) often make use of normalized p-root structure functions $C_p^N = C_p / R_p^G$, where $C_p = (S_p)^{1/p}$ and the normalizing factor $R_p^G = (S_p^G / S_2^G)^{1/p}$ is the ratio of structure functions for a Gaussian distribution, which depends only on $p$. This is useful to unveil deviations from monofractality and Gaussian statistics. While at short lags $C_p^N$ follows a power law of the lag, with exponent $\xi_p / p$ (and $\xi_p$ depends nonlinearly on $p$ if the signal is multifractal), at large lags $C_p^N$ collapses onto a single curve for all $p$, as expected for Gaussian statistics. This could be an alternative way of representing the results in Figs. 7 and 16.
We thank the Reviewer for this suggestion. As noted above, our approach is not aimed at unveiling multifractality. We start from a different premise and consider what is typically identified in the literature as multifractal behavior to be tied, instead, to sampling of a truncated monofractal random field.
As we state in our manuscript, the multifractal framework is not capable of explaining theoretically the collection of all scaling manifestations which have been observed in natural systems. Our interpretation is simpler than the one based on multifractal concepts and is consistent with all observed scaling features of the type we document in this manuscript and have documented in previous works. As such, we do not see the need to investigate apparent (or possible) deviations from multifractality, since we work in a different and more general context.
-The authors find Hurst scaling exponents $H_w > 1/\hat{\alpha}$ and $H_b \ll 1/\hat{\alpha}$, and they associate these results with persistent and antipersistent variability, respectively. It would be helpful to explain why, or to provide a reference. This material could also be moved to the new Discussion section.
Please, see our answers above with regard to the concepts of persistence and antipersistence. We will add a paragraph in our revised manuscript to clarify these concepts.
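As a simple illustration of why the Hurst coefficient is read as a measure of persistence, consider the special case of ordinary (untruncated, Gaussian) fractional Brownian motion, for which the correlation between two adjacent unit-lag increments is 2^(2H-1) - 1. The sketch below evaluates this expression; it assumes the Gaussian case, in which the threshold 1/α equals 1/2, and it does not reproduce the truncated or sub-Gaussian models of the manuscript. The function name is hypothetical.

```python
def fbm_increment_correlation(hurst):
    """Correlation between adjacent unit-lag increments of untruncated Gaussian fBm.

    Assumption: classical fBm, for which this equals 2**(2H - 1) - 1.
    """
    return 2.0 ** (2.0 * hurst - 1.0) - 1.0


# H of order 0.1 and 0.7 are the magnitudes quoted in the reviews; 0.5 is the
# uncorrelated (Brownian) reference case.
for h in (0.1, 0.5, 0.7):
    print(h, fbm_increment_correlation(h))
```

The correlation is positive for H > 1/2 (an increase tends to be followed by another increase, i.e. persistence), zero at H = 1/2, and negative for H < 1/2 (increases tend to be followed by decreases, i.e. antipersistence).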
-Regarding ESS (extended self-similarity), straight line fits have indeed high confidence values R > 0.9; however, to be fair, it should be pointed out also that there is less than one decade of scaling in most cases.
We do agree, and this is related to the typical range of spatial scales within which data are available in geosciences. We will add this clarification in our revision.
In lines 14-17 of p. 11646, and lines 10-15 of p. 11648, the authors argue that finding the distribution of increments to satisfy ESS is akin to verifying that the data conform to the new theoretical scaling framework proposed by them. This is true, but it should not be mistaken for the claim that multifractality, as a framework that would also explain the present results, should be ruled out.
Though someone may, one day, explain ESS in light of multifractal theory, that has not happened so far. We, on the other hand, were able to explain ESS fully in light of our theory. We will clarify this point in our revised work.
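The ESS check discussed in this exchange can be sketched as follows: structure functions of two different orders are computed over a common set of lags and one is regressed on the other in log-log space. The helper names and the use of an ordinary least-squares fit with a Pearson correlation coefficient are illustrative assumptions, not necessarily the fitting procedure of the manuscript.

```python
import math


def structure_function(values, lag, q):
    """Sample structure function of order q: mean of |y[i+lag] - y[i]|**q."""
    increments = [abs(values[i + lag] - values[i]) for i in range(len(values) - lag)]
    return sum(d ** q for d in increments) / len(increments)


def ess_exponent(values, lags, q_num, q_den):
    """Slope and correlation of log S_{q_num} against log S_{q_den} across lags.

    Under ESS the points fall on a straight line; the slope is the ratio of
    the two scaling exponents. Hypothetical helper for illustration only.
    """
    x = [math.log(structure_function(values, lag, q_den)) for lag in lags]
    y = [math.log(structure_function(values, lag, q_num)) for lag in lags]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r
```

A high correlation coefficient alone says little when the lags span less than one decade, which is the caveat raised by the Reviewer; reporting the range of lags alongside the fit makes that limitation visible.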
Is there any statistical analysis that could be applied to the porosity data and would allow a clear-cut discrimination of the two theoretical frameworks? And, also interestingly, in which respect does the authors' theory provide a better description of, or clearer physical insight into, the statistical properties of the porosity data analyzed here? The interested reader will find this discussion illuminating. Of course, it should be placed in a section devoted to Discussion, not mixed with the results of the data analysis.
The answer to this comment ties to our answers to the previous points raised by the Reviewer. In essence, we think that, given a choice between two competing theoretical frameworks, one that explains all observed phenomena and another that explains only some of these phenomena, one should prefer the first. Furthermore, given a choice between two theoretical models of observed behavior, one relatively simple (closed form and expressed in terms of only a few parameters, such as the one we have developed) and one more complex (requiring an infinite number of parameters, or empirical simplifications, such as the multifractal formalism in our case), the first (more parsimonious) model should be preferred. We will convey this view in the revised manuscript.

Sec. 5. Estimation of variogram parameters: This section quickly becomes very technical.
The first half, devoted to theoretical concepts, could complement Appendices A and B and make together a new section of the manuscript (Theoretical framework) that could go right after the Introduction.In this way the reader would have immediate access to the concepts and terminology employed in the data analysis.
We agree and will move the theoretical part to a Section specifically devoted to the theoretical elements. Alternatively, and depending on the need to preserve readability of the revised version, we will relegate it to a new Appendix. This will also accommodate a corresponding request from Reviewer No. 2.
The results for Lévy-stable subordinators (line 22 p. 11649 to line 14 p. 11650), summarized in Fig. 10, and the results for log-normal subordinators (line 15 to 27 p. 11650) show the remarkable fitting power of the authors' theoretical approach. I did not find reference to previous works. Is this methodology applied here for the first time?
This is the first time we compare the impact of diverse types of subordinators on the data analyses. In previous works we relied solely on a Lévy-stable subordinator. We will clarify this point in the revised manuscript.
The material up to here would already make an interesting and consistent paper, in which a complex description of scale-dependent statistics is successfully applied in its full power to actual neutron porosity data in several deposition environments.The subject conveyed by the current title of the manuscript, the analysis of extreme value statistics, starts here.
We are grateful to the Reviewer for this very positive appraisal of the results of our work. We have changed the title of the manuscript to reflect this point.
Sec. 6. Frequency distributions of peaks over thresholds: I think that Fig. 13 is not essential. It could be removed. Alternatively, it could be complemented with a figure of autocorrelations between POTs of neutron porosity increments, which is needed to justify the use of generalized Pareto distributions (GPDs) to represent frequency distributions of POTs.
We partially agree. We will eliminate this figure in the revised manuscript and add a comment in the body of the text.
The sentence about p > 0.05 (lines 24-26 p. 11651) should be rewritten in a less technical way, for clarity.
We will provide a clear explanation of the way the p-values obtained should be interpreted in the revised manuscript.
Sec. 7. Statistical scaling of POTs: POTs of absolute increments exhibit scaling behavior and Hurst exponents similar to those of unfiltered porosity increments, and also verify ESS. These results are summarized in Figs. 16 and 17. Since these results are supposed to be the core of the paper, I found it surprising that they are not discussed in depth. I encourage the authors to provide a more elaborate discussion (in the new Discussion section) of the results on POTs of absolute increments. A question of interest would be to what extent POTs of increments (for signals exhibiting the kind of scale-dependent statistics analyzed in the present paper) are expected to follow the same statistical trends as the original increments. I also missed a discussion of the physical meaning of GPD shape and scale parameters.
We will provide a clear explanation of the GPD shape and scale parameters in the revised manuscript. This will also accommodate a corresponding request from Reviewer No. 2.
To the best of our knowledge, there is no theoretical reason why POTs of increments should be expected to follow the same scaling behavior as the population from which they are extracted. This is the first time that this behavior is uncovered and documented. Future developments can benefit from our findings to study the theoretical grounds upon which these analogies might rest.
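To make the roles of the GPD shape and scale parameters concrete, the sketch below extracts peaks over a threshold set at the empirical 95% quantile (the level mentioned in the reviews) and estimates the two parameters by the method of moments. The method of moments is used here only as a simple stand-in that requires a shape parameter below 1/2; it is an assumption and not necessarily the estimator used in the manuscript. The quantile convention and function names are likewise illustrative.

```python
import statistics


def peaks_over_threshold(values, quantile=0.95):
    """Return the empirical-quantile threshold and the excesses above it."""
    ordered = sorted(values)
    # Simple order-statistic quantile (assumed convention; others exist).
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    threshold = ordered[index]
    excesses = [v - threshold for v in values if v > threshold]
    return threshold, excesses


def gpd_method_of_moments(excesses):
    """Method-of-moments estimates of the GPD shape and scale.

    Uses mean = scale / (1 - shape) and
    variance = scale**2 / ((1 - shape)**2 * (1 - 2 * shape)),
    which hold only for shape < 1/2.
    """
    mean = statistics.mean(excesses)
    var = statistics.variance(excesses)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale
```

A positive shape parameter indicates a heavy, power-law-like tail of the excesses, a value near zero an exponential tail, and a negative value a tail with a finite upper bound; the scale parameter sets the spread of the excesses above the threshold.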
Sec. 8. Conclusions: I found the conclusions concise and well written. Several things that are said here can be found almost literally also in other sections of the manuscript. Maybe they could be rephrased there or removed.
Please, see our comments above about this particular point, involving restructuring of the manuscript. We will rephrase our conclusions, trying to avoid overlap with the body of the text.
I hope that the authors will find the previous comments and suggestions worthy of consideration. They are intended to improve the clarity and readability of the original manuscript, thereby improving the impact of the research reported. I will be glad to review the manuscript again after revision.
We close by thanking the Reviewer again, and we look forward to the possibility of future interactions with him/her in the context of our works.
The paper presents a scaling analysis aimed at investigating the behavior of extensive sets of neutron porosity data collected in the field. Section 2 describes the dataset, consisting of data collected at six different wells located within three different geological environments, and presents their frequency distribution. Section 3 analyzes the frequency distribution of data increments as a function of lag distance; these are seen to follow Lévy stable or normal-lognormal distributions at smaller lags, while becoming Gaussian at larger lags. Section 4 discusses the scaling behavior of sample structure functions of generalized order, showing a dual behavior of the scaling exponent, which is distinctly larger for smaller lags than for larger lags. Corresponding Hurst coefficients, estimated via the method of moments, are of order 0.7 and 0.1 respectively, showing that persistence is associated with intra-layer variability and antipersistence with inter-layer variability. The relationship between structure functions of different order satisfies Extended Self Similarity (ESS) and provides support for the unified theoretical framework, proposed by the authors in previous papers, which views data as consistent with sub-Gaussian random fields subordinated to tfBm/tfGn (truncated fractional Brownian motion/truncated fractional Gaussian noise). Section 5 presents the general scaling theory and the derivation of variogram parameters for the specific data set. Further elements of the theoretical framework are reported in Appendices A and B.
Sections 6 and 7 of the paper deal with statistics of peaks over threshold (POTs), defined as such when exceeding the 95% quantile. Their frequency distribution follows a generalized Pareto distribution. The structure functions of their increments exhibit behavior similar to that of the unfiltered field. The paper is fully within the scope of HESS, and of interest to its readership. My concern is that the title reflects only partially the contents of the paper. In fact, the application of the methodology to a large dataset constitutes an important contribution in itself. The extension to extreme values is an entirely new topic, yet it covers only two sections out of seven. I suggest rephrasing the title to include all material covered in the manuscript. The paper structure, subdivision into sections, and language are sound; the paper cannot be shortened significantly. The reference section is broad.
The Reviewer has summarized with clarity the key points of the manuscript and we thank him/her for his/her very positive appreciation of our work. Also following the comments of Reviewer No. 1, we will modify the title of the manuscript to de-emphasize the aspects associated purely with extreme values and bring it in line with the overall content of the work. It now reads: "Scalable statistics of correlated random data and extremes applied to deep borehole porosities."
1. The tendency of increments to follow Lévy stable or NLN distributions at smaller lags, while becoming Gaussian at larger lags, is common to other applications: could they be compared?
We can provide a qualitative comparison amongst all cases we have examined and/or that are presented in the literature, in the sense that this pattern is common to a variety of data we have analyzed. We are not convinced that a quantitative comparison is appropriate, as the specific results, in terms of parameter variability with lag, can be data dependent. We will include appropriate references to support the qualitative comparison amongst a variety of available cases presented in the current literature. These are not limited to Earth and environmental sciences and include applications to financial, biological and other types of data.
2. Well 6 exhibits much larger variability than other wells (figure 1); correspondingly, its statistics are different.Is there a geological explanation?
Well 6 is associated with the Tabnak formation, which is the richest in terms of carbonate content amongst the three types of depositional environments we consider. Heterogeneity of carbonate rocks can be stronger than that displayed by sandstone-based rocks, and we clarify this point in the revised manuscript.
3. Lambda_u (correlation length associated with support scale) is taken to be zero in section 5. Comparison of support scale with other length scales would corroborate this assumption.
We think the Reviewer is referring to the lower, not upper, cutoff scale, i.e., $\lambda_l$. Theoretically, the value of $\lambda_l$ should be a fraction of the measurement scale. In our case, the measurement scale is smaller than the 0.15 m lag, which represents data resolution (in Well 6 data resolution is 0.07 m). When compared to the overall length scale spanned by each borehole (which is of the order of $10^3$ m), $\lambda_l$ can therefore be considered negligible. As we noted in our previous work (which we reference in the manuscript), this assumption also enables us to eliminate instabilities in the Maximum Likelihood parameter estimation associated with small parameter values and does not influence the generality of the methodological approach. We add this clarification in the revised manuscript.
4. The structure functions associated with POTs behave similarly to the unfiltered field.
An interesting exception are the (extremely low, almost zero) values of the Hurst coefficient associated with (large lag) intra-layer variability.Could the authors provide a physical interpretation of this effect?
As we state in our responses to Reviewer No. 1, our estimates of H at large lags are low, indicating antipersistence of data correlated over long distances; this is typical of alternating layers formed by generally diverse geomaterials.Our interpretation is consistent with depositional events similar to those invoked by Dashtian et al. (2011), which we reference in the manuscript.
We are not convinced that there is a clear and definite interpretation for the very low estimated Hurst coefficients in the case of the POTs. These findings might be interpreted as indicative of a degree of antipersistence (tendency of low and high values to alternate rapidly, in a rough rather than a smooth manner, across layers) between POTs of a property such as neutron porosity in two diverse layers which is even higher than that associated with unfiltered values of the property. We feel that at this stage of the research this interpretation is mostly speculative and would prefer not to elaborate further on this aspect.
5. Generalized Pareto distributions seem to describe the behavior of POTs. A short appendix describing their behavior and associated parameters would help in following the last section of the paper about POTs.
We will prepare such an Appendix in the revised manuscript. This will also accommodate a corresponding request from Reviewer No. 1.
6. I suggest reviewing the presentation of the background material (underlying theoretical framework), which is now split between Section 5 and Appendices A and B.
We are prepared to do so in our revised manuscript. This will also accommodate a corresponding request from Reviewer No. 1.
We appreciate the efforts you and the Reviewers have invested in our manuscript. Following is an itemized list of the comments of Reviewer No. 2 together with our response to these. Comments are reported in blue and our responses in black font.