Scalable statistics of correlated random variables and extremes applied to deep borehole porosities

. We analyze scale-dependent statistics of correlated random hydrogeological variables and their extremes using neutron porosity data from six deep boreholes, in three diverse depositional environments, as example. We show that key statistics of porosity increments behave and scale in manners typical of many earth and environmental (as well as other) variables. These scaling behaviors include a tendency of increments to have symmetric, non-Gaussian frequency distributions characterized by heavy tails that decay with separation distance or lag; power-law scaling of sample structure functions (statistical moments of absolute increments) in midranges of lags; linear relationships between log structure functions of successive orders at all lags, known as extended self-similarity or ESS; and nonlinear scaling of structure function power-law exponents with function order, a phenomenon commonly attributed in the literature to multi-fractals. Elsewhere we proposed, explored and demonstrated a new method of geostatistical inference that captures all of these phenomena within a uniﬁed theoretical framework. The framework views data as samples from random ﬁelds


Introduction
Hydrogeologic variables such as log permeability are known to vary with scales of measurement, observation, domain of investigation, spatial correlation and resolution (Neuman and Di Federico, 2003). The statistics of these and diverse environmental (as well as earth, financial, astrophysical, biological and many other) variables are likewise known to vary with scale. This is especially true of statistics characterizing spatial and/or temporal increments of these variables. Symptoms of such statistical scaling include irregular spatial variability, persistence or antipersistence of increments (large and small values tending to either persist or alternate rapidly in space and/or time); tendency of increments to have symmetric, non-Gaussian frequency distributions characterized by heavy tails that often decay with separation distance or lag; power-law scaling of sample structure functions (statistical moments of absolute increments) in midranges of lags, with breakdown in power-law scaling at small and/or large lags; linear relationships between log structure functions of successive orders at all lags, also known as extended selfsimilarity or ESS; and nonlinear scaling of structure function power-law exponents with function order. The traditional interpretation of these widely documented behaviors has been based on the concept of multifractals. This, however, does not Published by Copernicus Publications on behalf of the European Geosciences Union. explain observed breakdown in power-law scaling at small and large lags or extended power-law scaling (Neuman et al., 2013, and references therein).
Of special concern are the statistics of extremes, which have received much attention among hydrologists (Katz et al., 2002) and others concerned with a wide range of phenomena including snow avalanches on mountain slopes (Ancey, 2012); rupture events associated with the propagation of cracks or sliding along faults in brittle materials including rock failure, landslides and earthquakes (Amitrano, 2012;Lei, 2012;Main and Naylor, 2012) as well as volcanic eruptions, landslides, wildfires and floods (Sachs et al., 2012;Schoenberg and Patel, 2012;Süveges and Davison, 2012); demographic and financial crises (Akaev et al., 2012;Janczura and Weron, 2012); neuronal avalanches and coherence potentials in the mammalian cerebral cortex (de Arcangelis, 2012;Plenz, 2012); citations of scientific papers (Golosovsky and Solomon, 2012); and distributions of city sizes (Pisarenko and Sornette, 2012). Extreme values cluster around heavy tails of data frequency distributions which are often modeled as stretched exponential, lognormal or power functions. There is growing evidence that these frequency distributions, as well as other geospatial and/or temporal statistics of many data, vary with scale. A key related question concerns the scale dependence of frequency distributions (typically generalized extreme value or GEV in the case of block extrema and generalized Pareto distribution or GPD in the case of peaks over thresholds or POTs, e.g., Embrechts et al., 1997) and statistics of extremes at the tails of the original data distributions (e.g., Riva et al., 2013a).
In this paper we explore the statistical scaling of variables and, for the first time, corresponding POTs using as an example neutron porosity data and their POTs from six deep boreholes in three different depositional environments. These data are of interest because, as we show below, (a) they possess statistics that scale in manners typical of many earth, environmental and other variables and (b) reveal a remarkable cross-over from one scaling regime to another at certain separation distances or lags. The phenomena we uncover vis-à-vis neutron porosity data, and corresponding extremes, are of critical importance for the analysis of fluid flow and solute as well as particulate transport in complex hydrogeologic environments. This is so because spatial variability of porosity controls fluid flow velocity distributions in geologic media and has an impact on solute and particulate concentration dynamics. Extreme values of porosity are particularly relevant to depositional processes responsible for the development of preferential flow paths through heterogeneous porous and fractured media. Neutron porosity logs are widely used to characterize stratigraphic sequences and the geostatistical description of geological structures of lithotypes in multilayer systems of aquifers and aquitards (e.g., Barrash and Reboulet, 2004;Tronicke and Holliger, 2005). Combined with laboratory-determined particle size distributions, porosity data may allow one to infer spatial distribu-tions (see review of Vuković and Soro, 1992) and covariances  of hydraulic conductivity.
Statistical scaling of hydrogeological data such as permeability or hydraulic conductivity has been studied amongst others by Painter (2001), Meerschaert et al. (2004), Kozubowski et al. (2006), Siena et al. (2012Siena et al. ( , 2014, Riva et al. (2013bRiva et al. ( , 2013c, and Guadagnini et al. (2012Guadagnini et al. ( , 2013Guadagnini et al. ( , 2014. Whereas research in the subsurface hydrology literature has not addressed specifically the distribution and statistical scaling of extreme incremental values, spatial correlations between values significantly in excess of the mean have been studied vis-à-vis variables such as transmissivity and their relevance to transport processes has been highlighted. Sanchez-Vila et al. (1996) conjectured that observed scale dependence of transmissivities estimated from largescale pumping tests could be related to strong connectivity between regions of elevated transmissivity, as opposed to spatial persistence of average or low transmissivity values. Spatial correlation of extreme conductivity values was examined for the first time by Gómez-Hernández and Wen (1998). In these authors' opinion the standard multi-Gaussian assumption was not consistent with observed short solute travel times resulting from fast spatially connected pathways. Connectivity of high permeability zones thus became an important concept underlying some modern interpretations of effective conductivity and solute travel time (see for example Meier et al., 1998;Wen and Gómez-Hernández, 1998;Western et al., 2001;Fogg et al., 2000;Zinn and Harvey, 2003;Carrera, 2005, 2006;Nield, 2008, and references therein). The above ideas have motivated the development of multipoint geostatistical methods of analysis such as those described in a recent special issue of the journal Mathematical Geosciences on 20 years of multipoint statistics (e.g., Renard and Mariethoz, 2014;Mariethoz and Renard, 2014, and references therein).
Notably, attempts by hydrologists to investigate the manner in which statistics of extremes vary with scale have centered almost exclusively on peak rainfall intensities and stream flows. Whereas some have found statistical measures of rainfall extremes to exhibit linear (sometimes termed simple) scaling (Menabde et al., 1999;Garcia-Bartual and Schneider, 2001;De Michele et al., 2001) under at least some conditions (Burlando and Rosso, 1996;Veneziano and Furcolo, 2002;Yu et al., 2004), most authors describe them by means of nonlinear (often called multiscaling) models (Burlando and Rosso, 1996;Veneziano and Furcolo, 2002;Castro et al., 2004;Langousis and Veneziano, 2007;Mohymont and Demarée, 2006). Statistical measures of peak stream flows were considered by Javelle et al. (1999), Menabde and Sivapalan (2001) and Rigon et al. (2011) to scale linearly. Work on the scaling of GEVs and/or GPDs associated with extreme rainfall and/or stream flow was reported amongst others by Nguyen et al. (1998), Menabde et al. (1999), Menabde and Sivapalan (2001), Willems (2000), Trefry et al. (2005), Veneziano et al. (2009) and Veneziano and Yoon (2013 The general tendency has been to interpret linear scaling as a manifestation of monofractal behavior analogous to that of fractional Brownian motion (fBm) or fractional Gaussian noise (fGn). Nonlinear scaling has commonly been attributed to multifractal behavior, a viewpoint espoused originally by Schertzer and Lovejoy (1987) and expanded on recently by Veneziano and Yoon (2013). Work by our group has demonstrated theoretically (Neuman, 2010(Neuman, , 2011Guadagnini and Neuman, 2011;Siena et al., 2012;Neuman et al., 2013), computationally Neuman et al., 2013) and on the basis of varied pedological, hydrological and hydrogeological data (Siena et al., 2012(Siena et al., , 2014Riva et al., 2013b, c;Guadagnini et al., 2012Guadagnini et al., , 2013Guadagnini et al., , 2014) that statistical scaling behaviors of the kind traditionally attributed to multifractals can be interpreted more simply and consistently by viewing the data as samples from stationary sub-Gaussian random fields subordinated to truncated fBm (tfBm) or fGn (tfGn). Such sub-Gaussian fields are scale mixtures of stationary Gaussian fields with random variances (Andrews and Mallows, 1974;West, 1987) that we model as being lognormal or Lévy-stable (Samorodnitsky and Taqqu, 1994). In this sense our approach bears partial relationship to cascades of Gaussian-scale mixtures that Ebtehaj and Foufoula-Georgiou (2010) use to reproduce coherent structures and extremes of precipitation reflectivity images in the wavelet domain.
Our analysis suggests that, quantitatively, the statistics of neutron porosity increments and their POTs at intra-layer vertical separation scales (or lags) differ from those at interlayer scales. Qualitatively, however, the statistics of porosity increments at each of these two scales behave in a manner that the literature would typically associate with multifractals. This behavior includes all statistical scaling symptoms described above. Our alternative interpretation of the data allows us to obtain maximum likelihood (ML) estimates of all parameters characterizing the underlying truncated sub-Gaussian fields at both intra-and inter-layer scales. Most importantly, we offer what appears to be the first data-driven exploration (following a synthetic study of outliers by Riva et al., 2013a) of how statistics of POTs associated with such families of sub-Gaussian fields vary with scale.

Source of neutron porosity data
As stated in Sect. 1, we illustrate and explore our approach on neutron porosity data from six deep vertical boreholes in three different depositional environments. These are part of a broader set of geophysical logs from the same boreholes, previously described and analyzed within a multifractal framework by Dashtian et al. (2011), provided to us courtesy of Professor Muhammad Sahimi, University of Southern California. Three of the wells (numbered here 1, 2 and 3) are drilled in the Maroon field within which a gas drive is used to produce oil and natural gas, Wells 4 and 5 in the Ahwaz oil field, and Well 6 in the Tabnak gas field. The Maroon and Ahwaz fields in southwestern Iran, and the Tabnak field in southern Iran, have distinct geologies. Whereas carbonate rock content is highest in the Tabnak and lowest in the Maroon and Ahwaz fields, the opposite is true about sandstone content. Though we do not have information about the relative geographic locations of the six wells, we note that Dashtian et al. (2011) analyzed data from each well independently of those from the remaining five wells. We do the same on the assumption that distances between the wells are sufficiently large to allow treating data from each well as being statistically independent of the rest.

Theoretical basis and method of inference
Summary information about the available neutron porosity (P ) data is listed in Table 1. As the sampling interval between available values in Well 6 is half of that in Wells 1-5, we disregard every other measurement in analyzing these data, leaving a total of 4267 values. Most of our analysis concerns increments in recorded P values at various separation distances or lags, s, in each well. Lags are taken to be integer multiples, s = s n × z, of the vertical spacing, z = 0.1524 m, between recorded values. As stated in Sec. 1, we view the data as samples from stationary sub-Gaussian random fields subordinated to truncated fBm (tfBm) or fGn (tfGn). Sub-Gaussian random variables, defined in Appendix A following standard statistical terminology (e.g., Samorodnitsky and Taqqu, 1994), are scale mixtures of Gaussian variables with random variances. We consider two sub-Gaussian variables, one α-stable with Gaussian variances that are α/2-stable, and another normallognormal (NLN) variable with lognormal Gaussian variances. There is no physical basis for their choice, just as there usually is no such basis for working with the Gaussian distribution. Lévy-stable (or α-stable) probability distributions are frequently employed due to their ability to interpret heavy tails displayed by empirical distributions of data. While convenient in this sense, this model has the drawback of being associated with densities with diverging moments of order larger than α, notably the variance (e.g., Neuman et al., 2013, and references therein). The use of a lognormal subordinator provides us with the ability to represent tailing behaviors reasonably well with the additional benefit that associated densities possess finite moments of all orders. Regardless of this choice, our approach is compatible with diverse types of subordinators. Using ML we compare the ability of the above two subordinators to (a) capture critical distributional features of our data and (b) yield reliable parameters of the underlying sub-Gaussian random fields.
Statistical scaling of the data is analyzed in part on the basis of sample structure functions, S q N (s n ), of order q. Structure functions are moments of order q of absolute increments (e.g., Frisch, 1995). The corresponding sample moments are constructed with N (s n ) absolute increments at normalized (by z) lags s n : where P j (s n ) is the j th increment of P values separated by lag s n . The variable P is said exhibits power-law scal- n , where the power or scaling exponent, ξ (q), depends solely on the order q. The exponent is estimated through linear fits of log(S q N ) to log(s n ) within the range of lags where such linear behavior is indicated. We refer to this approach of assessing and quantifying power-law scaling as method of moments.
As shown by Neuman et al. (2013, and references therein), another way to assess the dependence of scaling exponents ξ(q) on q is through ESS or extended power-law scaling. ESS is an empirical approach originally introduced by Benzi et al. (1993aBenzi et al. ( , b, 1996 to widen the range of lags over which velocities in fully developed turbulence scale according to Eq. (1). The approach calls for plotting the S q+1 N vs. S q N for various q values and quantifying the resulting linear dependence between them (see Neuman et al., 2013, and references therein). In this work we apply both methods to available neutron porosity data.
To estimate parameters characterizing the distribution of the underlying (Gaussian) tfBm or tfGn, we consider the zero-mean tfBm G (x; λ l , λ u ) defined by Di Federico and Neuman (1997) as a Gaussian random function of space having variance variogram or semi-structure function of second order and integral autocorrelation scale where for m = l, u, A is a coefficient, H is a Hurst scaling exponent and s is lag. The tfBm variogram γ G (s; λ l , λ u ) is a weighted integral of variograms characterizing stationary Gaussian fields, or modes, having integral scales λ and variances σ 2 (λ) = Aλ 2H /2H , between lower and upper cutoff scales, λ l and λ u , respectively. Here we consider modes having Gaussian variograms in which case where (·, ·) is the incomplete gamma function. In the limits λ l → 0 and λ u → ∞, γ G (s; λ l , λ u ) tends to a power variogram (PV) being the gamma function. The stationary tfBm G (x; λ l , λ u ) thus tends to nonstationary fBm, G (x; 0, ∞), the stationary increments of which, G (x, x + s; 0, ∞), form fGn. It follows that when λ u < ∞, γ G (s; λ l , λ u ) is a truncated power variogram (TPV) characterizing a (stationary) truncated version of fBm (tfBm).
We treat neutron porosity increments in each borehole as a sample from a zero-mean random field, Y (x, x + s; λ l , λ u ), subordinated to tfBm according to (see Appendix A) where s ≥ 0 is lag and the subordinator, W , is a non-negative random variable independent of G (and of G ). above, we allow W to be Lévy-stable or lognormal. Appendix A explains that, in the first case, W is α/2-stable, totally skewed to the right of zero (hence non-negative) with scale parameter σ S = cos πα 4 2/α , unit skewness and zero shift. The corresponding univariate pdf (probability density function) of Y (x, x + s; λ l , λ u ) is symmetric α-stable with zero skewness and shift. The pdf possesses heavy, powerlaw tails. In the second case This renders W 1/2 ≡ 1 when α = 2 and its pdf increasingly skewed to the right as α diminishes. The corresponding univariate NLN pdf of Y (x, x + s; λ l , λ u ) possesses heavier tails than the exponential tails of the Gaussian to which NLN tends asymptotically as α increases toward 2. Whereas α-stable variables do not possess finite moments of order ≥ α, all moments of NLN variables are finite. Parameters of the variogram characterizing the underlying Gaussian field are estimated through ML model calibration, as detailed in Sect. 7 for the two types of subordinators we consider. Figure 1 shows how the neutron porosity data vary with depth in Wells 1, 4, 5 and 6. Frequency distributions of deviations, P = P −P a , from average values, P a , in Wells 1, 4 and 6 are plotted on arithmetic and semi-logarithmic scales in Fig. 2. The empirical frequency distributions exhibit sharp peaks, asymmetry and slight bimodality. Also shown in Fig. 2 are ML fits of a Gaussian and two sub-Gaussian pdfs to the empirical frequency distributions. Figure 1 shows that neutron porosity values in Well 6 exhibit greater variability than in other wells. This could be due to a larger carbonate content in formations penetrated by Well 6 than in those penetrated by other wells (see Sect. 2), rendering the former more heterogeneous than the rest. ML fits to Gaussian and α-stable pdfs is accomplished with a code developed by Nolan (2001) and to NLN using a code we have written in Matlab. The quality of these fits is variable; in the case of Well 1, the NLN model is seen to fit the empirical frequency distribution slightly better than do the other two models but, in the case of Well 6, the α-stable model is seen to be best and Gaussian model worst. Formal Kolmogorov-Smirnov (KS), χ 2 and Shapiro-Wilk tests conducted on some of the data tend to reject the Gaussian model at a significance level of 0.05.

Frequency distributions of neutron porosity increments
Rather than presenting results in terms of lag s we report them below in terms of normalized (by z) integer values, s n . Figure 3 shows how increments P (s n ) at three different normalized lags (s n = 1, 32, 1024) vary with sequential (integer) vertical position in Wells 1 (Maroon field), 4 (Ahwaz field) and 6 (Tabnak field). Frequency distributions of P (s n ) at the same three lags in Wells 1 and 4 are plotted on semi-logarithmic scale in Fig. 4. The empirical frequency distributions exhibit pronounced symmetry with sharp peaks and heavy tails, which decay toward Gaussian shapes as lags increase. At all lags, the empirical frequency distributions of increments are rep- Sample number Sample number Sample number Well 6 -s n = 1 Well 6 -s n = 32 Well 6 -s n = 1024    resented quite closely by α-stable and NLN models fitted to them by ML. Negative log likelihood (NLL) measures of best fit associated with these two models as well as values of the Kashyap (1982) information criterion, KIC, demonstrate that they fit the empirical frequency distributions equally well (not shown). The same is true for all increments in all other wells. Frequency distributions of P (s n ) plotted for two normalized lags in Well 6 ( Fig. 5) are likewise symmetric with sharp peaks and heavy tails which, however, do not decay with lag. Empirical frequency distributions of P (s n )   Figure 6. ML estimatesα andσ of stability and scale parameters, respectively, characterizing α-stable distribution models of increments P (s n ) of P in all wells versus normalized lag.
in Well 6 are represented equally well by α-stable and NLN models. Figure 6 shows how estimatesα andσ of stability and scale parameters, respectively, characterizing α-stable distribution models (see Appendix A) of neutron porosity increments in all wells vary with normalized lag. Estimatesα of the stability index, α, in Wells 1-3 (Maroon field) and 4-5 (Ahwaz field) exceed 1 and increase asymptotically toward 2 with increasing lag, confirming that the increments become Gaussian at large lags. In Well 6 (Tabnak field)α fluctuates around a value that exceeds 1 by a small amount. Estimatesσ of the scaling index σ , which measures the width of the α-stable distribution, first increase with lag and then stabilize in all wells. All these behaviors are consistent with sub-Gaussian random fields associated with α-stable subordinators; whether or not α does or does not grow with lag depends on how these fields are generated (see Riva et al., 2013c;Neuman et al., 2013). We do not show but note that parameters of NLN distribution models fitted to the increments also vary with lag in a way that renders them asymptotically Gaussian at large lags, with the exception of Well 6.

Statistical scaling of neutron porosity increments
Next we analyze the scaling behavior of sample structure functions, S q N (s n ), of order q defined in Eq. (1). Figure 7 shows how such structure functions of orders q = 0.5, 1.0 and 2.0 vary with s n in Wells 1 (Maroon) and 6 (Tabnak). Log-log regression lines fitted to the data separately at vertical distance scales s n < 10 and s n > 12 suggest, at relatively high levels of confidence (coefficients of determination, R 2 , Hydrol. Earth Syst. Sci., 19, 729-745  ranging from 0.98 to 0.99 at s n < 10 and from 0.89 to 0.99 at s n > 12), that S q N (s n ) varies as a power of s n in each of these two scale ranges. Power-law exponents are larger at small (s n < 10) than at large (s n > 12) lags. We thus have a crossover between two diverse power-law regimes at distance scales 1.5-1.8 m delineated in Fig. 7 by a dashed red line. We interpret the power-law scaling of S q N (s n ) with s n < 10 representing variability within, and that at s n > 12 variability between, sedimentary layers at each site. Similar dual powerlaw scaling behavior is exhibited by structure functions of increments from Wells 2-5 (not shown). The identification of layers of diverse geomaterials is related to depositional processes which take place over time in any sedimentary basin of the kind we deal with here. Dashtian et al. (2011) concluded that these formations are layered based on complete suites of well logs at each of the three sites. We note further that a similar dual-scaling phenomenon has recently been reported by Siena et al. (2014) vis-á-vis porosities and specific surface areas imaged using X-ray computer microtomography throughout a millimeter-scale block of Estaillades limestone, at a spatial resolution of 3.3 µm, as well as Lagrangian velocities computed by solving the Stokes equation in the sample pore space.
Following the most recent examples of  we use the method of moments described in Sect. 3 to obtain estimates,Ĥ w andĤ b , of Hurst scaling exponents, H w and H b , characterizing the within-and between-layers scaling behaviors of neutron porosity increments, respectively, in each well.Ĥ w andĤ b are set equal to the slopes, ξ w (q = 1) and ξ b (q = 1), of regression lines fitted to S 1 N (s n ) on log-log scale at s n < 10 and s n > 12, respectively. Values of these estimates are listed, for all six wells,  in Table 2. AsĤ w > 1/α andĤ b 1/α in all cases, we conclude that whereas intra-layer variability is persistent (large values tend to follow large values and small values tend to follow small values), inter-layer variability is strongly antipersistent (small and large values tend to alternate rapidly). The latter is likely a manifestation of strong variations in environments responsible for the deposition of alternating sedimentary layers.
As no theory other than ours (Siena et al., 2012;) is known to explain extended self-similarity (ESS) of variables that do not necessarily satisfy Burger's equation (Chakraborty et al., 2010), demonstrating that P (s n ) satisfy ESS is akin to verifying that these data conform to our theoretical scaling framework. That this is indeed the case becomes evident upon examining the high-confidence (R 2 = 0.91-0.99) straight-line relationships between log S q+1 N and log S q N , and corresponding power-law relationships between S q+1 N and S q N , at s n < 10 and s n > 12 in Fig. 8 for q = 1, 2 and 3 in Wells 1 (Maroon) and 6 (Tabnak). Similar ESS relationships hold (not shown) in Wells 2-5.
Well 1 s n < 10 Well 1 s n > 12 Well 6 s n < 10 Well 6 s n > 12 Figure 9. ξ w (q) and ξ b (q) evaluated as functions of q by the method of moments (M) and ESS in (a) Well 1 at s n < 10, (b) Well 1 at s n > 12, (c) Well 6 at s n < 10, and (d) Well 6 at s n > 12.
Our next step is to compute functional relationships between power exponents ξ w (q) and ξ b (q), and the order q, of structure functions that scale as power laws of lag. In the method of moments these powers are the slopes of regression lines fitted to log-log plots of S q N (s n ) versus s n , such as those depicted in Fig. 7. In the case of ESS we use ξ w (q = 1) and ξ b (q = 1), determined by the method of moments, as reference values for the sequential computation of ξ w (q) and ξ b (q) at q > 1 based on known power-law relationships between S q+1 N and S q N , such as those given in Fig. 8. Corresponding plots of ξ w (q) and ξ b (q) as functions of q, evaluated by the method of moments and ESS in Wells 1 and 6 at s n < 10 and s n > 12, are presented in Fig. 9. Results obtained by the two methods are, for the most part, very similar. With the exception of ξ w (q) at s n < 10 in Wells 1, 2, and 3 (Maroon field), in all cases (including those corresponding to Wells 2-5, which we do not show) ξ w (q) and ξ b (q) delineate convex functions that fall below straight lines having slopesĤ w andĤ b , respectively, which pass through the origin. Tradition has it that whereas such straight lines are characteristic of monofractal (self-affine, additive) random fields, nonlinear variations of power exponents such as those exhibited by ξ w (q) and ξ b (q) in Fig. 9 are symptomatic of (multiplicative) multifractals. Yet we have seen that the data in this paper conform to a statistical scaling theory in which the underlying random fields are subordinated to truncated versions of monofractal fBm or fGn. As we have previously demonstrated theoretically (Neuman, 2010(Neuman, , 2011Neuman et al., 2013) and computationally , nonlinear scaling of such data is nothing but a random artifact of sampling from similar fields.

Estimation of variogram parameters
We saw that our analysis supports treating the neutron porosity data from each well as a random sample from a stationary sub-Gaussian random field subordinated to tfBm or tfGn. Our previous ML fits of univariate α-stable and NLN pdf models to neutron porosity increments in each well have yielded estimates of all distributional parameters characterizing these models. We also found the data to exhibit different modes of scaling at s n < 10 and s n > 12 and obtained estimates of H for each of these two ranges of lags. All that remains to fully characterize the multivariate random fields, Y (x, x + s; λ l , λ u ), which we take to underlie the incremental data, is to estimate the parameters A, λ l and λ u (and, optionally, H ) of TPVs corresponding to s n < 10 and s n > 12. We do so next for each of the two subordinators we consider.
Assuming first that neutron porosity increments in each well are α-stable, one can estimate the scale parameter σ (s; λ l , λ u ) of their distribution at any lag, s, from the theoretical relationship (Samorodnitsky and Taqqu, 1994) σ (s; λ l , λ u ) = γ G (s; λ l , λ u ).
Here we employ this relationship separately for normalized lag ranges s n < 10 and s n > 12. We saw earlier that structure functions of neutron porosity data in both lag ranges, including second-order structure functions, can be closely represented in each well by power laws. In other words, the TPVs within these lag ranges are effectively PVs. We recall that this happens in the limits as λ l and λ u tend, respectively, to zero and infinity. We note further that λ l should be a fraction of the measurement scale. In our case, the measurement scale can be considered as smaller than the 0.15 m data resolution scale (in Well 6 data resolution is 0.07 m). When compared to the much larger length scale of each borehole (on the order of 10 3 m), λ l is negligibly small and can be disregarded. Accordingly, we set λ l = 0 and λ u to a sufficiently large number to ensure that the TPV γ G (s; λ l , λ u ) reduces, within both working lag ranges, to the PV γ (s) = Bs 2H . Then, in a manner analogous to that outlined most recently by Guadagnini et al. (2013Guadagnini et al. ( , 2014, we obtain ML estimatesÂ of A in two ways: once by adopting corresponding method-of-moment estimatesĤ w andĤ b from Table 2 and once by estimating the latter jointly with A. Both sets of estimates are obtained upon fitting the theoretical PV γ (s) = Bs 2H to sample scale parametersσ (s n ) such as those plotted versus s n in Fig. 7b. The fits are depicted graphically in Fig. 10 for Wells 1 and 6. The corresponding parameter estimates and 95 % confidence limits are listed, for all wells and both lag ranges, in Table 3. The two sets of estimates lie within each other's 95 % confidence intervals, implying that they are equally reliable.
Next we consider the case where neutron porosity increments in each well are NLN. Due to finiteness of all (statistical) moments associated with this model, structure functions of order q = 2 in Fig. 7 coincide with twice the variogram of Hydrol. Earth Syst. Sci., 19, 729-745, 2015 www.hydrol-earth-syst-sci.net/19/729/2015/ neutron porosity. As shown in Appendix A, the variogram of Y (x; λ l , λ u ) is given by where µ w and σ 2 w are defined in (Eq. A1). We replace Eq. (10) by γ Y (s) = Cs 2H and fit the latter by ML to secondorder sample structure functions of porosity increments in each well, separately for s n < 10 and s n > 12. Joint estimates of C and H for each range of lags, as well as ML estimates of C based on method-of-moment estimatesĤ w andĤ b from Table 2, together with associated 95 % confidence intervals, are listed in Table 4. Corresponding best fits are depicted graphically in Fig. 11. Here again the two sets of estimates lie within each other's 95 % confidence intervals, implying that they are equally reliable.

Frequency distributions of peaks over thresholds
Extreme value analyses of randomly varying data typically concern block maxima (BM) and/or POTs. The number of neutron porosity increments, P (s n ), available to us at any normalized lag at any well are insufficient to conduct a statistically meaningful analysis of BM. For this reason, and for the fact that POTs provide a higher resolution of maxima than do BM, we focus in this paper exclusively on the former. In way of illustration we consider absolute increments | P (s n )| to constitute POTs whenever they exceed   Figure 10. Sample scale parameter squareσ 2 (s n ) as functions of s n (squares), ML fitted PVs (solid lines) and 95 % confidence limits (broken curves) in Wells 1 and 6 based on (a, b) estimatesÂ given estimatesĤ from Table 2 and (c, d) joint estimates ofÂ andĤ .   Figure 11. Sample structure functions, S 2 N (s n ), of order q = 2 as functions of s n (squares), ML fitted PVs (solid lines) and 95 % confidence limits (broken curves) in Wells 1 and 6 based on (a, b) esti-matesĈ given estimatesĤ from Table 2 and (c, d) joint estimates ofĈ andĤ . a non-negative threshold, u t , equal to the 95 % quantile of | P (s n )| values in a sample. This renders about 5 % of all sampled | P (s n )| values of POTs. Figure 12 identifies POTs associated with sequences of porosity increments depicted in Fig. 3.
In each well, sample autocorrelation of non-overlapping neutron porosity increments at diverse normalized lags diminishes rapidly with the number, n, of these normalized increments (not shown), in line with theoretical expressions (18)-(20) of Neuman (2010). We expect autocorrelations between POTs to be weaker, possibly justifying a representation of their frequency distributions by GPDs (see Appendix B) which, theoretically, apply to independent identi-    Fig. 13 quantile-quantile (Q-Q) plots of GPD fits to frequency distributions of POTs identified in Fig. 12. Included in Fig. 13 are 95 % confidence intervals of these fits and p values of KS goodness-of-fit tests. A list of POT sample sizes and p values associated with the same three lags in all wells is provided in Table 5. The p value is the probability of obtaining given data when a null hypothesis is true. As all p values in    Table 5 exceed 0.05, one cannot reject (at a significance level of 0.05) the null hypothesis that all POTs have GPDs. Figure 14 shows variations of best fit GPD shape (ξ POT , governing the tail behavior of the distribution) and scale (σ POT , governing the spread of the distribution) parameters with normalized lag, and corresponding 95 % uncertainty bounds, in the same wells as in Fig. 13. With the exception of Well 6 in which ξ POT first diminishes with lag and then stabilizes, this parameter fluctuates but does not vary systematically with lag. The same applies to the shape parameter of each fitted GPD. However, σ POT in all wells increases as a power of lag before stabilizing at larger lags, as does the scale parameter of α-stable distributions fitted to all neutron porosity increments in Fig. 6b.

Statistical scaling of peaks over thresholds
We end our analysis by exploring the scaling behavior of qorder sample structure functions of POT in absolute increments P POT,j (s n ) . Following Eq. (1), these sample struc-   ture functions are defined as where N POT (s n ) is the number of POTs at normalized lag s n . We do so as we did earlier for all increments, according to the methodology summarized in Sect. 3. Figure 15 depicts variations of S q N POT (s n ) with normalized lag for q = 0.5, 1.0, and 2.0 in Wells 1 (Maroon) and 6 (Tabnak). A red dashed line in the figure demarcates cross-over between two diverse power-law scaling regimes at s n < 10 and s n > 12. Included in Fig. 15 are logarithmic-scale regression lines and corresponding power-law relations between S q N POT (s n ) and s n in each well and scaling regime. The scaling behavior in Fig. 15 is similar to that shown previously for all (unfiltered) porosity increments in Fig. 7. Corresponding estimates of Hurst exponent are listed in Table 6; these too differ little from those obtained earlier for all porosity increments (Table 2) with the exception of estimatesĤ b which are consistently lower than those associated with unfiltered increments. Like the latter (Fig. 8), POTs exhibit ESS at all lags in the scaling intervals s n < 10 and s n > 12 (not shown).
Our final step is to compute functional relationships between power exponents ξ w (q) and ξ b (q), and the order q, of POT structure functions that scale as power-laws of lag. We  do so as we did previously for unfiltered porosity increments. Corresponding plots of ξ w (q) and ξ b (q) as functions of q, evaluated by the method of moments and ESS in Wells 1 and 6 at s n < 10 and s n > 12, are presented in Fig. 16. Results obtained by the two methods are again, for the most part, very similar. Similar behavior has been shown by us elsewhere  to be consistent with increments sampled from random fields subordinated to tfBm or tfGn.

Conclusions
After showing that neutron porosity data from six deep boreholes in three geologic environments have statistical scaling properties characteristic of samples from scale mixtures of truncated fractional Brownian motion (tfBm) or fractional Gaussian noise (tfGn), we used these data to explore the statistical behavior of extreme porosity increments, the absolute values of which exceed certain thresholds. We expect our results to hold for many earth, environmental and other variables that were shown elsewhere to possess similar statistical scaling properties. These results include the following: 1. The frequency distributions of neutron porosities in any well, or group of wells in any one of the three geologic environments, are non-Gaussian with sharp peaks, asymmetry and slight bimodality.
2. The frequency distributions of neutron porosity increments in any well, or group of wells at one of the three sites, are zero-mean symmetric with heavy tails that decay with increasing vertical separation distance or lag. At all lags, the distributions are represented closely by Well 1 s n > 12 Well 6 s n < 10 Well 6 s n > 12 Figure 16. ξ w (q) and ξ b (q) evaluated for POTs as functions of q by M and ESS in (a) Well 1 at s n < 10, (b) Well 1 at s n > 12, (c) Well 6 at s n < 10, and (d) Well 6 at s n > 12. either α-stable or normal-lognormal probability density models that tend to Gaussian with increasing lag.
3. Order q structure functions of absolute neutron porosity increments grow approximately as positive powers ξ w (q) of normalized lag, s n , at s n < 10 and as much smaller positive powers, ξ b (q), of s n at s n > 12. We interpret this dual power-law scaling to represent withinor intra-layer variability at s n < 10 and between-or inter-layer variability at s n > 12. Values of ξ w (q = 1) and ξ b (q = 1) provide method-of-moment estimates of Hurst exponents H w and H b for these two power-law scaling ranges, respectively.
4. Structure functions of absolute neutron porosity increments exhibit extended self similarity (ESS) at all normalized lags within both power-law scaling ranges, s n < 10 and s n > 12.
5. Values of power-law exponents ξ w (q) and ξ b (q) associated with absolute neutron porosity data, computed by the method of moments and by ESS, are for the most part very similar. Whereas such nonlinear scaling of power-law exponents has traditionally been viewed as a hallmark of multifractality (or, more recently, of fractional Laplace motion), we find the neutron porosity data in this paper to behave in a way fully consistent with that of samples from sub-Gaussian random fields subordinated to truncated (monofractal, selfaffine, Gaussian) fractional Brownian motion or fractional Gaussian noise. The latter is the only view known to be theoretically consistent with ESS in the case of data, such as those considered here, that do not necessarily satisfy Burger's equation.
6. Our method of interpretation allows one to fully characterize the sub-Gaussian random field that underlies a 7. The autocorrelation of neutron porosity increments diminishes rapidly with the number, n, of nonoverlapping increments in a separation distance (lag). This helps explain why sample distributions of peaks over thresholds (POTs, taken here to be absolute increments which exceed their 95 % quantile) are described reasonably well by a generalized Pareto distribution (GPD) model, which in theory applies to iid extrema. Whereas GPD shape parameter estimates do not show systematic variations with lag except in one well, corresponding estimates of GPD shape parameters tend to increase as a power of small lags and stabilize at larger lags. The same happens with scale parameters of α-stable distributions fitted to all (unfiltered) neutron porosity increments.
8. In all other respects, POTs show statistical scaling very similar to that of unfiltered increments. Estimates of POT Hurst exponents are very close to those obtained for unfiltered increments, with the exception ofĤ b , that are consistently lower than those associated with unfiltered increments. Such nonlinear scaling is consistent with our method of interpreting the data. To our knowledge, this is the first documented example of POT statistical scaling interpreted on the basis of sub-Gaussian theory. We are not aware of any known theoretical reason why statistics of POT increments would necessarily scale in a manner similar to that of their parent population, as they do here.

Appendix A
Let Y (x, x + s) = W 1/2 G(x, x + s), where x is a spatial (or temporal) coordinate, s ≥ 0 is lag, W 1/2 is a random variable acting as subordinator, and G is a zero-mean Gaussian random field of increments with pdf (probability density function) f G ( g) and variance σ 2 G dependent on lag, G and W 1/2 being statistically independent of each other at all lags. In this paper we consider W to be either Lévy-stable or lognormal.
Correspondingly, the pdf of Y is where U = W 1/2 , and u = w 1/2 . Since U = W 1/2 > 0 one has As G ∼ N (0, σ 2 G ) and U = W 1/2 ∼ ln N (0, σ 2 V ), Eq. (A3) becomes This is the normal-lognormal (NLN) pdf we refer to in the text. In it σ G plays the role of a scale parameter, and σ V of a shape factor. Letting σ V → 0 is tantamount to letting Eq. (A4) converge to a normal density f Y ( y) = 1 √ 2π σ G e − ( y) 2 2σ 2 G . The larger is σ V the heavier are the tails and the sharper is the peak of the NLN distribution. Fitting Eq. (A4) by maximum likelihood (ML) to sample frequency distributions of Y allows one to estimate σ 2 G and σ 2 V , which in turn allows one to estimate µ w and σ 2 w according to Eq. (A1). The variance of Y is σ 2 Y = µ 2 w + σ 2 w σ 2 G and the variogram of Y is γ Y (s) = 1 2 E ( Y (x, s)) 2 = E W 1/2 2 · 1 2 E ( G(x, s)) 2 = σ 2 w + µ 2 w γ G (s) , where γ G (s) is the variogram of G . Once µ w and σ 2 w have been estimated by maximum likelihood on the basis of Y data as described above, fitting Eq. (A5) to corresponding second-order sample structure functions allows one to estimate all parameters of γ G (s).
In case G has a power variogram, γ G (s) = Bs 2H , of the kind we consider in the manuscript so does Y , γ Y (s) = σ 2 w + µ 2 w γ G (s) = Cs 2H , where C is a coefficient. Fitting Eq. (A6) to second-order sample structure functions of corresponding increments allows one to estimate C and H .

Appendix B
In this work empirical distributions of POTs (peaks over thresholds) of absolute neutron porosity increments at normalized lag s n , | P (s n )|, are shown to fit well-known twoparameter generalized Pareto distributions (GPDs). A GPD is described in terms of the following cumulative distribution function (CDF): H (y) = 1 − (1 + y ξ POT /σ POT ) −1/ξ POT , y = | P (s n )| − u t > 0 (B1) where ξ POT and σ POT are the shape and scale parameters, respectively, governing tail behavior and spread of the distribution; and u t is the predetermined threshold. Equation (B1) reduces to a Pareto (type-II) distribution when ξ POT > 0, an exponential distribution when ξ POT = 0 and a generalized beta distribution (of the first kind) when ξ POT < 0 (Arnold, 2008