We employ elements of information theory to quantify (i) the information content related to data collected at given measurement scales within the
same porous medium domain and (ii) the relationships among information contents of datasets associated with differing scales. We focus on gas permeability data collected over Berea Sandstone and Topopah Spring Tuff
blocks, considering four measurement scales. We quantify the way information
is shared across these scales through (i) the Shannon entropy of the data
associated with each support scale, (ii) mutual information shared between
data taken at increasing support scales, and (iii) multivariate mutual
information shared within triplets of datasets, each associated with a given
scale. We also assess the level of uniqueness, redundancy and synergy
(i.e., information partitioning) of the information content that the
data associated with the intermediate and largest scales provide with
respect to the information embedded in the data collected at the smallest
support scale in a triplet.

Information theory allows characterization of the information content of permeability data related to differing measurement scales.

An increase in the measurement scale is associated with quantifiable loss of information about permeability.

Redundant, unique and synergetic contributions of information are evaluated for triplets of permeability datasets, each taken at a given scale.

Characterization of permeability of porous media plays a major role in a variety of hydrological settings. Abundant studies document that permeability values and their associated statistics depend on a variety of scales, i.e., the measurement support (or data support), the sampling window (domain of investigation), the spatial correlation (degree of structural coherence) and the spatial resolution (i.e., the degree of descriptive detail associated with the characterization of a porous system) (see, e.g., Brace, 1984; Clauser, 1992; Neuman, 1994; Schad and Teutsch, 1994; Rovey and Cherkauer, 1995; Sanchez-Vila et al., 1996; Schulze-Makuch and Cherkauer, 1998; Schulze-Makuch et al., 1999; Tidwell and Wilson, 1999a, b, 2000; Vesselinov et al., 2001a, b; Winter and Tartakovsky, 2001; Hyun et al., 2002; Neuman and Di Federico, 2003; Maréchal et al., 2004; Illman, 2004; Cintoli et al., 2005; Riva et al., 2013; Guadagnini et al., 2013, 2018, and references therein). Among these scales, we focus here on the characteristic length associated with data collection (i.e., support scale).

In this context, experimental evidence at the laboratory scale (observation scale on the order of 0.1–1.0 m) suggests that the mean value and the correlation length of the permeability field tend to increase with the size of the data support, the opposite trend being documented for the variance (e.g., Tidwell and Wilson, 1999a, b, 2000). Similar observations, albeit with some discrepancies, are also tied to investigations at larger scales (i.e., 10–1000 m) (Andersson et al., 1988; Guzman et al., 1994, 1996; Neuman, 1994; Schulze-Makuch and Cherkauer, 1998; Zlotnik et al., 2000). We consider here laboratory-scale permeability datasets which are associated with various measurement scales.

Examples of spatial distributions of the natural logarithm of normalized gas permeability,

The above-mentioned documented pattern suggests that the spatial distribution of permeability tends to be characterized by an increased
degree of homogeneity (as evidenced by a decreased variance and an increased
spatial correlation) as the support/measurement scale increases. At the same
time, increasing the measurement scale hampers the ability to detect
locally low permeability values, as reflected by the observed increased mean
value of the data. As an example of the kind of data we consider in this
study to clearly document these features, Fig. 1 depicts the spatial
distribution of the natural logarithm of (normalized) gas permeabilities,
i.e.,

Our study aims at providing an assessment and a firm quantification of these aspects by relying on information theory (IT) (e.g., Stone, 2015) and the multi-scale collection of data described above. We consider such a framework of analysis as it provides the elements to quantify (i) the information content associated with a dataset collected at a given scale as well as (ii) the information shared between pairs or triplets of datasets, each associated with a unique scale (while preserving the design of the measurement device). In this context, IT represents a convenient theoretical framework to assist the characterization of the way the information content is distributed across sets of measurements, without being confined to a linear analysis (relying, e.g., on linear correlation coefficients) or invoking tailored assumption(s) about the nature of the heterogeneity of permeability (e.g., the characterization of the datasets through a Gaussian model).

To the best of our knowledge, and in contrast to surface hydrology systems, only a limited set of works rely on IT concepts to analyze scenarios related to processes taking place in subsurface porous media. Nevertheless, we note a great variety in the topics covered in these works, reflecting the broad potential for applicability of IT concepts. These studies include, e.g., the works of Woodbury and Ulrych (1993, 1996, 2000), who apply the principle of minimum relative entropy to tackle uncertainty propagation and inverse modeling in a groundwater system. The principle of maximum entropy is employed by Gotovac et al. (2010) to characterize the probability distribution function of travel time of a solute migrating across a heterogeneous porous formation. Within the same context, Kitanidis (1994) leverages the definition of entropy and introduces the concept of a dilution index to quantify the dilution state of a solute cloud migrating within an aquifer. Mishra et al. (2009) and Zeng et al. (2012) evaluate the mutual information shared between pairs of (uncertain) model input(s) and output(s) of interest and view this metric as a measure of global sensitivity. Nowak and Guthke (2016) focus on sorption of metals onto soil and the identification of an optimal experimental design procedure in the presence of multiple models to describe sorption. Boso and Tartakovsky (2018) illustrate an IT approach to upscale/downscale equations of flow in synthetic settings mimicking heterogeneous porous media. Relying on IT metrics, Butera et al. (2018) assess the relevance of non-linear effects for the characterization of the spatial dependence of flow and solute transport related observables. Bianchi and Pedretti (2017, 2018) developed novel concepts, borrowed from IT, for the characterization of heterogeneity within a porous system and its links to salient solute transport features. Wellmann and Regenauer-Lieb (2012) and Wellmann (2013) leverage IT concepts to quantify uncertainty, and its reduction, about the spatial arrangement of geological units of a subsurface formation. Recently, Mälicke et al. (2020) combined geostatistics and IT to analyze soil moisture data (representative of a given measurement scale) to assess the persistence over time of the spatial organization of soil moisture under diverse hydrological regimes.

Here, we focus on the aforementioned datasets of Tidwell and Wilson (1999a, b), who conducted extensive measurement campaigns collecting air permeability data across the faces of Berea Sandstone and Topopah Spring Tuff blocks, considering four different support/measurement scales (see Sect. 2 for details). While our study does not directly tackle issues associated with the way one can upscale (flow or transport) attributes of porous media, we leverage such unique and truly multi-scale datasets to address research questions such as “How much information about the natural logarithm of (normalized) gas permeabilities is lost as the support scale increases?” and “How informative are data taken at coarser support scale(s) with respect to those associated with a finer support scale?” (see Sect. 3). In this sense, our study yields a unique perspective on the assessment of the value of hydrogeological information collected at differing scales.

We consider the datasets provided by Tidwell and Wilson (1999a, b), who rely
on a multisupport permeameter (MSP) to evaluate spatial distributions of air
permeabilities across the faces of a cubic block of Berea Sandstone
(hereafter denoted as Berea) and Topopah Spring Tuff (hereafter denoted as
Topopah). Data are collected at uniform intervals with spacing

The two types of rocks analyzed display distinct features. The Berea sample
may be classified as a very fine-grained, well-sorted quartz sandstone.
Following Tidwell and Wilson (1999a), visual inspection of the spatial
distributions of

We note that the IT elements described in Sect. 3 refer to discrete variables. While corresponding definitions are available also for continuous
variables (i.e., summation(s) and probability mass function(s) are replaced
by integral(s) and probability density function(s), respectively), these are
characterized by a less intuitive and immediate interpretation (e.g.,
entropy could be negative, infinite or could not be evaluated in case of probability density function(s) involving a Dirac delta; see, e.g., Kaiser and Schreiber, 2002; Cover and Thomas, 2006). Moreover, in case the probability density functions of the analyzed continuous variables cannot be associated with an analytical expression, it is necessary to subject these variables to quantization, and the IT metrics related to the continuous variables are estimated through their quantized counterparts (see Cover and Thomas, 2006). In general, the quality of these estimates increases (in a way which depends on the specific metric) with the level of quantization of the continuous variables (see, e.g., Kaiser and Schreiber, 2002). This leads us to treat

Considering a discrete random variable,

The information content shared by two random variables, i.e.,

Venn diagram representation of the information theory concepts considering two sources, i.e.,

When considering three discrete random variables, it is possible to quantify
the amount of information that two of these (termed sources, i.e.,

The bivariate mutual information shared by the target and each source can be
written as

An additional element of relevance for the aim of our study is the
interaction information

Inspection of Eqs. (4)–(7) reveals that an additional equation is required to evaluate all components in Eq. (4). Various strategies have been proposed in this context (e.g., Williams and Beer, 2010; Harder et al., 2013;
Bertschinger et al., 2014; Griffith and Koch, 2014; Olbrich et al., 2015;
Griffith and Ho, 2015). We rely here on the recent partitioning strategy
formalized by Goodwell and Kumar (2017), due to its capability of accounting
for the (possible) dependences between sources when evaluating the unique
and redundant contributions. The rationale underpinning this strategy is
that (i) each of the two sources can provide a unique contribution of
information to the target even when these are correlated, and (ii) redundancy
should be lowest in case of independent sources. The redundant contribution
can then be evaluated as (Goodwell and Kumar, 2017)
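
The partitioning can be illustrated with a minimal Python sketch. It assumes the rescaled-redundancy form usually associated with Goodwell and Kumar (2017), i.e., a redundancy bounded from below by `r_min = max(0, -II)` and from above by `r_mmi = min[I(S1;T), I(S2;T)]`, and scaled by the normalized source dependency `i_s = I(S1;S2) / min[H(S1), H(S2)]`. The function names, the dictionary keys and the sign convention for the interaction information (`II = S - R`) are assumptions introduced here for illustration and may differ from the notation of the equations in this section.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in bits of a pmf of any shape (empty cells are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def partition_information(p):
    """Split I(S1,S2;T) into unique, redundant and synergetic parts.

    p[i, j, k] is the joint pmf of (source 1, source 2, target).
    """
    p = np.asarray(p, dtype=float)
    # Marginal and pairwise entropies obtained by summing out the other axes.
    h1 = entropy(p.sum(axis=(1, 2)))
    h2 = entropy(p.sum(axis=(0, 2)))
    ht = entropy(p.sum(axis=(0, 1)))
    h12 = entropy(p.sum(axis=2))
    h1t = entropy(p.sum(axis=1))
    h2t = entropy(p.sum(axis=0))
    h12t = entropy(p)

    i1t = h1 + ht - h1t        # I(S1;T)
    i2t = h2 + ht - h2t        # I(S2;T)
    i12 = h1 + h2 - h12        # I(S1;S2)
    i_total = h12 + ht - h12t  # multivariate mutual information I(S1,S2;T)

    ii = i_total - i1t - i2t   # interaction information, equal to S - R
    r_min = max(0.0, -ii)      # lowest redundancy that keeps synergy non-negative
    r_mmi = min(i1t, i2t)      # highest admissible redundancy
    denom = min(h1, h2)
    i_s = i12 / denom if denom > 0 else 0.0  # normalized source dependency

    r = r_min + i_s * (r_mmi - r_min)
    u1 = i1t - r
    u2 = i2t - r
    s = i_total - u1 - u2 - r
    return {"U1": u1, "U2": u2, "R": r, "S": s, "I_total": i_total}


# Two textbook checks on binary variables (synthetic, not permeability data).
xor = np.zeros((2, 2, 2))
copy = np.zeros((2, 2, 2))
for a in range(2):
    for b in range(2):
        xor[a, b, a ^ b] = 0.25   # target is the XOR of two independent sources
    copy[a, a, a] = 0.5           # both sources and the target are identical

xor_parts = partition_information(xor)
copy_parts = partition_information(copy)
```

In the XOR case all of the information is synergetic, while in the copy case it is entirely redundant; these two limits are a convenient sanity check before applying the routine to a joint probability mass function estimated from data.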

We emphasize that, despite some additional complexities, analyzing the
partitioning of the multivariate mutual information provides valuable
insights into the way information is shared across three variables, these being here permeability data associated with three diverse support scales.
In summary, addressing information partitioning enables us to (i) quantify
and (ii) characterize the nature of the information that two variables
(sources) provide to a third one (target) as a

Evaluation of the quantities introduced in Sect. 3.1 is accomplished
according to three main steps. We employ the Kernel Density Estimator (KDE)
routines in Matlab2018© to estimate the continuous counterparts of the probability mass functions

We remark that the bivariate and multivariate mutual information metrics are evaluated by focusing on the joint probability mass function based on the multi-scale data collected at the same location on the sampling grids.
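
As a rough illustration of this step, the following Python sketch builds a joint probability mass function from pairs of values taken at the same grid location and derives the Shannon entropy and the bivariate mutual information from it. It is a sketch under stated assumptions: plain histogram binning is used in place of the kernel density estimation mentioned above, the number of bins is arbitrary, and the arrays `fine` and `coarse` are synthetic stand-ins rather than the Berea or Topopah data.

```python
import numpy as np


def joint_pmf(x, y, bins=20):
    """Joint pmf of two co-located samples from a 2-D histogram (assumed binning)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    return counts / counts.sum()


def entropy(p):
    """Shannon entropy in bits of a pmf of any shape (empty bins are skipped)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(pxy):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), evaluated from the joint pmf."""
    pxy = np.asarray(pxy, dtype=float)
    px = pxy.sum(axis=1)  # marginal pmf of X
    py = pxy.sum(axis=0)  # marginal pmf of Y
    return entropy(px) + entropy(py) - entropy(pxy)


# Synthetic stand-in: value i of each array plays the role of a measurement
# taken at grid location i with a fine and a coarse support, respectively.
rng = np.random.default_rng(0)
fine = rng.normal(size=5000)
coarse = 0.7 * fine + 0.3 * rng.normal(size=5000)

pxy = joint_pmf(fine, coarse)
mi = mutual_information(pxy)
h_fine = entropy(pxy.sum(axis=1))
mi_normalized = mi / h_fine  # share of the fine-scale entropy that is shared
```

Pairing the values by location is what makes the joint probability mass function meaningful; shuffling one of the two arrays would destroy the dependence and drive the mutual information toward zero, up to the bias introduced by finite sampling.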

Probability mass function of the logarithm of normalized gas permeability,

Figure 3 depicts the probability mass function

Inspection of Fig. 3a and b reveals that distributions related to increasing
values of

Information partitioning of the multivariate mutual information,

On the other hand, two distinct behaviors emerge with regard to the location of the
peak(s) of the distributions: (i) the location of the peak of the
distributions is virtually insensitive to

Inspection of Fig. 3d reveals that, given a reference support scale

Figure 4 depicts the results of the information partitioning procedure
detailed in Sect. 2.3 considering the Berea sample and two triplets of
datasets

In contrast, inspection of Fig. 4c and d reveals that, for the Topopah rock sample, (i) most of the multivariate information coincides with the unique
information associated with the intermediate scale; (ii) the redundant and
unique contributions associated with the largest scale are still non-negligible yet substantially smaller than the unique contribution provided by the intermediate scale; and (iii) there is practically no synergetic information. This set of results derives from the moderate or marked discrepancies displayed by

We recall that the focus of the present study is the quantification of the information content and information shared between pairs and triplets of datasets of air permeability observations associated with diverse sizes of the measurement/support scale. We exemplify our analysis by relying on data collected across two different types of rocks, i.e., a Berea and a Topopah sample, that are characterized by different degrees of heterogeneity.

These datasets (or part of these) have been considered in some prior studies. Tidwell and Wilson (1999a, b) and Lowry and Tidwell (2005) assess the impact of the size of the support/measurement scale on key summary one-point (i.e., mean and variance) and two-point (i.e., variogram) statistics within the context of classical geostatistical methods and evaluate kriging-based estimates of the underlying random fields. Siena et al. (2012) and Riva et al. (2013) analyze the scaling behavior of the main statistics of the log permeability data and of their increments (i.e., sample structure functions of various orders), with emphasis on the assessment of power-law scaling behavior. On these bases, Riva et al. (2013) conclude that the data related to the Berea sample can be interpreted as observations from a sub-Gaussian random field subordinated to truncated fractional Brownian motion or Gaussian noise. All of these studies focus on (a) the geostatistical interpretation of the behavior displayed by the probability density function (and key moments) of the data and their spatial increments and (b) the analysis of the skill of selected models to interpret the observed behavior of the main statistical descriptors evaluated upon considering separately data associated with diverse measurement/support scales. Furthermore, Tidwell and Wilson (2002) analyzed the Berea and Topopah datasets (considering separately data characterized by diverse support scales) to assess possible correspondences between the permeability field and some attributes of the rock samples determined visually through digital imaging and conclude that image analysis can assist delineation of spatial patterns of permeability.

We remark that in all of the studies mentioned above the datasets associated
with a given support (or measurement) scale are analyzed separately.
In contrast, we leverage elements of IT, which offer a unique opportunity to circumvent limitations of linear metrics (e.g., Pearson correlation) and
analyze the relationships (in terms of shared amount of information) between
pairs (i.e., bivariate mutual information) or triplets (i.e., multivariate
mutual information) of variables. We also note that, even as visual
inspection of

Considering an operational context, including, e.g., groundwater resource management or (conventional/unconventional) oil recovery, we observe that it is common to have at our disposal permeability data associated with diverse support scales. These can be inferred from, e.g., large-scale pumping tests, downhole impeller flowmeter measurements, core flood experiments at the laboratory scale, geophysical investigations, or particle-size curves (see, e.g., Paillet, 1989; Oliver, 1990; Dykaar and Kitanidis, 1992a, b; Harvey, 1992; Deutsch and Journel, 1994; Zhang and Winter, 2000; Attinger, 2003; Pavelic et al., 2006; Neuman et al., 2008; Riva et al., 2013; Barahona-Palomo et al., 2011; Quinn et al., 2012; Shapiro et al., 2015; Galvão et al., 2016; Menafoglio et al., 2016; Medici et al., 2018; Dausse et al., 2019, and references therein). Assessing (i) the information content and (ii) the amount of information shared between permeability data associated with differing support scales (and/or diverse measuring devices/techniques) along the lines illustrated in the present study can be beneficial for obtaining a quantitative appraisal of possible feedbacks among diverse approaches employed for aquifer/reservoir characterization. Results of such an analysis can potentially serve as guidance for the screening of datasets which are most informative to provide a comprehensive description of the spatially heterogeneous distribution of permeability. While the methodology detailed in Sect. 3 is readily transferable to scenarios where multi-scale permeability data are available, the appraisal of the general nature of some specific findings of the present study (e.g., decrease in the Shannon entropy as the support scale increases, regularity in the trends displayed by the normalized bivariate mutual information) remains an open issue which will be the subject of future works.

We rely on elements of information theory to interpret multi-scale permeability data collected over blocks of Berea Sandstone and Topopah Spring Tuff, representing a nearly homogeneous and a heterogeneous porous
medium composed of a two-material mixture, respectively. The unique
multi-scale nature of the data enables us to quantify the way information is
shared across measurement scales, clearly identifying information losses
and/or redundancies that can be associated with the joint use of
permeability data collected at differing scales. Our study leads to the
following major conclusions.

An increase in the characteristic length associated with the scale at which the laboratory-scale (normalized) gas permeability data are collected corresponds to a quantifiable decrease in the Shannon entropy of the associated probability mass function. This result is consistent with the qualitative observation that the ability to capture the degree of spatial heterogeneity of the system decreases as the data support scale increases.

The bivariate mutual information shared between pairs of permeability datasets collected at (i) a fixed fine scale (taken as the reference) and (ii) larger scales decreases in a mostly regular fashion, independent of the size of the reference scale, once it is normalized by the Shannon entropy of the data taken at the reference scale. This result highlights a consistency in the way information associated with data at diverse scales is shared for the instrument and the porous systems here analyzed.

As the degree of heterogeneity of the system increases, we document a corresponding increase in the Shannon entropy (given a support scale) and a decrease in the values of the normalized bivariate mutual information (given two support scales) between permeability data collected at the differing measurement scales.

Results of the information partitioning of the multivariate mutual information shared by permeability data collected at three increasing support scales for the Berea Sandstone sample exhibit a marked level of redundancy and high/low uniqueness for the data collected at the intermediate/coarser scale in the triplets with respect to the data associated with the finest scale. This result can be linked to the fairly homogeneous nature of the sample that is also reflected in the moderate variation of the observed (normalized) gas permeability values with increasing size of the support scale.

Information partitioning for the Topopah tuff sample indicates the occurrence of a still significant amount of unique information associated with the data collected at the intermediate scale, while the redundant portion and the unique contribution linked to the largest scale in a triplet are clearly diminished. This result stems from the heterogeneous structure of the Topopah porous system, where the recorded (normalized) gas permeabilities display moderate or marked discrepancies as

For both rock samples considered, the simultaneous knowledge of permeability data taken at the intermediate and coarser support scales in a triplet does not provide significant additional information with respect to that already contained in the data taken at the fine scale; i.e., the synergetic contribution in the resulting datasets is virtually zero.

Data employed were graciously provided by Tidwell, VC, and are available online (

The supplement related to this article is available online at:

The methodology was developed by AD and supervised by and discussed with AG and MR. All codes were developed by AD. The manuscript was drafted by AD. Structure, narrative and language of the manuscript were revised and significantly improved by AG and MR.

The authors declare that they have no conflict of interest.

The authors would like to thank the EU and MIUR for funding in the framework of the collaborative international consortium (WE-NEED) financed under the ERA-NET WaterWorks2014 Cofunded Call. This ERA-NET is an integral part of the 2015 Joint Activities developed by the Water Challenges for a Changing World Joint Programme Initiative (Water JPI). Alberto Guadagnini is grateful for funding from Région Grand-Est and Strasbourg-Eurométropole through “Chair Gutenberg”.

This research has been supported by the EU and MIUR (grant no. ERA-NET WaterWorks2014).

This paper was edited by Erwin Zehe and reviewed by Ralf Loritz and one anonymous referee.