The usefulness of outcrop analogue air permeameter measurements for analysing aquifer heterogeneity: testing outcrop hydrogeological parameters with independent borehole data



B. Rogiers et al.: Testing outcrop hydrogeological parameters with independent borehole data

Published by Copernicus Publications on behalf of the European Geosciences Union.

Introduction
Compared to core drilling for sample collection and analysis, outcropping sediments are easily accessible analogues for studying subsurface sediments. This outcrop-analogue concept has been extensively applied in the oil industry for the analysis and modelling of reservoirs (e.g. Flint and Bryant, 1993; McKinley et al., 2004), resulting in various tools to characterize geological facies geometries, their connectivity and continuity (Pringle et al., 2004), and to create 3-D virtual outcrop models (Pringle et al., 2006). The concept has also been used with small-scale outcrops in unconsolidated material (e.g. Teutsch et al., 1998; Bayer et al., 2011), collecting both hydraulic and geophysical data. Most of these studies are more concerned with defining the geological facies geometry than with determining the corresponding hydrogeological parameters; hence direct quantification of these parameters, and certainly a comparison with the corresponding subsurface parameters, is often lacking.
In slightly dipping unconsolidated stratigraphic settings, a very limited number of facies are generally encountered in a single outcrop. The information contained within such lithofacies types potentially represents key stratigraphic features and hydrogeological parameters for building conceptual groundwater flow models. Furthermore, different outcrops may represent different parts of a stratigraphic or landscape succession series (Beerten et al., 2012). The combination of several outcrops can then be used to obtain a composite picture of an aquifer system containing the same or at least similar sediments. As demonstrated by Rogiers et al. (2013a), the use of a hand-held air permeameter is a very accurate and cost-effective approach for quantifying hydraulic conductivity (K) and its spatial variability in situ on outcropping sediments. The question that remains, however, is how representative the obtained outcrop parameters are for the actual subsurface sediments.
In the first instance, the outcrop sediments may differ in some aspects from their subsurface equivalents as a result of slightly differing depositional contexts, e.g. with respect to the position in the basin (palaeogeographical conditions). Inherently, this problem is largely circumvented by comparing outcrop and subcrop sediments from one and the same formation.
Secondly, the outcropping sediments could also be influenced by post-depositional processes such as surficial weathering and compaction due to slightly different overburden sedimentation and erosion histories. During the initial loading of sands, a rapid increase of packing density and soil strength is expected due to grain reorganization (Pettersen, 2007). As packing becomes tighter, further packing becomes increasingly difficult to achieve; each packing level is more stable than the previous ones, and deformation is permanent. This process should be visible in the porosity, bulk density and eventually K data of a progressively compacted material. Overconsolidated sands should, however, not show dilation properties, and unloading would thus have little effect. However, the amounts of silt and clay present throughout the aquifer sediments might initiate such dilation properties. Moreover, dissolution of certain mineral phases or framework grains by meteoric water might also enhance permeability, as shown by Lambert et al. (1997).
The objectives of this paper are therefore (i) to test whether the hydraulic conductivity and its spatial heterogeneity in outcrops obtained through air permeametry are comparable to those of nearby aquifer and aquitard sediments, (ii) to evaluate major differences between outcrop and aquifer sediment K heterogeneity, including the transferability of information from outcrop to aquifer sediments, and (iii) to discuss the scale effect and overall outcrop parameter representativity for use in groundwater modelling. For this purpose the results from the outcrop study by Rogiers et al. (2013a) are compared with more standard borehole core analyses and pumping test results. Moreover, grain-size analyses are used to verify the similarity between outcrop and subsurface sediments. In a final step, we provide possible explanations for the observed differences in K behaviour and options on how to integrate air permeametry-based data with existing knowledge available from borehole and pump test analyses in view of developing more reliable groundwater flow models.

Materials and methods
Table 1 provides an overview of all data used in this paper. The hydrogeological setting and the outcrop measurements are discussed first. The data at each outcrop have been upscaled to an equivalent K tensor. Next, the constant-head measurements on the borehole core samples are discussed. The procedure for obtaining grain-size distributions is described, and we briefly introduce the pumping test methods and analyses used. Finally, we outline the approach for variography of the data to quantify spatial variability.

Hydrogeological setting and outcrop analyses
Rogiers et al. (2013a) proposed a methodology to measure small-scale K variability from unconsolidated outcrop sediments and to calculate outcrop-scale equivalent K values. This methodology relies on air permeability measurements that are converted to saturated K values using the empirical equation from Iversen et al. (2003), and a subsequent numerical upscaling step. The air permeability measurements are performed with a hand-held air permeameter, the TinyPerm II (New England Research & Vindum Engineering, 2011), on a regular grid of measurement locations at the outcrop face. The TinyPerm II device has an inner tip diameter of 9 mm, resulting in an investigation depth of 9-18 mm, corresponding to a maximum spatial support of ∼24 cm³. Pressing the device plunger creates a vacuum that withdraws air from the outcrop sediments. A microprocessor analyzes the pressure increase and returns the air permeability. The resulting values cannot be converted directly to saturated hydraulic conductivity because corrections are needed with regard to (i) the polar characteristics of water, (ii) the fact that air at atmospheric pressure does not act as a true fluid continuum in soil (e.g. gas slippage might occur at the interface with solids), and (iii) the difficulty in obtaining totally dry conditions in the investigated sediments. The use of empirical relationships like that of Iversen et al. (2003) has proven to be very effective in converting air permeability into hydraulic conductivity.
This methodology was tested on five outcrops from three key formations of the Neogene aquifer in north-eastern Belgium (from top to bottom): the Mol formation (the abbreviation Fm. will be used in the subsequent discussions), sandy and clayey parts of the Kasterlee Fm., and the clayey and sandy parts of the Diest Fm. For these five units additional geological and hydrogeological data are available from a recent characterization campaign (Beerten et al., 2010) of the shallow aquifer sediments in Mol/Dessel (up to about 40 m depth), including seven cored boreholes (Fig. 2 in Rogiers et al., 2013a). This lithostratigraphical succession and its main characteristics are presented in Fig. 1. Apart from the minimum and maximum unit thickness obtained from this recent characterization campaign, a typical borehole core from the clayey Kasterlee Fm. is displayed, as well as a grain-size and glauconite content profile through most of the units. The most striking features are the high clay and fine silt contents within the aquitard represented by the clayey part of the Kasterlee Fm., the sudden increase of the glauconite content in the sediments below this unit, and the contrast in coarse sand content between the upper and lower aquifers separated by the aquitard.
In addition to the individual air-permeameter measurements (spatial support of ∼24 cm³) and their statistics, the measurement grids were numerically upscaled to obtain equivalent horizontal and vertical K values at the scale of the outcrop (i.e. typically several m²; Rogiers et al., 2013a). This was done using the approach of Li et al. (2011). The measurements on the sampling grid were converted into a numerical grid, with one extra grid cell at all sides. By invoking flow conservation for a combination of different boundary conditions, an equivalent K tensor was obtained. An overview of this approach for all outcrops characterized by air-permeameter measurements within the study area is provided by Rogiers et al. (2013b). The individual small-scale air-permeameter results show a correlation of 0.93 with independent constant-head laboratory permeameter measurements on 100 cm³ ring samples taken from the same outcrop measurement grid (Rogiers et al., 2013a). The average ratio between both log-transformed K data (air permeameter/constant head) equals 1.03, and is between 0.78 and 1.24 for individual samples. Repeatability of the TinyPerm II measurements was tested on a set of different lithologies with K ranging from 10^−3.5 to 10^−6.5 m s⁻¹, with a maximum log10(K) error variance of 0.007. Given this high repeatability, and the absence of visible macropores in the investigated outcrop faces, the K data obtained from the outcrops are deemed accurate and unbiased.
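To make the idea of flow-based upscaling concrete, the following minimal sketch solves steady-state flow through a small rectangular grid of K values with fixed heads on two opposite sides and no-flow conditions on the other two, and derives a directional equivalent K from the resulting flux via Darcy's law. It is a simplified stand-in and not the Li et al. (2011) approach used in the paper: it returns only one principal component per call rather than a full tensor, it omits the extra border cells, and the function name `equivalent_k`, the Gauss-Seidel solver and the unit cell size are illustrative assumptions.

```python
def equivalent_k(k_grid, direction="h", n_iter=20000, tol=1e-14):
    """Equivalent K of a rectangular grid of unit cells for flow along the
    rows ("h") or along the columns ("v"). Hypothetical helper for illustration."""
    if direction == "v":
        # Transpose so that the flow direction always runs along the rows.
        k_grid = [list(col) for col in zip(*k_grid)]
    n_r, n_c = len(k_grid), len(k_grid[0])

    def cond(k1, k2):
        # Conductance between two unit cells: harmonic mean of their K values.
        return 2.0 * k1 * k2 / (k1 + k2)

    # Fixed heads: 1 on the inflow face (j = 0 side), 0 on the outflow face.
    head = [[0.5] * n_c for _ in range(n_r)]
    for _ in range(n_iter):
        max_change = 0.0
        for i in range(n_r):
            for j in range(n_c):
                k = k_grid[i][j]
                num = den = 0.0
                # Along the flow direction: neighbouring cell, or a boundary
                # face at half a cell distance (conductance 2K).
                if j == 0:
                    num += 2.0 * k * 1.0
                    den += 2.0 * k
                else:
                    c = cond(k, k_grid[i][j - 1])
                    num += c * head[i][j - 1]
                    den += c
                if j == n_c - 1:
                    den += 2.0 * k  # boundary head is 0, so nothing added to num
                else:
                    c = cond(k, k_grid[i][j + 1])
                    num += c * head[i][j + 1]
                    den += c
                # Across the flow direction; the outer sides are no-flow.
                for ii in (i - 1, i + 1):
                    if 0 <= ii < n_r:
                        c = cond(k, k_grid[ii][j])
                        num += c * head[ii][j]
                        den += c
                new = num / den
                max_change = max(max_change, abs(new - head[i][j]))
                head[i][j] = new
        if max_change < tol:
            break
    # Total inflow through the head = 1 face, then Darcy's law for the block:
    # K_eq = Q * length / (cross-section * head difference).
    q_in = sum(2.0 * k_grid[i][0] * (1.0 - head[i][0]) for i in range(n_r))
    return q_in * n_c / n_r


# Hypothetical grid: a low-K layer between two high-K layers (values in m/s).
grid = [[1e-4] * 4, [1e-6] * 4, [1e-4] * 4]
k_h = equivalent_k(grid, "h")
k_v = equivalent_k(grid, "v")
```

For a perfectly layered grid such as this one, the horizontal result reduces to the arithmetic mean of the layer values and the vertical result to their harmonic mean, which offers a simple check on the solver.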

Constant-head K measurements
To characterize the variability in hydraulic conductivity of the aquifer sediments, multiple undisturbed 100 cm³ ring samples (with a diameter of 53 mm) were taken from contiguous borehole cores (Beerten et al., 2010). The ring samples were pushed into the cores in the horizontal or vertical direction, to characterize horizontal or vertical K, respectively. The gathered data comprise several hundred hydraulic conductivity measurements on such 100 cm³ ring samples from seven cored boreholes, representing 350 m of core material. Two samples were taken every 2 m, for horizontal and vertical K, but the anisotropy at the sample scale was generally negligible (Beerten et al., 2010). The average thickness of the Mol and Kasterlee formations in these boreholes is 20 and 10 m, respectively. The highly stratified clayey part of the Kasterlee Fm. (coarse sand layers alternating with heavy clay lenses whose thickness varies from less than a cm to several cm) varies in thickness from 2 to 6 m. The Diest formation is not fully penetrated, but was characterized across 15 m on average.
All 100 cm³ ring samples were analyzed in the lab using the constant-head method (Klute, 1965), with a low-pressure device for coarse material and a high-pressure device (approx. 6 bar) for the clay material expected to display low K values (see Beerten et al., 2010 for more details). Total porosity was also determined for most core samples, as well as bulk density and volumetric moisture content for the outcrop samples, by repeatedly weighing the samples after drying and complete saturation. The methodology is similar to that used by Rogiers et al. (2013a) to validate the outcrop air-permeameter measurements.

Grain-size measurements
A SediGraph or a combination of standard sieving and a suspension cylinder (European standard EN 933-1) was used to quantify, respectively, 20 and eight grain-size fractions of the borehole core samples. All samples were prepared by removing carbonates and organic matter. Clay samples were analyzed with the SediGraph, after removing particles larger than 250 µm by sieving. For more details on the data, the reader is referred to Beerten et al. (2010) and Rogiers et al. (2012).
Grain-size analyses of outcrop samples were performed by laser diffraction with a Malvern Mastersizer (Malvern Instruments Ltd., UK). This method consists of monitoring the amount of reflection and diffraction that is transmitted back from a laser beam directed at the particles, and quantifies 64 grain-size fractions. Each sample was divided into 10 sub-samples by a rotary sample splitter to enable repeated measurements on a single sample, and all samples were measured at least twice. The final result was based on the average grain-size distribution of all sub-samples. Note that particle sizes are expressed as the size of an equivalent sphere with an identical diffraction pattern.

Pumping tests
Step drawdown, constant discharge and recovery tests were performed at different locations within the study area, including some of the borehole locations. The transient groundwater head observations were interpreted with analytical as well as numerical models (Meyus and Helsen, 2012). Results from these large-scale tests are used here to illustrate the scale effect for hydraulic conductivity determination on subsurface sediments, and to compare such large-scale measurements with the numerically upscaled K values for the outcrops.

Variography
The experimental variograms are all fitted with spherical models, using a weighted least squares approach. Two approaches are tested: (1) treating both data sets separately (variogram models for the outcrops are taken from Rogiers et al., 2013a), and (2) using a pooled data set which combines both outcrop and borehole data. In the latter case equal weight is given to both data sets in the least squares fitting. In the former case individual experimental variogram points are weighted according to the number of point pairs they represent. The initial variogram parameters for the nugget, total sill and range were set to the overall minimum semivariance, the data variance, and the maximum lag distance, respectively. In certain cases singular model fits occurred due to non-uniqueness (the data do not allow discrimination between different equivalent models, e.g. a pure nugget vs. a spherical model with zero range). The responsible parameters were then fixed at their initial value, before re-initialising the model fitting procedures. All variography was performed with the gstat package (Pebesma, 2004).
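The ingredients of this fitting procedure can be sketched in a few lines. The code below is not the gstat implementation; it is a plain-Python illustration for a one-dimensional (vertical) profile, and all function names are hypothetical. It computes an experimental semivariogram, evaluates a spherical model, builds the pair-count-weighted least-squares objective, and derives the initial parameters described above. The minimization itself is left out.

```python
def experimental_variogram(depths, values, lag_width, n_lags):
    """Return (mean lag, semivariance, pair count) per non-empty lag bin."""
    sums = [0.0] * n_lags
    dists = [0.0] * n_lags
    counts = [0] * n_lags
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            h = abs(depths[i] - depths[j])
            b = int(h / lag_width)
            if b < n_lags:
                sums[b] += 0.5 * (values[i] - values[j]) ** 2
                dists[b] += h
                counts[b] += 1
    return [(dists[b] / counts[b], sums[b] / counts[b], counts[b])
            for b in range(n_lags) if counts[b] > 0]


def spherical(h, nugget, total_sill, rng):
    """Spherical variogram model defined by nugget, total sill and range."""
    if h <= 0.0:
        return 0.0
    if rng <= 0.0 or h >= rng:
        return total_sill
    r = h / rng
    return nugget + (total_sill - nugget) * (1.5 * r - 0.5 * r ** 3)


def weighted_sse(points, nugget, total_sill, rng):
    """Least-squares objective, each point weighted by its number of pairs."""
    return sum(n * (g - spherical(h, nugget, total_sill, rng)) ** 2
               for h, g, n in points)


def initial_parameters(points, values):
    """Nugget = minimum semivariance, sill = data variance, range = maximum lag."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return (min(g for _, g, _ in points), variance, max(h for h, _, _ in points))


# Hypothetical log10(K) profile sampled every metre.
depths = [0.0, 1.0, 2.0, 3.0]
log_k = [1.0, 2.0, 1.0, 2.0]
points = experimental_variogram(depths, log_k, lag_width=1.0, n_lags=3)
start = initial_parameters(points, log_k)
```

With a range of zero, `spherical` returns the total sill for every positive lag regardless of the nugget, so the objective cannot distinguish a pure nugget from a zero-range spherical model. This is the kind of non-uniqueness that the text resolves by fixing the responsible parameters.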

Grain-size distributions
Prior to comparing K values obtained from different measurement methods, a comparison is made between grain-size distributions for the outcrop sediments and aquifer materials collected from cored boreholes (Beerten et al., 2010). This evaluation is necessary to verify whether the outcrop and aquifer sediments represent the same lithostratigraphical units, and to highlight possible discrepancies between both to inform the comparison of their corresponding K values.
Overall there is good correspondence between outcrop and aquifer grain-size distributions for the sandy part of the Kasterlee Fm. and the clayey part of the Diest Fm. (Fig. 2a and c), with a somewhat larger fraction of fines (i.e. between 2 and 22 µm) for the outcrop samples. Van Ranst and De Coninck (1983) suggested that post-depositional weathering of glauconite material, a green iron-rich clay mineral, might increase the relative amount of fines. Kasterlee formation samples collected from boreholes contain glauconite up to a few percent, but for the Diest formation it is at least 10 to 20 % (Beerten et al., 2010). The disintegration of the glauconite fractions in the outcrops could thus have increased the fines content.
The comparison further illustrates that the clay fraction (< 2 µm) of the clayey part of the Kasterlee Fm. is about 20 % lower in the outcrop samples compared to the aquifer material. Since we are dealing with outcrop samples that are close to the surface, post-depositional migration of clay out of the clay lenses (e.g. Mažvila et al., 2008) together with bioturbation in the outcrops is a plausible explanation for the lower clay content in the outcrop. Weathering of clay lenses or drapes close to the surface would be another plausible explanation. For the clayey Kasterlee Fm. outcrop, the individual grain-size distribution curves (Fig. 2b) indicate a continuous gradation between two extreme cases, i.e. from a clay lens texture (approximately 40 % clay) to coarse sand without fines (> 90 % sand). The corresponding grain-size distributions for boreholes show no overlap between the clay and sand samples, an illustration of the existence of two distinctly different materials within the clayey part of the Kasterlee Fm. (i.e. heavy clay lenses embedded in coarse sands characterized by a sharp interface) (Beerten et al., 2010).
In conclusion, weathering, clay migration, and bioturbation may have influenced the lower end of the outcrop samples' grain-size distribution considerably. Furthermore, dissimilarities in palaeogeographic conditions and sediment source regions between the outcrop and borehole locations may equally explain such differences. However, the consistent stratigraphic position of the clayey Kasterlee Fm. sediments on top of the Diest Fm. and the relatively good correspondence in particle size for the sandy material (i.e. sand layers within the Kasterlee Fm.) are sufficient arguments to support using the studied clayey Kasterlee Fm. outcrop at Heist-op-den-Berg (for details of the outcrop see Rogiers et al., 2013a) as a surrogate for the clayey Kasterlee Fm. aquitard (Gulinck, 1963; Laga, 1973; Fobe, 1995). Additional insight could be obtained from tracing the exact origin and initial composition of the outcrop materials; however, this is beyond the scope of the current paper.

Hydraulic conductivity distributions
Figure 3 provides a comparison of outcrop and borehole (aquifer) K kernel density estimates of the probability density functions (pdfs) for the five sediments. Statistically significant differences exist for all sediments, with p values for F tests all below 4 × 10⁻³, while the corresponding t test p values are all below 1 × 10⁻⁵, indicating statistically significant differences for both the variance and the mean. All outcrop pdfs have higher mean K values than their borehole complement. While most outcrop samples display conductivities between 10⁻⁵ and 10⁻³ m s⁻¹, borehole samples have their most frequent K values between 10⁻⁶ and 10⁻⁴ m s⁻¹. Moreover, the standard deviations for the borehole samples are consistently larger than those based on the outcrop samples. The left tail of the pdfs tends to be much larger for the borehole data while the peaks tend to be wider (one to two orders of magnitude for the outcrops versus two to four orders of magnitude for the borehole data), especially for the sandy Kasterlee Fm. (Fig. 3b). Relative variability expressed as coefficient of variation (CV) is approximately two times larger for borehole pdfs than for outcrop pdfs (Mol Fm.: −13.4 % vs. −5.9 %; Kasterlee Fm. sands: −24.5 % vs. −12.9 %; Diest Fm. sands: −23.9 % vs. −18.8 %) while it is similar for the clayey parts of the Kasterlee Fm. (−23.9 % vs. −18.8 %) and Diest Fm. (−15.8 % vs. −17.4 %). For the borehole data, sampling occurred over a large geographical area (several tens of km² vs. as little as a few m² to at most a few tens of m² for the outcrops) and over a much larger depth (up to 50 m), thus having the opportunity to sample a much larger spatial heterogeneity.
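The negative signs of the reported CV values are consistent with a coefficient of variation computed on log10-transformed K, whose mean is negative for conductivities below 1 m s⁻¹. This is an inference from the numbers rather than something stated in the text, and the sketch below simply shows how such a value would arise; the function name and the sample values are hypothetical.

```python
import math
import statistics


def log10_cv_percent(k_values):
    """CV (in %) of log10(K): sample standard deviation divided by the mean.
    Assumes the CV is taken on log10-transformed values (an inference)."""
    logs = [math.log10(k) for k in k_values]
    return 100.0 * statistics.stdev(logs) / statistics.mean(logs)


# Hypothetical sample spanning two orders of magnitude (m/s).
cv = log10_cv_percent([1e-4, 1e-5, 1e-6])
```

Here the log10 values are −4, −5 and −6, so the standard deviation is 1 and the mean is −5, giving a CV of −20 %. Under this reading, a larger absolute CV reflects a wider spread in orders of magnitude relative to the typical order of magnitude of K.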
Several characteristics typical of heterogeneity in K are, however, visible in both the outcrop and borehole K distributions. For the sandy part of the Kasterlee Fm. (Fig. 3b), a long tail towards low values is present both in the outcrop and in the boreholes, while the majority of samples is within a much narrower distribution in the outcrop. For the clayey part of the Kasterlee Fm. (Fig. 3c), a multi-modal distribution is present for both data sets and representative of samples belonging mainly to clay lenses or sand layers. The clayey part of the Diest Fm. (Fig. 3d) displays a similar pdf in both data sets (ratio of borehole to outcrop CV = 0.91), and the sandy Diest Fm. data (Fig. 3e) show the best absolute match in terms of the mean K, although the second peak with lower K values was not observed in the outcrop.
Validation of air-permeameter K with core-based outcrop K demonstrated the absence of systematic bias in the air-permeameter K estimates (Rogiers et al., 2013a). Therefore, differences in K distributions between outcrop and aquifer sediments can be attributed to the scale of investigation (a single outcrop with a typical measurement grid of a few m² vs. seven ∼50 m-deep vertical transects through the different lithostratigraphical formations, Fig. 1), different evolutionary states of the outcropping and subsurface sediments, and possibly different sedimentation conditions.

Linear rescaling correction
To investigate the (dis)similarities between the outcrop and borehole data across these five lithological units, the minimum and maximum values are plotted in Fig. 4, with all deciles (10th, 20th, ..., 90th percentile) in between. This shows that linear scaling of the outcrop values to the corresponding borehole distributions is possible for all outcrops. The extreme values are, however, not always in line with the centre of the distributions (as indicated by the deviation of the overall shape of the first and last line segments). All outcrops exhibit a more or less similar trend for at least part of the data, which is supported by the linear model fit on all minimum, maximum and decile points (r² = 0.7). The slope, larger than 45°, indicates that the deviation between outcrop and boreholes is larger for low K than for higher K values, which is consistent with the previous observations. The sandy Diest Fm. curve lies apart from and above the other curves, and is much closer to the 1 : 1 line of perfect agreement. This is as expected based on the good correspondence in pdfs (see Fig. 3e). In other words, the sandy Diest Fm. outcrop is representative of the entire aquifer unit.
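A decile-based linear rescaling of this kind can be sketched as follows. The code pairs the minimum, the nine deciles and the maximum of the outcrop log10(K) values with those of the borehole values, fits a straight line through the eleven pairs, and applies it to the outcrop data. Several choices here are assumptions: borehole values are regressed on outcrop values (the axis orientation of Fig. 4 is not given in the text), a separate line is fitted per unit whereas the paper reports one fit on the points of all outcrops, and all function names are hypothetical.

```python
import statistics


def decile_points(values):
    """Minimum, nine deciles and maximum of a sample (11 points)."""
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    return [min(values)] + deciles + [max(values)]


def fit_line(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def rescale_outcrop(log_k_outcrop, log_k_borehole):
    """Fit borehole deciles against outcrop deciles and correct the outcrop data."""
    slope, intercept = fit_line(decile_points(log_k_outcrop),
                                decile_points(log_k_borehole))
    corrected = [slope * v + intercept for v in log_k_outcrop]
    return slope, intercept, corrected


# Hypothetical log10(K) samples; the borehole set is shifted and stretched
# relative to the outcrop set so that the expected fit is known in advance.
outcrop = [-5.0, -4.5, -4.0, -3.5, -3.0, -4.2, -3.8]
borehole = [1.5 * v - 1.0 for v in outcrop]
slope, intercept, corrected = rescale_outcrop(outcrop, borehole)
```

Because only the distributions are matched, the two samples need not be paired or of equal size; the equal-length lists above are used solely to make the example verifiable.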

Porosity and compaction state
Weathering of clay layers at the surface has certainly contributed to produce higher K values for the fine material in the outcrops, but the systematic bias of about one order of magnitude that is also present for the sands remains unexplained.
Trends in porosity or bulk density with depth are very hard to detect in the borehole data due to the extensive layering of different lithologies and grain-size distributions in the study area (the same lithology may occur at different depths depending on the geographical location). Moreover, the data from the outcrops are hardly sufficient to prove that differences with the subsurface sediments are statistically significant. For example, the mean total porosity for the four Mol and Kasterlee Fm. outcrop core samples is 43 % with a mean dry bulk density of 1.52 g cm⁻³ (see Rogiers et al., 2013a), while the borehole values of the same two formations (43 samples) are 40 % and 1.60 g cm⁻³ (samples between 2 and 28 m below surface). This is consistent with different compaction states (i.e. outcrop samples being less compacted than borehole samples), but the differences remain very small and are only significant for porosity at the 5 % significance level. However, even small differences in porosity can yield large differences in K (see discussion below).
The impact of the degree of compaction on K values was further investigated for the borehole data set only, using total porosity as a proxy for compaction, as analyses in the literature show that porosity has a strong influence on K, given a homogeneous grain-size distribution and chemistry (e.g. Bourbie and Zinszner, 1985). On an individual sample basis, it is hard to detect relationships between total porosity and K within the borehole data set, since these are very complex owing to the influence of grain size (Rogiers et al., 2012), sorting, packing and eventually the actual accessible pore throat radii (e.g. Bakke and Øren, 1997; Øren et al., 1998). However, as indicated by the scatter plot in Fig. 5, if total porosity and K are averaged for each formation and for each borehole separately, some statistically significant relationships exist. The slopes of the linear model fits are consistently positive, and in several cases, a change of a few percent in porosity can change K drastically. For instance, a 1 % decrease in porosity yields a decrease in K of at least 0.14 and at most 1.08 log10 units. This is a partial confirmation of the importance of the degree of consolidation and compaction for our K values; corroborating evidence about the effect of grain-size, sorting and packing characteristics will be sought in future research.
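As a back-of-the-envelope illustration, the reported slopes can be combined with the porosity difference between outcrop and borehole samples quoted earlier (43 % vs. 40 %). The calculation below assumes that the formation-averaged borehole slopes can be extrapolated linearly over three percentage points and transferred to the outcrop-borehole contrast; neither assumption is tested in the text, and the helper name is hypothetical.

```python
def k_factor(porosity_change_pct, slope_log10_per_pct):
    """Multiplicative change in K implied by a linear porosity-log10(K) slope."""
    return 10.0 ** (slope_log10_per_pct * porosity_change_pct)


porosity_difference = 43.0 - 40.0            # outcrop minus borehole, in % porosity
low = k_factor(porosity_difference, 0.14)    # smallest reported slope
high = k_factor(porosity_difference, 1.08)   # largest reported slope
print(f"K ratio between {low:.1f} and {high:.0f}")
```

Under these assumptions the implied outcrop-to-borehole K ratio ranges from roughly 2.6 to more than 1700, a span that brackets the bias of about one order of magnitude observed for the sands.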
An additional analysis of the relationship between K and depth below surface was performed but did not yield any significant dependencies (results not shown). This is probably due to the alternation of different lithologies and grain sizes with depth, which obscures the influence of depth on compaction and thus on porosity and K.

The scale effect and vertical anisotropy
The representativity of K measurements, whether for outcrop or aquifer sediments, for characterizing a lithostratigraphical unit depends, among other factors, on the size of the measurement scale (or measurement support) and on the spatial extent and lithostratigraphic complexity of the sampled domain. The measurement scale of individual K measurements also affects the overall variability, as measurements with a larger support volume, like pumping tests, average out small-scale variability (Mallants et al., 1997). It is thus important to consider such scale effects when comparing outcrop and borehole K values. A comparison between the outcrop data (air-permeameter-based geometric mean K values and the corresponding calculated equivalent values) and the subsurface data (borehole core geometric mean K values and the pump test values) is shown in Fig. 6. It reveals that the overall range is smallest for the outcrop data, both at the smallest measurement scale (air-permeameter measurements span 5 orders of magnitude versus 8 orders of magnitude for borehole cores) and at the largest scale (calculated equivalent outcrop K values show a range of ∼2 orders of magnitude versus ∼5 orders of magnitude for pump tests). It is further evident that the outcrop-based equivalent K values are systematically higher than the mean borehole core values; a better correspondence is achieved with the pump test values.
Because a pump test represents a large support volume, easily tens to hundreds of m³, small-scale heterogeneities have much less effect on such large-scale K values, hence the smaller data range. Furthermore, the support volume is commensurate with the computational domains used to calculate equivalent outcrop values. Overall the pump test values are generally only slightly smaller than the equivalent outcrop values, except for the clayey part of the Kasterlee Fm., for which the discrepancy is about three to four orders of magnitude. This again emphasizes the need for a correction if outcrop K values are used to inform the building of conceptual groundwater models. Correction models such as those from Fig. 4 would account for the impacts of different compaction and/or weathering processes, especially for the more clay-bearing sediments.
The arrows in Fig. 6 indicate different effects of upscaling for the aquifer and aquitard units. Moving from the sample scale (cm scale) to the pump-test scale (metre scale) in most cases increases the aquifer geometric mean K values by one order of magnitude, while the outcrop values remain more or less constant when geometric means are compared with effective values. Unlike the other formations, upscaling the clayey part of the Kasterlee Fm. data results in a decrease of the average K values, for both K_v and K_h pertaining to the aquifer and for outcrop K_v. This indicates that in both the outcrop and aquifer sediments of this particular lithostratigraphic unit a significant amount of small-scale heterogeneity is present (i.e. clay lenses), which significantly decreases the magnitude of the calculated effective K values.
Faulting could be another process that enhances discrepancies between small and large measurement supports. However, this process is considered to be absent as the study area is known as a zone of low seismic and limited tectonic activity (De Craen et al., 2012).
A comparison of the vertical anisotropy values (K_h/K_v) is shown in Fig. 7. The K_h/K_v ratios based on the geometric means of the 100 cm³ borehole cores lie between 1 and 5. The two lithostratigraphical units with the highest K_h/K_v values are the sandy parts of the Kasterlee and Diest Fm., which are influenced by some outliers that probably belong to the under- or overlying units. The equivalent outcrop K_h/K_v values are less than the corresponding borehole core anisotropy values, except for the clayey parts of the Diest and Kasterlee Fm. For the latter, K_h/K_v increased by more than one order of magnitude when moving from the borehole core to the outcrop scale. The pump test anisotropy values are mostly larger than those from the borehole cores, with a maximum vertical anisotropy of 10. The original Dessel 2 pump test interpretation by Lebbe (2002) yielded K values for the clayey part of the Kasterlee Fm. and mentions a vertical anisotropy factor of 190 for part of the aquitard. This value was obtained by inverse modelling of the pump test, but due to a limited drawdown across the aquitard, the optimized parameter values remain highly uncertain. A more reliable estimate is probably obtained from the more regional modelling of the Neogene aquifer and the flow across the aquitard by Gedeon and Mallants (2012). They obtain a vertical anisotropy of 148 by inverse conditioning on regional piezometric observations above and below the aquitard. The high vertical anisotropy determined from the outcrop supports these values, and indicates that such large values might be more realistic at larger scales.
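Why anisotropy can grow with scale in a unit containing thin clay lenses can be illustrated with the classical result for perfectly layered media: flow parallel to the layers is governed by the thickness-weighted arithmetic mean of K, flow across them by the thickness-weighted harmonic mean. The sketch below uses hypothetical thicknesses and K values and treats the clay as laterally continuous layers, which exaggerates the effect of discontinuous lenses and therefore gives an upper bound rather than an estimate for the Kasterlee Fm.

```python
def layered_anisotropy(layers):
    """Return (K_h, K_v, K_h / K_v) for a stack of continuous horizontal layers.
    layers: list of (thickness, K) tuples; all values here are hypothetical."""
    total = sum(t for t, _ in layers)
    k_h = sum(t * k for t, k in layers) / total   # arithmetic mean, flow along layers
    k_v = total / sum(t / k for t, k in layers)   # harmonic mean, flow across layers
    return k_h, k_v, k_h / k_v


# 0.95 m of sand (1e-4 m/s) with 0.05 m of clay (1e-8 m/s) in total.
k_h, k_v, ratio = layered_anisotropy([(0.95, 1e-4), (0.05, 1e-8)])
```

In this example a clay fraction of only 5 % leaves K_h almost unchanged but lowers K_v by almost three orders of magnitude, giving a ratio of several hundred. A 100 cm³ sample taken entirely within sand or entirely within clay cannot register this effect, which is one way to understand why the core-scale anisotropy stays between 1 and 5.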

Spatial variability
The vertical spatial variability for the outcrop and borehole data (K h only) is compared in Fig. 8 and Table 2.For the Mol For the Kasterlee Fm. sands, the borehole and outcrop data show a large difference, which might be due to the rather limited number of borehole core samples identified as the sandy part of the Kasterlee Fm. or an increased amount of heterogeneity in the outcrop due to weathering processes.The overall variability (total sill) is more or less similar for both outcrop and borehole data, suggesting that the variability captured by the outcrop samples may be used as surrogate for the variability in boreholes.Despite the presence of spatial correlation in the both data sets, the joint model fit shows a pure nugget because of the high semivariance values for the outcrop data.
The clayey Kasterlee Fm. shows the largest spatial variability of all lithological units for both the outcrop and borehole data.While the outcrop shows some spatial correlation, the borehole model shows a pure nugget.The borehole cores show higher variability due to the clay-rich lenses and correspondingly low K values, which are altered in the outcrops, but only the first data point at 0.5 m is contradicting the outcrop data.The joint model fit does reveal their compatibility, and shows spatial correlation up to a few metres.This model might be more useful than the individual variogram models due to the integration of different scales.
Most of the clayey Diest Fm. outcrop data seems to be compatible with the borehole core spatial variability.All three model fits show a range of one to two metres, and similar total sills.The sandy Diest Fm. also exhibits similar total sill in all three cases, with a larger spatial range for the borehole data.The joint model fit is compatible with that of the borehole data, but shows a higher nugget due to the higher semivariances in the outcrop data.The sill and range for the variograms that have not reached a constant semivariance within a lag distance of 14 m (Fig. 8a and e), are highly uncertain as a linear model would provide an equally poor description of the data as the used spherical model.The semivariance within the distance range of the experimental data (up to 10-15 m) is, however, hardly affected by this.
Overall, the borehole data exhibit larger correlation lengths than the outcrop data. The total sills are mostly similar, except for two cases where the borehole data clearly encompass more heterogeneity. Three out of five experimental variograms overlap at certain lag distances, indicating that at certain scales both data sets exhibit similar spatial variability. In these cases, fitting the joint data sets results in more robust variogram models. For the Mol formation, the variogram model root mean squared errors (RMSE; Table 2) show that fitting both data sets simultaneously improves the fit, mainly due to the very low outcrop semivariances that are compatible with the borehole data. For the sandy part of the Kasterlee formation, the data sets are not compatible and the joint pure nugget fit shows the highest RMSE. For the clayey Kasterlee formation, both data sets seem to be compatible, except for the borehole data point with the smallest lag distance. For the clayey Diest formation, the variogram models are very similar, as are the RMSE values. For the sandy Diest formation, the range is very different, but the sill values are similar.
This indicates that small-scale structural information, such as the alternation of relatively thin clay and sand layers, and its effect on the spatial variability in K may be preserved in outcrop sediments. Therefore, analysis of outcrop stratigraphy and hydraulic conductivity variability can yield valuable qualitative and quantitative insight into such properties for similar aquifer and aquitard sediments.

Perspectives
Despite the limitations of and systematic differences between the outcrop and borehole data sets, we have demonstrated that outcrop studies can provide useful information for developing more reliable groundwater flow and contaminant transport models. Because of the systematic differences observed here between outcrop and subsurface sediments, the obtained outcrop K values are not directly applicable in groundwater flow modelling unless a correction is applied. Nevertheless, the different K distributions are comparable at least in a relative way, and linear scaling based on deciles was shown to be relatively accurate. In other words, results such as the spatial heterogeneity models, the equivalent vertical anisotropy factors, and the relative differences between the different sediments provide information that is useful for guiding conceptual groundwater flow model building and for constraining model parameterization.
Potential applications of our findings for building conceptual and numerical models of groundwater flow include the following: (i) where possible, highly structured heterogeneity should either be represented explicitly in the models or be handled with appropriate geostatistical tools (e.g. multiple-point statistics) based on detailed structural information visible in and quantifiable from outcrops; (ii) the obtained equivalent vertical anisotropy factors can inform conceptual model choices on isotropy/anisotropy for certain units, with the actual value representing a minimum of the parameter range in larger-scale groundwater flow simulations (especially in a layered stratigraphical setting); (iii) to avoid over-parameterization, ratios between K values of different units can be fixed during model optimization (e.g. Gedeon and Mallants, 2012) using the ratios obtained from equivalent outcrop estimates; and (iv) the obtained outcrop variogram models can complement information from a larger scale (e.g. boreholes), or be used in small-scale geostatistical simulations for detailed local transport simulations. All these applications will be most beneficial when combined with traditional borehole coring and measurements and with other invasive and non-invasive subsurface characterization techniques.
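To illustrate why layering matters for point (ii), the sketch below computes equivalent K values for an idealized, perfectly layered medium using the textbook thickness-weighted arithmetic mean (flow parallel to the layers) and harmonic mean (flow perpendicular to the layers). This is an assumed simplification for illustration only, not the procedure used to derive the equivalent values in this study, and the layer thicknesses and K values are hypothetical.

```python
def layered_equivalent_k(thicknesses, k_values):
    """Equivalent K for flow parallel and perpendicular to perfect layering.

    Returns (k_horizontal, k_vertical, anisotropy factor k_horizontal / k_vertical).
    """
    total = sum(thicknesses)
    # Flow along the layers: thickness-weighted arithmetic mean.
    k_h = sum(t * k for t, k in zip(thicknesses, k_values)) / total
    # Flow across the layers: thickness-weighted harmonic mean.
    k_v = total / sum(t / k for t, k in zip(thicknesses, k_values))
    return k_h, k_v, k_h / k_v

# Hypothetical alternation of sand and clay layers (thickness in m, K in m/s).
thicknesses = [0.10, 0.02, 0.15, 0.03]
k_values = [1e-4, 1e-7, 2e-4, 5e-8]
k_h, k_v, anisotropy = layered_equivalent_k(thicknesses, k_values)
print(f"K_h = {k_h:.2e} m/s, K_v = {k_v:.2e} m/s, K_h/K_v = {anisotropy:.0f}")
```

In this idealized case a few thin low-K layers dominate the vertical equivalent K while barely affecting the horizontal one, which is how alternating clay and sand layers produce a strong equivalent vertical anisotropy.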

Conclusions
Analysis of outcrop sediments considered to be analogues for various lithostratigraphical units within a sedimentary aquifer provided a qualitative understanding of aquifer and aquitard stratigraphy and a quantitative estimate of K variability at the centimetre-metre scale. Comparison between outcrop and independent borehole core K values revealed significant differences between both data sets. Such differences are believed to be induced mainly by weathering, different palaeoenvironmental conditions and differential compaction, and can be corrected for, as was demonstrated on the basis of a linear model. Hence, outcrop information can be used for building better stratigraphic models, including determination of spatial structure by variogram fitting for further use in geostatistical simulations. Moreover, the relative variability in K values, with similar coefficients of variation for borehole and outcrop K, and the derived anisotropy values are very useful for gaining a more complete understanding of the heterogeneity within the Neogene aquifer.
Comparison of outcrop and borehole K values demonstrated that the borehole K probability density functions had broader peaks and longer tails towards low values, and revealed a systematic bias. The reasons behind this discrepancy are manifold, and include weathering of the outcrop sediments and a lesser degree of consolidation and associated stress states in outcrops. Also, measurements performed on outcrops that are sometimes several tens of kilometres away from the main study site may introduce further differences in K. Grain-size analyses showed that the sediments from the investigated outcrops and boreholes are similar but not necessarily exactly the same. Clay migration and bioturbation in the outcrop sediments probably contributed to the observed discrepancies, as did slight differences in palaeoenvironmental settings. The degree of (over)consolidation and stress states might also have an impact, but further research is needed to confirm or quantify this, as trends with the current depth of the sediments are hard to detect due to the alternation of different lithologies.
Based on all data, a linear scaling relationship was derived (r² = 0.7) that permits rescaling of outcrop K values to their subsurface equivalents. For most individual units, the differences between outcrop and subsurface sediments were similar (except for the extremes of the distributions). The sandy part of the Diest Fm., however, showed a considerably better fit between outcrop and aquifer than the other cases.
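A decile-based linear rescaling of this kind can be sketched as follows. This is a minimal illustration under stated assumptions: K is assumed to be log10-transformed, the function names are hypothetical, and the sample values are invented for the example. The coefficients it produces are unrelated to the relationship reported above.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x; returns slope, intercept, r2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot

def fit_decile_scaling(outcrop_log_k, borehole_log_k):
    """Regress the borehole deciles on the outcrop deciles (both as log10 K)."""
    x = statistics.quantiles(outcrop_log_k, n=10, method="inclusive")
    y = statistics.quantiles(borehole_log_k, n=10, method="inclusive")
    return fit_line(x, y)

def rescale(outcrop_log_k_value, slope, intercept):
    """Map one outcrop log10 K value to its estimated subsurface equivalent."""
    return intercept + slope * outcrop_log_k_value

# Hypothetical log10 K samples (K in m/s) for one lithostratigraphical unit.
outcrop = [-3.9, -4.0, -4.1, -4.2, -4.2, -4.3, -4.4, -4.5, -4.6, -4.8, -5.0]
borehole = [-4.3, -4.5, -4.6, -4.8, -4.9, -5.0, -5.2, -5.4, -5.7, -6.1, -6.6]
slope, intercept, r2 = fit_decile_scaling(outcrop, borehole)
print(f"slope {slope:.2f}, intercept {intercept:.2f}, r2 {r2:.2f}")
```

Because deciles summarize each distribution independently, the two data sets do not need to contain the same number of samples or be co-located, which suits a comparison between outcrops and boreholes at different sites.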
In a comparison with K values obtained through other means, outcrop-based equivalent K values were systematically higher than those from pump tests (especially for the clayey part of the Kasterlee Fm.), whose support volumes are considerably larger than the simulation domains considered in the outcrops. The mean values from borehole core samples were the smallest K values overall. Smaller compaction at shallow depth and long-term biophysical weathering processes presumably contributed to outcrop equivalent K values being larger than any other estimate of large-scale K available in this study.
In most cases the semivariograms for the outcrop and borehole data are compatible. Only for the sandy Kasterlee Fm. do the outcrop data clearly show higher variability than the borehole data. Spatial correlation (i.e. increasing semivariance with distance) is present in most cases, either in the outcrop or borehole data, or both. The clayey Diest Fm., however, shows a pure nugget effect for both data sets. For the Mol Fm. and the clayey Kasterlee Fm., both data sets complement each other, resulting in more robust semivariogram model fits. For the sandy Diest Fm., there seems to be a discrepancy in the range between both data sets.
Given the small number and limited size of the studied outcrops, transfer of information from outcrops to the corresponding aquifer sediments can be improved by expanding the number of outcrops for the same lithostratigraphical units. In addition, more complementary aquifer information could be collected to develop a depth dependency in aquifer K that incorporates the effects of compaction, which could then be used to rescale outcrop K values to sediment values at a given depth. Such information, together with geostatistical parameters, may be used as input or prior information for stochastic flow models.
In addition to the quantitative information tested in this paper, outcrops provide information about facies geometry, such as the alternating clay and sand layers within the clayey Kasterlee Fm. This information cannot be revealed easily using available in situ methods, and represents very important qualitative knowledge.

Fig. 1. Overview of the studied lithostratigraphical succession with formation thicknesses, typical glauconite content (weight percentage; % wt), and a typical grain-size profile. A picture of a borehole core from the clayey part of the Kasterlee formation is provided to illustrate its heterogeneity. For more information, see Beerten et al. (2010).

Fig. 3. Comparison between distributions (kernel density estimates of the probability density functions) for air-permeameter-based outcrop K and constant-head K measurements on undisturbed samples from cored boreholes, for (A) the Mol Fm., (B) the sandy Kasterlee Fm., (C) the clayey Kasterlee Fm., (D) the clayey Diest Fm. and (E) the sandy Diest Fm. Mean (µ) and standard deviation (σ) are given for both data sources.

Fig. 5. Scatter plot of log10-transformed hydraulic conductivity K versus porosity (borehole data set only) for the five lithostratigraphical units with corresponding linear model fits. Each data point represents the mean porosity and mean K of all measurements pertaining to one formation for one particular borehole.

Fig. 6. Comparison of geometric mean K values obtained from borehole core samples, pump tests, outcrop air-permeameter measurements and calculated equivalent values. The gray boxes represent the data limits, and the arrows indicate the contrasting effects of upscaling for the aquifer and aquitard units.

Fig. 7. Comparison of the vertical anisotropy factors derived from the geometric mean K values from Fig. 6.The pluses between round and square brackets represent respectively the parameter value obtained by Gedeon and Mallants (2012) using regional inverse modelling and the value representing a part of the aquitard in the original Dessel 2 pump test interpretation by Lebbe (2002).

Table 1. Overview of the different K and grain-size samples used in this paper.

Table 2. Spherical variogram model parameters for the vertical experimental variograms (range = correlation length). The outcrop data is taken from Rogiers et al. (2013a). The root mean squared error (RMSE) is provided as a measure of goodness of fit.
* Fixed during variogram model fit.