Large-scale 3-D modeling by integration of resistivity models and borehole data through inversion

. We present an automatic method for parameterization of a 3-D model of the subsurface, integrating litho-logical information from boreholes with resistivity models through an inverse optimization, with the objective of further detailing of geological models, or as direct input into ground-water models. The parameter of interest is the clay fraction, expressed as the relative length of clay units in a depth interval. The clay fraction is obtained from lithological logs and the clay fraction from the resistivity is obtained by establishing a simple petrophysical relationship, a translator function, between resistivity and the clay fraction. Through inversion we use the lithological data and the resistivity data to determine the optimum spatially distributed translator function. Applying the translator function we get a 3-D clay fraction model, which holds information from the resistivity data set and the borehole data set in one variable. Finally, we use k - means clustering to generate a 3-D model of the subsurface structures. We apply the procedure to the Norsminde survey in Denmark,


Introduction
In a large-scale geological and hydrogeological modeling context, borehole data seldom provide an adequate database due to low spatial density in relation to the complexity of the subsurface to be mapped.Conversely, dense areal coverage can be obtained from geophysical measurements, and airborne electromagnetic (EM) methods in particular are suitable for 3-D mapping, as they cover large areas in a short period of time.However, the geological and hydrogeological parameters are only mapped indirectly, and an interpretation of the airborne results is needed, often based on sitespecific relationships.Linking electrical resistivity to hydrological properties is thus an area of increased interest, as reviewed by Slater (2007).
Integrating geophysical models and borehole information has proved to be a powerful combination for 3-D geological mapping (Jørgensen et al., 2012;Sandersen et al., 2009), and several modeling approaches have been reported.One way of building 3-D models is through a knowledge-driven (cognitive), manual approach (Jørgensen et al., 2013a).This can be carried out by making layer-cake models composed of stacked layers or by making models composed of structured or unstructured 3-D meshes where each voxel is assigned a geological/hydrogeological property.The latter allows for a higher degree of model complexity to be incorporated (Turner, 2006;Jørgensen et al., 2013a).The cognitive approach enables various types of background knowledge such as the sedimentary processes, sequence stratigraphy, etc. to be utilized.However, the cognitive modeling approach is difficult to document and to reproduce due to its Published by Copernicus Publications on behalf of the European Geosciences Union.
subjective nature.Moreover, any cognitive approach will be quite time consuming, especially when incorporating large airborne electromagnetic (AEM) surveys, easily exceeding 100 000 resistivity models.
Geostatistical modeling approaches such as multiple-point geostatistical methods (Daly and Caers, 2010;Strebelle, 2002), transition probability indicator simulation (Carle and Fogg, 1996), or sequential indicator simulation (Deutsch and Journel, 1998) provide models with a higher degree of objectivity in shorter time compared to the cognitive, manual modeling approaches.An example of combining AEM and borehole information in a transition probability indicator simulation approach is given by He et al. (2014).Geostatistical modeling approaches based primarily on borehole data often face the problem that the data are too sparse to represent the lateral heterogeneity at the desired spatial scale.Including geophysical data enables a more accurate estimation of the geostatistical properties, especially laterally.This could be determination of the transition probabilities and the mean lengths of the different units.However, the geophysical data also open the question of to what degree the different data types should be honored in the model simulations and estimations.Combined use of geostatistical and cognitive approaches can be a suitable solution in some cases (Jørgensen et al., 2013b;Raiber et al., 2012;Stafleu et al., 2011).Direct integration of borehole information and geological knowledge as prior information into the inversion of the geophysical data is another technique for combining the two types of information and thereby creating better geophysical models and subsequently better geological and hydrological models (Høyer et al., 2014;Wisén et al., 2005).
Geological models are commonly used as the basis for hydrostratigraphical input to groundwater models.However, even though groundwater model predictions are sensitive to variations in the hydrostratigraphy, the groundwater model calibration is non-unique, and different hydrostratigraphic models may produce similar results (Seifert et al., 2012).
Sequential, joint, and coupled hydrogeophysical inversion techniques (Hinnell et al., 2010) have been used to inform groundwater models with both geophysical and traditional hydrogeological observations.Such techniques use petrophysical relationships to translate between geophysical and hydrogeological parameter spaces.For applications in groundwater modeling using electromagnetic data, see, for instance, Dam and Christensen (2003) and Herckenrath et al. (2013).Also, clustering analyses can be used to delineate subsurface hydrogeological properties.Fuzzy c-means clustering has been used to delineate geological features from measured EM34 signals with varying penetration depths (Triantafilis and Buchanan, 2009) and to delineate the porosity field from tomography-inverted radar attenuation and velocities and seismic velocities (Paasche et al., 2006).
We present an automatic procedure for parameterization of a 3-D model of the subsurface.The geological parameter we map is the clay fraction (CF).In this paper we refer to clay as material described as clay in a lithological well log regardless the type of clay: clay till, mica clay, Paleogene clay, etc.This term is robust in the sense that most geologists and drillers have a common conception on the description of clay and it can easily be derived from the lithological logs.The clay fraction is then the cumulated thickness of clay layers in a depth interval divided by the length of the depth interval.The CF procedure integrates lithological information from boreholes with resistivity information, typically from large-scale geophysical AEM surveys.We obtain the CF from the resistivity data by establishing a petrophysical relationship, a translator function, between resistivity and the CF.Through an inverse mathematical formulation we use the lithological borehole data to determine the optimum parameters of the translator function.Hence, the 3-D CF model holds information from the resistivity data set and the borehole data set in one variable.As a last step, we cluster our model space represented by the CF model and geophysical resistivity model using k-means clustering to form a structural 3-D cluster model, with the objective of further detailing for geological models, or as direct input into groundwater models.
Lithological interpretation of a resistivity model is not trivial since the resistivity of a geological media is controlled by porosity, pore water conductivity, degree of saturation, amount of clay minerals, etc. Different, primarily empirical, models try to explain the different phenomena, where Archie's law (Archie, 1942) is the most fundamental empirical model, taking the porosity, pore water conductivity, and the degree of saturation into account, but does not account for electrical conduction of currents taking place on the surface of the clay minerals.The Waxman and Smits model (Waxman and Smits, 1968) together with the dual-water model of Clavier et al. (1984) provides a fundamental basis for widely and repeatedly used empirical rules for shaly sands and material containing clay (e.g., Bussian, 1983;Sen, 1987;Revil and Glover, 1998).However, in a sedimentary depositional environment, it can be assumed in general that clay or clayrich sediments will exhibit lower resistivities than the nonclay sediments, silt, sand, gravel, and chalk.As such, discrimination between clay and non-clay sediments based on resistivity models is feasible and the CF value is a suitable parameter to work with in the integration of resistivity models and lithological logs.A 3-D CF model or clay/sand model will also contain key structural information for a groundwater model, since it delineates the impermeable clay units and the permeable sand/gravel units.
With the CF procedure we use a two-parameter resistivityto-CF translator function, which relies on the lithological logs providing the local information for the optimum resistivity-to-CF translation.Hence, we avoid describing the physical relationships underlying the resistivity images explicitly.
First, we give an overall introduction to the CF procedure, and then we move to a more detailed description of

Methodology
Conceptually, our approach sets up a function that best describes the petrophysical relationship between clay fraction and resistivity.Through inversion we determine the optimum parameters of this translator function by minimizing the difference between the clay fraction calculated from the resistivity models ( res ) and the observed clay fraction in the lithological well logs ( log ).
A key aspect in the CF procedure is that the translator function can change horizontally and vertically, adapting to the local conditions and borehole data.The calculation is carried out in a number of elevation intervals (calculation intervals) to cover an entire 3-D model space.Having obtained the optimum and spatially distributed translator function we can transform the resistivity models to form a 3-D clay fraction model, incorporating the key information from both the resistivity models and the lithological logs into one parameter.The CF procedure is a further development to three dimensions of the accumulated clay thickness procedure by Christiansen et al. (2014), which is formulated in 2-D.
The flowchart in Fig. 1 provides an overview of the CF procedure.The observed clay fraction ( log ) is calculated from the lithological logs (box 1) in the calculation intervals.The translator function (box 2) and the resistivity models (box 3) form the forward response, which produces a resistivity-based clay fraction (box 4) in the different calculation intervals.The parameters of the translator function are updated during the inversion to obtain the best consistency between res and log .The output is the optimum resistivityto-CF translator function (box 5), and when applying this to the resistivity models (the forward response of the final iteration), we obtain the optimum res and block kriging is used to generate a regular 3-D CF model (box 6).
The final step is a k-means clustering analysis (box 7).With the clustering we achieve a 3-D model of the subsurface delineating a predefined number of clusters that represent zones of similar physical properties, which can be used as input in, for example, a detailed geological model or as structural delineation for a groundwater model.
The subsequent paragraphs detail the description of the individual parts of the CF procedure.

Observed data -lithological logs and clay fraction
The common parameter derived from the lithological logs and resistivity data sets is the clay fraction (Fig. 1, boxes 1-4).The clay fraction of a given depth interval in a bore- hole (named log ) is calculated as the cumulative thickness of layers described as clay divided by the length of the interval.By using this definition of clay and clay fraction we can easily calculate log in depth intervals for any lithological well log, as the example in Fig. 2a shows.Having retrieved the log values, we then need to estimate their uncertainties since a variance estimate, σ 2 log , is needed in the evaluation of the misfit to res .
The drillings are conducted with a range of different methods.This has a large impact on the uncertainties of the lithological well log data.The drilling methods span from core drilling resulting in a very good base for the lithology classification to direct circulation drillings (cuttings are flushed to the surface between the drill rod and the formation) resulting in poorly determined layer boundaries and a very high risk of introducing contamination into the samples due to the travel time from the bottom to the surface.Other parameters affecting the uncertainty of the log are, to mention a few important ones, sample intervals and sample density, accuracy of the geographical positioning and elevation, and the credibility of the driller.

Forward data -the translator function
For calculating the clay fraction for a resistivity model, res , we use the translator function as shown in Fig. 2b values for a clay-sand interpretation of the resistivity models.Thin geological layers are often not directly visible in the resistivity models, whereas they will most often appear in carefully described boreholes.The length of the calculation intervals reflects the resolution capability of the geophysical method of choice, which means that in some cases the calculation intervals contain both sand and clay layers when imposed on the lithological logs.The translator function must therefore be able to translate resistivity values as partly clay and partly sand to obtain consistency with the lithological logs.This is possible with the translator function in Fig. 2b, where m low and m up represent the clay and sand cut-off values.Thus for resistivity values below m low the layer is entirely clay (weight ≈ 1) and for resistivity values above m up the layer is entirely sand or non-clay (weight ≈ 0).Many functions fulfilling the above criteria could have been chosen, but we use the one shown because it is differentiable throughout while being flat at both ends and fully described by just two parameters.The translator function (W (ρ)) is mathematically a scaled complementary error function, defined as where m low and m up are defined as the resistivity (ρ) at which the translator function, W (ρ), returns a weight of 0.975 and 0.025, respectively (the k value scales the erfc function accordingly).For a layered resistivity model the res value in one calculation interval is then calculated as res = where N is the number of resistivity layers in the calculation interval, W (ρ i ) is the clay weight for the resistivity in layer i, t i is the thickness of the resistivity layer, and t i is the length of the calculation interval.In other words, W weights the thickness of a resistivity layer, so for a resistivity below m low the layer thickness is counted as clay (W ≈ 1) while for a resistivity above m up the layer is counted as non-clay (W ≈ 0). Figure 2a shows how a single resistivity model is translated into res in numbers of calculation intervals.
The resistivity models are also associated with an uncertainty, and if the variance estimates of the resistivities and thicknesses for the geophysical models are available, we take these into account.The propagation of the uncertainty from the resistivity models to the res values is described in detail in Christiansen et al. (2014).
To allow for variation, both laterally and vertically, in the resistivity-tores translation, a regular 3-D grid is defined for the survey block (Fig. 3).Each grid node holds one set of m up and m low parameters.The vertical discretization follows the clay fraction calculation intervals, varying between 4 and 20 m increasing with depth.The horizontal discretization is typically 0.5-2 km and a 2-D bilinear horizontal interpolation of the m up and m low is applied to define the translator function uniquely at the positions of the resistivity models.
To migrate information of the translator function from regions with many boreholes to regions with few or no boreholes, horizontal and vertical smoothness constraints are applied between the translator functions at each node point as shown in Fig. 3. Choosing appropriate constraints is based on the balance between fitting the data while having a reasonable model.The balance is site and data specific, but would typically be based on visual evaluations comparing the results against key boreholes.The smoothness constraints furthermore act as regularization and stabilize the inversion scheme.
Finally, we need to estimate res values at the log positions (named * res ) for evaluation.We estimate the * res values by performing a point kriging interpolation of the res values and associated uncertainties within a search radius of typically 500 m.The experimental semi-variogram is calcu-lated from the res values for the given calculation interval and can normally be approximated well with an exponential function, which then enters the kriging interpolation.The code Gstat (Pebesma and Wesseling, 1998) is used for kriging, variogram calculation, and variogram fitting.Hence, for the output estimates of the * res , both the original variance of res and the variance on the kriging interpolation itself is included to provide total variance estimates of the * res values (σ 2 * res ), which are needed for a meaningful evaluation of the data misfit at the borehole positions.

Inversion -objective function and minimization
The inversion algorithm in its basic form consists of a nonlinear forward mapping of the model to the data space: where δ obs denotes the difference between the observed data ( log ) and the nonlinear mapping of the model to the data space ( res ).δm true represents the difference between the model parameters (m up , m low ) of the true, but unknown, translator function and an arbitrary reference model (the initial starting model for the first iteration, then at later iterations the model from the previous iteration).e log is the observational error, and G denotes the Jacobian matrix that contains the partial derivatives of the mapping.The general solution to the nonlinear inversion problem of Eq. ( 1) is described by Christiansen et al. (2014) and is based on Auken and Christiansen ( 2004) and Auken et al. (2005).
The objective function, Q, to be minimized includes a data term, R dat , and a regularization term from the horizontal and vertical constraints, R con .R dat is given as where N dat is the number of log values and σ 2 i is the combined variance of the ith log (σ 2 log ) and res (σ 2 * res ) given as The inversion is performed in logarithmic model space to prevent negative parameters, and R con is therefore defined as where e r is the regularizing constraint between the two constrained parameters m j and m k of the translator function and N con is the number constraint pairs.The e r values in Eq. ( 6) are stated as constraint factors, meaning that an e i factor of  1.2 corresponds approximately to a model change of ±20 %.
In total the objective function Q becomes Furthermore, is it possible to add prior information as a prior constraint on the parameters of the translator function, which just adds a third component to Q in Eq. ( 7) similar to R con in Eq. ( 6).
The minimization of the nonlinear problem is performed in a least-squares sense by using an iterative Gauss-Newton minimization scheme with a Marquardt modification.The full set of inversion equations and solutions are presented in Christiansen et al. (2014).

Cluster analysis
The delineation of the 3-D model is obtained through a k-means clustering analysis, which distinguishes groups of common properties within multivariate data.We have based the clustering analysis on the CF model and the resistivity model.Other data which are informative for structural delineation of geological or hydrological properties can also be included in the cluster analysis.For example, this could be geological a priori information or groundwater quality data.The resistivity model is part of the CF model, but is reused for the clustering analysis because the representation of lithology used in the CF model inversion has simplified the geological heterogeneity captured in the resistivity model.
K-means clustering is a hard clustering algorithm used to group multivariate data.A k-means cluster analysis is iterative optimization with the objective of minimizing a distance function between data points and a predefined number of clusters (Wu, 2012).We have used Euclidean length as a measure of distance.We use the k-means algorithm in MAT-LAB R2013a, which has implemented a two-phase search, batch and sequential, to minimize the risk of reaching a local minimum (Wu, 2012).K-means clustering can be performed on several variables, but for variables to impact the clustering equally, data must be standardized and uncorrelated.The CF model and resistivity model are by definition correlated.We use principal component analysis (PCA) to obtain uncorrelated variables.
PCA is a statistical analysis based on data variance formulated by Hotelling (1933).The aim of a PCA is to find linear combinations of original data while obtaining maximum variance of the linear combinations (Härdle and Simar, 2012).This results in an orthogonal transformation of the original multidimensional variables into a space where dimension one has largest variance, dimension two has second largest variance, etc.In this case the PCA is not used to reduce variable space but only to obtain an orthogonal representation of the original variable space to use in the clustering analysis.Principal components are orthogonal and thus uncorrelated, which makes the principal components useful in the subsequent clustering analysis.The PCA is scale sensitive and the original variables must therefore be standardized prior to the analysis.Because the principal components have no physical meaning, a weighting of the CF model and the resistivity model cannot be included in the k-means clustering.Instead the variables are weighed prior to the PCA.

Norsminde case
The Norsminde case model area is located in eastern Jutland, Denmark (Fig. 4), around the town of Odder (Fig. 5) and covers 156 km 2 , representing the Norsminde Fjord catchment.The catchment area has been mapped and studied intensely in the NiCA research project in connection with nitrate reduction in geologically heterogeneous catchments (Refsgaard et al., 2014).The modeling area has a high degree of geological complexity in the upper part of the section.The area is characterized by Paleogene and Neogene sediments covered by glacial Pleistocene deposits.The Paleogene is composed of fine-grained marl and clay, and the Neogene layers consist of marine Miocene clay interbedded with deltaic sand layers (Rasmussen et al., 2010).The Neogene is not present in the southern and eastern part of the area, where the glacial sediments therefore directly overlie the Paleogene clay.The Paleogene and Neogene layers in the region are frequently incised by Pleistocene buried tunnel valleys, and one of these is present in the southern part, where it crosses the model area to great depths with an overall E-W orientation (Jørgensen and Sandersen, 2006).The Pleistocene deposits generally appear very heterogeneous, and according to boreholes they are composed of glacial meltwater sediments and till.

Borehole data
In Denmark, the borehole data are stored in the national database Jupiter (Møller et al., 2009), dating back to 1926, which is an archive for all data and information obtained by drilling.Today, the Jupiter database holds information about more than 240 000 boreholes.All borehole layers in the database are assigned a lithology code, which makes it easy to extract the different types of clay layers for the calculation of the log values in the different calculation intervals.
For the model area, approximately 700 boreholes are stored in the database.Based on borehole meta-data found in the database, we use an automatic quality-rating system, where each borehole is rated from 1 to 4 (He et al., 2014).The ratings are used to assign different uncertainty (weights) to the lithological logs/the log values in the CF-procedure.
The meta-data used for the quality-rating are as follows: -drill method: auger, direct circulation, air-lift drilling, etc.; -vertical sample density; -accuracy of the geographical position: GPS or manual map location; -accuracy of the elevation: differential GPS or other; -drilling purpose: scientific, water abstraction, geophysical shot holes, etc.; -credibility of drilling contractor.
The boreholes are assigned points in the different categories and finally grouped into four quality groups according to their total score.Boreholes in the lowest quality group (4) are primarily boreholes with low sample frequencies (less than one sample per 10 m), low accuracy in geographical position, and/or drilled as geophysical shot holes for seismic exploration.
The locations, quality ratings, and drill depths of the boreholes are shown in Fig. 5b.The drill depths and quality ratings are summarized in Fig. 6.As the top bar in Fig. 6 shows, 4 % of the boreholes are categorized as quality 1, 46 % as quality 2, 32 % as quality 3, and 18 % as quality 4. The uncertainties of the log values for the quality groups 1-4 are based on a subjective evaluation and are defined as 10, 20, 30, and 50 %, respectively.The number of boreholes drastically decreases with depth as shown in Fig. 6.Thus, while about 100 boreholes are present at a depth of 60 m, only 25 boreholes reach a depth greater than 90 m.

EM data
The major part of the model area is covered by SkyTEM data, and adjoining ground-based transient electromagnetic (TEM) soundings are included in the resistivity data set (Fig. 5a).
The SkyTEM data were collected with the newly developed SkyTEM 101 system (Schamper et al., 2014b).The SkyTEM 101 system has the ability to measure very early times, which improves the resolution of the near-surface geological layers when careful system calibration and advanced  processing and inversion methodologies are applied (Schamper et al., 2014a).The recorded times span the interval from ∼ 3 µs to 1-2 ms after end of the turn-off ramp, which gives a depth of investigation (DOI) (Christiansen and Auken, 2012) of approximately 100 m for an average ground resistivity of 50 m.The SkyTEM survey was performed with a dense line spacing of 50 m for the western part and 100 m line spacing for eastern part (Fig. 5a).Additional cross lines were made in a smaller area resulting in a total of 2000 line km for the complete survey.The sounding spacing along the lines is approximately 15 m, resulting in a total of 106 770 1-D resistivity models.The inversion was carried out in a spatially constrained inversion setup (Viezzoli et al., 2008) with a smooth 1-D model formulation (29 layers, with fixed layer boundaries), using the AarhusInv inversion code (Auken et al., 2014) and the Aarhus Workbench software package (Auken et al., 2009).The resistivity models have been terminated individually at their estimated DOI, calculated as described by Christiansen and Auken (2012).
The ground-based TEM soundings originate from mapping campaigns in the mid-1990s.The TEM soundings were all acquired with the Geonics TEM47/PROTEM system (Geonics Limited, 2012) in a central loop configuration with a 40 m by 40 m transmitter loop.One-dimensional layered resistivity models with three to five layers were used in the interpretation of the TEM sounding data.

Model setup
The 3-D translator function grid has a horizontal discretization of 1 km, with 16 nodes in the x direction and 18 nodes in the y direction.Vertically the model spans from 100 m above sea level (a.s.l.) (highest surface elevation) to 120 m below sea level (b.s.l.).The vertical discretization is 4 m for layers a.s.l. and 8 m for layers b.s.l., which results in 40 calculation intervals.Hence, in total, the model grid holds 16 × 18 × 40 = 11 520 translator functions each holding two parameters.Translator functions in the 3-D grid situated above terrain, below DOI of the resistivity models, and outside geophysical coverage does not contribute at all, and are only included to make the translator function grid regular for easier computation/bookkeeping.The effective number of translator functions is therefore close to 5200.
The regularization constraints between neighboring translator functions nodes are set relatively loose to promote a predominantly data driven inversion problem.In this case we use horizontal constraint factors of 2 and vertical constraint factors of 3.This roughly allows the two parameters of the translator function to vary by a factor of 2 (horizontal) and a factor of 3 (vertical) relative to adjacent translator function parameters.The resulting variations in the translator model grid are a trade-off between data, data uncertainties, and the constraints (Eq.7).A spatially uniform initial translator function was used with m low = 35 m and m up = 55 m.
To create the final regular 3-D CF model the res values from the geophysical models, the log values from the boreholes, and associated variances are used in a 2-D kriging interpolation for each calculation interval.The 2-D grids are then stacked to form the 3-D CF model.The log values are primarily used to close gaps in the resistivity data set where boreholes are present, as seen for the large central hole in the resistivity survey (Fig. 8b), which is partly closed in the CF model domain (Fig. 8d) by borehole information.In order to match the computational grid setup of a subsequent groundwater model, a horizontal discretization of 100 m is used for the 3-D CF model grid.In this case the dense EM airborne survey data could actually support a finer horizontal discretization (25-50 m) in the CF model.
The k-means clustering is performed on two variables, the CF model and resistivity model, in a 3-D grid with regular horizontal discretization of 100 m and vertical discretization of 4 m between 96 and 0 m a.s.l. and 8 m between 0 and 120 m b.s.l.CF model values range between 0 and 1 and have therefore not been standardized.The resistivity values have been log-transformed and standardized by first subtracting the mean and then dividing by four times the standard deviation.The standardization of the resistivity was performed in this way to balance the weight between the two variables in the clustering.A five-cluster delineation is presented for the Norsminde case in the Results section.

Results
CF modeling results from the Norsminde area are presented in cross sections in Fig. 7 and as horizontal slices in Fig. 8.The total misfit of Eq. ( 7) is 0.37, but, probably more interesting, the isolated data fit (Eq. 1) is 1.26, meaning that we fit the data almost to the level of the assigned noise.parameters in section view.The vertical variation in the translator is pronounced in the resistivity transition zones, because sharp layer boundaries have a smoother representation in the resistivity domain.
For the deeper part of the model (deeper than 10 m b.s.l.), the translator functions vary less.This corresponds well to the general geological setting of the area with relatively homogenous clay sequences in the deeper part, but it is also a result of very limited borehole information for the deeper model parts.The general geological setting of the area is also clearly reflected in the translator function in the horizontal slices in Fig. 8a and b.The eastern part of the area with lowest m low values (dark blue in Fig. 8a) and lowest m up values (light blue/green in Fig. 8b) corresponds to the area where the highly conductive Paleogene clays are present.In the western part of the area, the cross section intersects the glacial complex, where the clays are mostly tills, and higher m low and m up values are needed to get the optimum translation.
The resistivity cross section in Fig. 7c and the slice section in Fig. 8c reveal a detailed picture of the effect of the geological structures seen in the resistivity data.Generally, a good correlation with the boreholes is observed.Translating the resistivities, we obtain the CF model presented in Figs.7d  and 8d.The majority of the voxels in the CF model have values close to 0 or 1.This is expected since the lithological logs are described as binary clay/non-clay, and log values not equal to 0 or 1 can only occur if both clay and non-clay lithologies are present in the same calculation interval in a particular borehole.
From evaluation of the result in Figs.7d and 8d, it is obvious that the very resistive zones are translated into a CF value close to 0 and the very conductive zones are translated into CF value close to 1. Focusing on the intermediate resistivities (20-60 m) it is clear that the translation of resistivity to CF is not one to one.For example, the buried valley structure (profile coordinate 6500-8500 m, Fig. 7d they are clearly separated in the CF model space and span the entire interval (Fig. 10b).The opposite is observed for clusters 3, 4, and 5, which are clearly separated in the resistivity space (Fig. 10a), but strongly overlap in the CF model space (Fig. 10b).The CF model does not differentiate between clay types, in contrast to the EM resistivity data, which have a good resolution in the low-resistivity range and are therefore able, to some degree, to distinguish between clay types.This results in the two-part clustering of the low resistivity (> 20 m) values as seen in Fig. 10a.

Translator function, grid, and discretization
The spatially varying resistivity-to-CF translator function is the key to achieving consistency between the borehole information and the resistivity models, and the spatial variations of the translator model account for, at least, two main phenomena: (1) changes in the resistivity-lithology petrophysical relationship, and (2) the resolution capability in the geophysical results.
The first issue includes spatial changes in the pore water resistivity, the degree of water saturation, and/or contents of clay minerals for the sediments described lithologically as clay.The spatial variation in the pore water resistivity on this modeling scale is probably relatively smooth and small and will therefore only have a minor impact on the resistivity-tolithology/CF translation.Even in the case with larger fluctuations in the pore water resistivity (e.g., presence of saline pore water), the translator function will automatically adapt to this as long as we have borehole information available that resembles the changes and the basic assumption that the clayrich formations are more conductive than coarse-grained sediments is fulfilled.
In the Norsminde area used in the case history, the groundwater table is generally located a few meters below the surface and the groundwater is fresh.This means that the neither pore water resistivity nor the water saturation plays a major role in the resistivity-clay-fraction relationship and thus the translator function.However, in the case with a thicker unsaturated zone like for the pore water resistivity, the translator function will automatically adapt to this situation as long as borehole information is available.
The varying content of clay minerals in the lithologies described as clay will effect the translator model.The correlation between the clay mineral content and resistivity is quite strong and could be the key parameter instead of the simple clay fraction of this procedure, but it would require clay mineral content values available in boreholes on a large modeling scale, which is why we disregard this approach and use the intentionally simple definition of clay and clay fraction.The second issue concerns the resolution of the true formation resistivity in the resistivity models.Lithological logs contain point information with a good and uniform vertical resolution.In contrast, AEM data provide a good spatial coverage, but the vertical resolution is relatively poor and decreases with depth.Detailed geological layer sequences might only be represented by an average conductivity or only have a weak signature in the resistivity models.By allowing spatial variation in the translation we can, to some degree, resolve weak layer indications in the resistivity models by utilizing the vertically detailed structural information from the lithological logs via the translator function.
The resolution in the final CF model is strongly correlated to the resolution in the resistivity model, since the resistivity data set contributes the majority of the information.In general, EM methods are sensitive to absolute changes in the electric conductivity, which makes the resolution in the lowresistivity end superior to the resolution of high-resistivity contrasts.The diffusive behavior of EM methods results in a decreasing horizontal and vertical resolution capability with depth, and the vertical resolution capability furthermore strongly depends on the layer sequence.A sequence of thin lithological layering may therefore be represented as a single resistivity layer with an average conductivity, which is obviously challenging for the geological interpretation.The horizontal resolution strongly depends on the sample/line density of the geophysical measurements, but the footprint of a single measurement sets the lower limit for the horizontal resolution.The Norsminde airborne SkyTEM survey is conducted with a very dense line spacing, giving a very high  The horizontal sampling of the translator function should in principle be able to reproduce the true (but unknown) variations in the resistivity-to-CF translation.However, it is primarily the borehole density and secondarily the complexity of the petrophysical relationship between clay and resistivity that dictate the needed horizontal sampling of the translator function.In our experience, a horizontal discretization of the translator function grid of 1-2 km (linearly interpolated between nodes) is sufficient to obtain an acceptable consistency between the lithological logs and the translated resistivities.For the deeper part of the model domain where the borehole information is sparse, a coarser translator function grid would be sufficient.
Starting model values for the translator function in the inversion scheme become important in areas with very low borehole density, primarily the deeper part of the model domain.The starting model values are selected based on experience and by visual comparison of the resistivity models with key lithological logs.The horizontal and vertical constraints migrate information from regions with many boreholes to regions with few or no boreholes.As in most inversion tasks, a few initial inversions are performed to fine-tune and evaluate the effect of different starting models and constraint setups.
The CF procedure supports both uncertainty estimates on the input data, on the output translator functions, and on the final CF model.Generally, the uncertainties in the CF model are closely related to the borehole density and quality, as well as resolution and density of the resistivity models.The calculation and estimation of input and output uncertainties is described in detail in Christiansen et al. (2014).

Clustering and validation
For the clustered 3-D model, each cluster represents some unit with fairly uniform characteristics.It could be hydrostratigraphic units where the hydraulic conductivity of the cluster units is determined through a subsequent groundwater model calibration, typically constrained by hydrological head and discharge data.Groundwater model calibration of the Norsminde 3-D cluster model has been performed with a preliminary positive outcome, but more experiments are needed before drawing final conclusions.In this process one needs to evaluate the cluster validity, i.e., how many clusters the data can support.Cluster validity can be assessed with various statistical measures (e.g., Halkidi et al., 2002).The number of clusters resulting in the best hydrological performance might also be used as a measure of cluster validity.The validity of the clusters and the resulting groundwater model is still to be explored in more detail.

Conclusions
We have presented a procedure to produce 3-D clay fraction (CF) models, integrating the key sources of information in a well-documented and objective way.
The CF procedure combines lithological borehole information with geophysical resistivity models in producing large-scale 3-D clay fraction models.The integration of the lithological borehole data and the resistivity models is accomplished through inversion, where the optimum resistivity-to-CF function minimizes the difference between the observed clay fraction from boreholes and the clay fraction found through the geophysical resistivity models.The CF procedure allows for horizontal and lateral variation in the resistivity-to-CF translation with smoothness constraints as regularization.The spatially varying translator function is the key to achieving consistency between the borehole Hydrol.Earth Syst.Sci., 18, 4349-4362, 2014 www.hydrol-earth-syst-sci.net/18/4349/2014/ information and the resistivity models.The CF procedure furthermore handles uncertainties in both input and output data.
The CF procedure was applied to a 156 km 2 survey with more than 700 boreholes and 100 000 resistivity models from an airborne survey.The output was a detailed 3-D clay fraction model combining resistivity models and lithological borehole information into one parameter.
Finally a cluster analysis was applied to achieve a predefined number of geological/hydrostratigraphic clusters in the 3-D model and enabled us to integrate various sources of information, both geological and geophysical.The final five-cluster model differentiates between clay materials and different high-resistivity materials from information held in resistivity model and borehole observations, respectively.
With the CF procedure and clustering we aim to build 3-D models suitable as structural input for groundwater models.Each cluster will then represent a hydrostratigraphic unit and the hydraulic conductivity of the units will be determined through the groundwater model calibration constrained by hydrological head and discharge data.
The 3-D clay fraction model can also be seen as a binomial geological sand-clay model by interpreting the high and low CF values as clay and sand, respectively, as the color scale for the CF model example in Figs.7 and 8 indicates.Integration and further development of the CF model into more-complex geological models have been carried out with success (Jørgensen et al., 2013b).

Figure 1 .
Figure 1.Conceptual flowchart for the CF procedure and clustering.
Figure 2. (a) Example of how a lithological log translates into a log and how a resistivity model translates into res , for a number of calculation intervals.The resistivity values and the resulting clay fraction values are stated on the bars, but are also indicated by color according to the color scales of Fig. 7. (b) The translator function returns a weight, W , between 0 and 1 for a given resistivity value.The translator function is defined by the two parameters m low and m up .In this example the m low and m up parameters are 40 and 70 m, respectively.

Figure 3 .
Figure 3.The translator function and 3-D translator function grid.Each node in the 3-D translator function grid holds a set of m up and m low .The m up and m low parameters are constrained to all neighboring parameters as indicated with the black arrows from the black center node.

Figure 4 .
Figure 4.The black square marks the Norsminde survey area.
Figure 5. (a) Resistivity model positions for the SkyTEM survey and the ground-based TEM soundings.(b) Borehole locations, quality (shape), and drill depth (color).Quality 1 corresponds to the highest quality and 4 to the lowest quality.The red dashed line outlines the catchment area (156 km 2 ).

Figure 6 .
Figure 6.Number of boreholes vs. drill depth for the Norsminde survey area.The bars show how many boreholes reach a certain depth.The value to the right of the bars specifies the number of boreholes per km 2 at the different depths.The color coding of the bars marks the borehole quality grouping.

Figure 7 .
Figure 7. Northwest-southeast cross sections (vertical exaggeration x6).Location and orientation of the cross sections are marked in Fig. 8. (a) The m low parameters of the translator function.(b) The m up parameters of the translator function.(c) The resistivity section with boreholes within 200 m of the profile superimposed.Black and yellow vertical bars show the position of boreholes: black blocks mark the clay layers, and yellow blocks mark sand and gravel layers.(d) Clay fraction section and boreholes (same boreholes as plotted in the resistivity section).

Figure 8 .
Figure 8. Horizontal slices at 2 m b.s.l.cropped to the catchment area (dashed line).(a) The m low parameters of the translator function superimposed on the 1 km translator function grid (black dots).(b) The m up parameters of the translator function superimposed on the 1 km translator function grid (black dots).(c) Resistivity slice (interpolated).Note that no EM data are available around the town of Odder (see Fig. 5a), resulting in a "hole" in the resistivity map.(d) Resulting CF model.The hole in the resistivity map is partly closed here because CF values from boreholes are available in this area.

Figure 9 .
Figure 9. Horizontal slices in four depths of the 3-D cluster model.

Figure 10 .
Figure 10.Cluster statistics.The histograms show which data from the original variables make up the five clusters.(a) The distribution of the resistivity data in the five clusters.(b) The distribution of the CF data in the five clusters.

Hydrol. Earth Syst. Sci., 18, 4349-4362, 2014 www.hydrol-earth-syst-sci.net/18/4349/2014/ the
different parts: observed data and uncertainty, forward modeling, inversion and minimization, and clustering.Last we demonstrate the method in a field example with resistivity data from an airborne SkyTEM survey combined with quality-rated borehole information.