Enhanced Markov-type Categorical Prediction with geophysical soft constraints for hydrostratigraphic modeling

Guo, Liming; Hermans, Thomas; Benoit, Nicolas; Dudal, David; Van De Vijver, Ellen; Madsen, Rasmus; Nørgaard, Jesper; Deleersnyder, Wouter

doi:10.5194/hess-30-1421-2026

Articles | Volume 30, issue 5

https://doi.org/10.5194/hess-30-1421-2026

Research article

17 Mar 2026

Abstract

Accurately characterizing hydrostratigraphic structures is essential for reliable groundwater flow and transport modeling. Due to limited borehole coverage and geological complexity, uncertainty analysis plays a vital role in supporting robust hydrogeological modeling. Traditional geostatistical approaches, such as Multiple-Point Statistics (MPS), offer flexibility in reproducing complex geological patterns and uncertainties, but they are computationally demanding, may struggle to maintain stratigraphic consistency, and can be difficult to apply in practice. Alternatively, the Markov-type Categorical Prediction (MCP) framework is computationally more efficient and enforces stratigraphic ordering. However, its effectiveness is challenged in areas with sparse borehole data. To address this limitation, this study presents an enhanced MCP approach that incorporates airborne electromagnetic (AEM) geophysical data as soft probabilistic constraints on lithology occurrence. A tunable parameter controls the relative contribution of geophysical and geological information, allowing for flexible data integration within the simulation process. The approach is tested on both synthetic and real-world cases. Synthetic experiments covering different scenarios demonstrate that incorporating geophysical constraints improves lithological prediction accuracy, particularly when combined with borehole data. In the field application from Egebjerg, Denmark, we demonstrate how a statistical relationship between lithology and resistivity can be derived by integrating SkyTEM data with borehole lithological logs and depth information. That relationship is then combined with conditional probabilities from training images extracted from a 3D interpreted model, using the MCP framework. The results show that the integrated approach enhances the generation of complex geological features, such as buried valleys, especially in areas with limited direct observations.
By embedding geophysical information into the MCP framework, the method combines the spatial consistency and stratigraphic ordering of MCP with the extensive coverage and subsurface sensitivity of geophysical data. This integration overcomes a key limitation of MCP and enables more reliable simulations in regions where direct subsurface observations are limited, providing a practical and adaptable tool for improving geological modeling in groundwater studies.

Received: 02 Jul 2025 – Discussion started: 06 Aug 2025 – Revised: 26 Jan 2026 – Accepted: 24 Feb 2026 – Published: 17 Mar 2026

1 Introduction

In hydrogeological modeling, uncertainty analysis is essential to account for the incomplete knowledge of subsurface conditions and to support robust decision-making in water resource management, contamination risk assessment, and infrastructure planning (e.g. Feyen and Caers, 2006; Hermans et al., 2023). Subsurface environments are inherently heterogeneous and data available for modeling, such as borehole logs, are typically sparse and unevenly distributed in space. Without explicitly quantifying uncertainty, predictions based on deterministic models risk misrepresenting hydrogeological processes and may lead to ineffective or even damaging management strategies (Koch et al., 2014; Vilhelmsen et al., 2019). This is particularly true when deterministic models are used as the basis for groundwater flow and solute transport simulations, where small inaccuracies in hydrostratigraphic structure can lead to significant deviations in predicted flow paths, travel times, and contaminant breakthrough curves (Dai et al., 2017; Zuo et al., 2023; Song et al., 2019). By incorporating uncertainty into the simulation of geological heterogeneity, geostatistical approaches provide not only plausible geological scenarios but also essential input for ensemble-based hydrogeological forecasting, a probabilistic approach that relies on multiple realizations to assess model uncertainty (e.g. Enemark et al., 2024; Hermans et al., 2015).

In constructing geostatistical models, borehole data are the most commonly used conditioning data because they provide direct observations of subsurface lithology (Boyd et al., 2019; Madsen et al., 2021). However, the cost, time, and logistical constraints of drilling often result in insufficient borehole coverage, especially at greater depths or in poorly accessible regions (Bongajum et al., 2013; Madsen et al., 2022). In contrast, geophysical exploration methods, such as electrical resistivity tomography (ERT) or airborne transient electromagnetic (AEM) surveys, offer a cost-effective way to obtain relatively high-density spatial coverage (e.g. Maurya et al., 2023; Prikhodko et al., 2024). Although geophysical data do not directly measure lithology, they provide property contrasts (e.g., in resistivity) that, after inversion and interpretation, can be statistically linked to hydrofacies distributions (Michel et al., 2020; Pirot et al., 2017).

Within geostatistical modeling, constructing a reliable hydrostratigraphic model typically requires integrating multiple sources of information, including hard data (e.g. borehole observations), probabilistic constraints (e.g. resistivity models derived from inverted airborne geophysical measurements), and prior geological knowledge about spatial variability and continuity, typically expressed in the form of a covariance model or training image (TI) (He et al., 2014; Høyer et al., 2017; Barfod et al., 2018). Each of these sources contributes complementary information: boreholes offer detailed but localized information, geophysical data provide indirect insights into large-scale hydrostratigraphic architecture, and TIs capture prior geological knowledge and spatial continuity. Combining them leads to more robust and geologically realistic models (Barfod et al., 2018; Jha et al., 2014). However, this integration is not straightforward, as it requires a quantitative understanding of the relationships between different data types. Without careful treatment, combining data of varying resolution, quality, and interpretability may lead to overinterpretation or underutilization of valuable information (Levy et al., 2024).

Multiple-Point Statistics (MPS) has become a widely used geostatistical method in hydrostratigraphic modeling. MPS uses Training Images (TIs) to quantify the spatial variability and reproduce complex geological patterns that cannot be captured by traditional two-point geostatistics (Guardiano and Srivastava, 1993; Mariethoz and Caers, 2014). In recent years, several studies have explored introducing probabilistic spatial constraints, including geophysical models, into MPS frameworks. For example, Barfod et al. (2018) demonstrated that conditioning MPS simulations with airborne electromagnetic data significantly improved the delineation of buried valleys in Denmark. Madsen et al. (2021) proposed treating uncertain geological interpretations as probabilistic constraints, comparing MPS and Gaussian simulation methods, and showed that MPS produced more geologically plausible and connected realizations. Hermans et al. (2015) developed a full MPS-based inversion framework that used ERT data both to falsify prior geological scenarios and to locally constrain groundwater simulations, showing the strength of MPS in quantifying uncertainty and integrating multiple data types. Lochbühler et al. (2014) demonstrated that tomographic images can be used to condition multiple-point statistics facies simulations, thereby improving the structural consistency of simulated geological models. Despite these advances, several challenges remain. MPS methods are computationally expensive and depend strongly on TIs that are geologically realistic and representative of the entire site (He et al., 2014; Levy et al., 2024). A TI of limited extent can lead to biased simulations that misrepresent key structural features.
Most existing MPS approaches, such as Direct Sampling (Meerschman et al., 2013), still treat soft or geophysical data in a heuristic manner, through secondary TIs rather than probabilistic conditioning (e.g., Lochbühler et al., 2014), so they cannot explicitly address non-stationarity within a probabilistic framework. Many existing implementations of geostatistical simulation, such as MPS-based methods, cannot enforce explicit geological transition rules, for instance those arising from successive sedimentary deposition processes, which means that they may fail to represent stratigraphic relationships accurately, particularly those relevant to spatial shifts in facies proportions or variations in layer thickness. Both limitations are critical when simulating subsurface structures for groundwater flow and transport modeling (Cordua et al., 2016; Kim et al., 2017).

A recently applied geostatistical approach by Benoit et al. (2018), known as Markov-type Categorical Prediction (MCP), provides an alternative framework to traditional MPS for simulating categorical geological units. MCP uses bivariate transition probabilities derived from a TI. One of the key advantages of MCP is that it reduces the dependence on high-quality or highly repetitive TIs, which can be a limiting factor in some MPS implementations (Allard et al., 2011). When key features in the TI are sparse, irregular, or unique, MPS may struggle to reproduce them consistently, potentially leading to artificial discontinuities or oversimplified realizations (Barfod et al., 2018). By contrast, MCP operates on a different principle. Rather than trying to reproduce entire patterns from the TI, MCP uses pairwise transition probabilities between units to capture the likelihood of one unit being adjacent to another (Benoit et al., 2018). This approach allows MCP to extract essential geological information in a non-stationary fashion without needing a complete TI. Furthermore, MCP remains computationally efficient, even when simulating models with a large number of lithological categories because it avoids high-order pattern scanning or search-tree construction. One of MCP's strengths is its ability to strictly respect geological rules when certain transitions between units are geologically impossible. For example, if a specific lithological unit is never observed directly above another in the training data, MCP ensures that this configuration will not appear in the simulated model based on zero bivariate probability of these two units (Benoit et al., 2018). Yet, previous applications of the MCP framework have relied almost exclusively on hard conditioning data, such as borehole lithology. In settings where such data are sparse, the method often defaults to random simulation, which can result in geologically unrealistic outputs (Benoit et al., 2018). 
Nevertheless, MCP offers greater transparency and flexibility in conditioning, making it well suited for the integration of soft information derived from geophysical inversion models, as previously demonstrated on 2D synthetic data (Guo et al., 2024). The present work extends the MCP framework by incorporating geophysical soft constraints into the simulation process, weighted through the principle of permanence of ratios (Isunza Manrique et al., 2023). Applied to a real-world 3D geological setting, this integration enables a comprehensive quantitative evaluation of uncertainty reduction and improved geological realism, particularly in areas that are poorly constrained by hard data.

This paper is organized as follows. Section 2 introduces the principles of the Markov-type Categorical Prediction (MCP) framework and outlines its extension for integrating geophysical soft constraints. Section 3 presents the results in two parts. In Sect. 3.1, a synthetic test case is developed, where a true lithological model with multiple layers is defined and conductivity values are assigned using a Gaussian random field. Airborne electromagnetic data are simulated via 1D forward modeling and subsequently inverted using a 1D inversion scheme. The resulting inverted conductivity model is then statistically linked to the true lithology to derive a stochastic relationship, which is incorporated into the MCP framework as an additional constraint. Section 3.2 applies the proposed approach to a real-case study in Egebjerg, Denmark. Bivariate transition probabilities are computed from multiple 2D transects extracted from a 3D interpreted geological model and merged using the Extended Logistic Opinion Pool (ELOP) method to ensure directional consistency. The resulting representative transition probabilities serve as priors for MCP simulations along all transects, enabling the construction of a coherent 3D lithological model. Two representative transects, located in areas with shallow boreholes, are selected for detailed analysis to evaluate the added value of geophysical constraints. Borehole lithology, borehole depth information, and inverted resistivity from AEM data are statistically linked, and this relationship is combined with MCP probabilities through a tunable integration parameter. Finally, constrained MCP simulations are performed to generate multiple lithological realizations, allowing uncertainty quantification and assessment of geophysical constraints. Section 4 discusses the key methodological implications and insights derived from these extended MCP framework applications.

2 Methodology

2.1 Markov-Type Categorical Prediction

Markov-Type Categorical Prediction (MCP) is a probabilistic approach to generate categorical geological models that effectively maintains spatial relationships in a computationally efficient fashion. Within the MCP framework, it is assumed that the influence of neighboring categories on the target location can be captured through individual transition probabilities from the target to each neighbor, while the neighbors are assumed not to interact with one another. This conditional independence assumption simplifies computations by considering that the categorical states of neighboring points are independent once the category at the target location is known. Although a simplification, it has been shown to produce accurate and unbiased predictions in some geostatistical applications (Benoit et al., 2018).

Given a set of known categories $i_{1}, i_{2}, \dots, i_{n}$ at neighborhood locations $x_{1}, x_{2}, \dots, x_{n}$, the conditional probability of category i₀ at an unsampled location x₀ is calculated as (Allard et al., 2011):

\begin{matrix} (1) & P_{i_{0} | i_{1}, \dots, i_{n}}^{MCP} = \frac{p_{i_{0}} \prod_{k = 1}^{n} p_{i_{k} | i_{0}}}{\sum_{i_{0} = 1}^{I} p_{i_{0}} \prod_{k = 1}^{n} p_{i_{k} | i_{0}}} = \frac{p_{i_{0}}^{1 - n} \prod_{k = 1}^{n} p_{i_{k}, i_{0}}}{\sum_{i_{0} = 1}^{I} p_{i_{0}}^{1 - n} \prod_{k = 1}^{n} p_{i_{k}, i_{0}}}, \end{matrix}

where i₀ is the category being predicted, i_k are the observed categories in the neighborhood of x₀, and I is the total number of possible categories (e.g., lithological units). The second equality follows from $p_{i_{k} | i_{0}} = p_{i_{k}, i_{0}} / p_{i_{0}}$. The bivariate probabilities $p_{i_{k}, i_{0}}$ quantify the likelihood of co-existence of category i_k at location x_k and category i₀ at location x₀. The marginal probability $p_{i_{0}}$ represents the prior likelihood of category i₀; in this study, it is defined as the mean proportion of each lithology observed in the training image.
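As an illustration of Eq. (1), the following minimal Python sketch evaluates the MCP conditional probability for all candidate categories at one node. The function name `mcp_conditional`, the array layout of the bivariate tables, and the fallback to the marginal proportions when no category is compatible with all neighbors are assumptions made for this example; they are not taken from the implementation used in the study.

```python
import numpy as np

def mcp_conditional(marginal, joint_tables, neighbor_cats):
    """Evaluate Eq. (1) for every candidate category i0 at the target node.

    marginal      : shape (I,), marginal proportions p_{i0}
    joint_tables  : n arrays of shape (I, I); joint_tables[k][ik, i0] is the
                    bivariate probability for the lag between x_k and x_0
    neighbor_cats : n observed category indices i_k
    """
    marginal = np.asarray(marginal, dtype=float)
    n = len(neighbor_cats)
    if n == 0:
        return marginal.copy()  # no neighbors: marginal proportions only

    # p_{i0}^(1-n); categories absent from the TI keep zero weight.
    weights = np.zeros_like(marginal)
    present = marginal > 0
    weights[present] = marginal[present] ** (1 - n)

    # Multiply by one bivariate term p_{ik,i0} per neighbor.
    for table, ik in zip(joint_tables, neighbor_cats):
        weights = weights * np.asarray(table, dtype=float)[ik, :]

    total = weights.sum()
    if total == 0:
        # Assumed fallback (not from the paper): no category is compatible
        # with all neighbors, so return the marginal proportions.
        return marginal.copy()
    return weights / total


# Two categories, two neighbors that both belong to category 0.
marginal = np.array([0.5, 0.5])
table = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
probs = mcp_conditional(marginal, [table, table], [0, 0])

# A zero bivariate probability excludes the corresponding category.
forbidding = np.array([[0.5, 0.0],
                       [0.0, 0.5]])
forced = mcp_conditional(marginal, [forbidding], [0])
```

In the second call, the zero entry in `forbidding` removes category 1 from the candidates, which is the behavior that the text later refers to as zero-forcing.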

A key step in MCP implementation involves defining the spatial range of the search neighborhood (x_n) around each estimation point. This range is a user-defined parameter that can be tuned to balance spatial continuity and computational cost. Following Benoit et al. (2018), we adopted an octant-based search strategy with at least one neighbor per octant, resulting in up to eight conditioning data per location.

The MCP approach requires a representative TI, from which the bivariate probabilities $p_{i_{k}, i_{0}}$ are derived and used in Eq. (1). These probabilities describe the likelihood of observing a pair of categories (i,j) separated by a specific lag h, and they are computed directly from the TI. To compute the bivariate probability between category i at location x and category j at location x+h, we use the cross-indicator covariance:

\begin{matrix} (2) & P (I (x), J (x + h)) = E [I (x) J (x + h)], \end{matrix}

where I(x) and J(x+h) are binary indicator functions defined as I(x)=1 if the category at location x is i, and 0 otherwise. Similarly, J(x+h) takes the value one if j is observed at point x+h, and zero otherwise. The expectation is taken over all valid pairs of grid nodes separated by vector h in the TI. The probabilities are estimated efficiently with the method of Marcotte (1996), which relies on the fast Fourier transform to calculate correlations across multiple spatial lags.
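For a single lag, Eq. (2) can be illustrated by counting category pairs directly in the TI. The sketch below deliberately uses this direct counting instead of the FFT-based method of Marcotte (1996), so it is only suitable for small examples. The function name `bivariate_probabilities` and the convention that the lag is given as a (row, column) offset in grid cells are assumptions of this example.

```python
import numpy as np

def bivariate_probabilities(ti, lag, n_categories):
    """Estimate P(i at x, j at x + h) from a 2D categorical TI by counting.

    ti           : 2D integer array of category indices 0..n_categories-1
    lag          : (row_offset, col_offset) in grid cells, the vector h
    n_categories : total number of categories I
    """
    ti = np.asarray(ti)
    dr, dc = lag
    rows, cols = ti.shape

    # Restrict to nodes x for which x + h also lies inside the TI.
    r0, r1 = max(0, -dr), min(rows, rows - dr)
    c0, c1 = max(0, -dc), min(cols, cols - dc)
    if r0 >= r1 or c0 >= c1:
        raise ValueError("lag is larger than the training image")

    head = ti[r0:r1, c0:c1]                      # categories at x
    tail = ti[r0 + dr:r1 + dr, c0 + dc:c1 + dc]  # categories at x + h

    # counts[i, j] = number of pairs with i at x and j at x + h.
    counts = np.zeros((n_categories, n_categories))
    np.add.at(counts, (head.ravel(), tail.ravel()), 1)
    return counts / counts.sum()


# Layered toy TI: unit 0 on top, unit 1 in the middle, unit 2 at the bottom.
ti = np.array([[0, 0, 0],
               [1, 1, 1],
               [2, 2, 2]])
P_down = bivariate_probabilities(ti, lag=(1, 0), n_categories=3)
```

In this toy TI, `P_down[1, 0]` is zero because unit 0 is never found one cell below unit 1, which is the kind of ordering information MCP extracts from the TI.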

The method is implemented using a sequential simulation framework, where categorical values are assigned iteratively based on the conditional probabilities computed with Eq. (1). The simulation begins with a blank grid, except for available hard data (e.g., borehole observations). At each iteration, a location x_i is selected following a random path. If sufficient neighboring points are found within the defined octant-based search range, the conditional probability for each possible category at that location is computed using the bivariate transition probabilities. If no neighbors are available, the simulation falls back to using only the marginal probabilities derived from the training image. Once the conditional probability distribution is determined, a category is drawn at random from it and assigned to the location. The newly simulated point is then added to the conditioning neighborhood and contributes to the estimation of subsequent nearby nodes. This process continues until the entire simulation domain is completed, resulting in a single categorical realization. The procedure can be repeated multiple times to generate an ensemble of realizations, which can then be used for uncertainty analysis.
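The sequential procedure can be sketched for a simplified one-dimensional column, where the octant search reduces to the nearest informed node above and below the target. This is a hedged illustration only: the functions `joint_1d` and `simulate_column`, the 1D setting, the use of a 1D training column, and the fallback to marginal proportions when the neighbors are mutually incompatible are all assumptions of this example and do not reproduce the 2D and 3D implementation of the study.

```python
import numpy as np

def joint_1d(ti, lag, n_cat):
    """P(i at x, j at x + lag) counted along a 1D training column (lag != 0)."""
    head, tail = (ti[:-lag], ti[lag:]) if lag > 0 else (ti[-lag:], ti[:lag])
    counts = np.zeros((n_cat, n_cat))
    np.add.at(counts, (head, tail), 1)
    return counts / counts.sum()

def simulate_column(ti, n_nodes, hard, n_cat, max_range, rng):
    """One realization of a 1D column; `hard` maps node index to category."""
    ti = np.asarray(ti)
    if max_range >= ti.size:
        raise ValueError("max_range must be shorter than the training column")
    marginal = np.bincount(ti, minlength=n_cat) / ti.size
    tables = {}                       # bivariate tables cached per signed lag
    grid = np.full(n_nodes, -1)       # -1 marks nodes not yet simulated
    for node, cat in hard.items():
        grid[node] = cat

    # Visit the unsimulated nodes along a random path.
    for x0 in rng.permutation(np.flatnonzero(grid < 0)):
        # Nearest informed node above and below, within the search range.
        neighbors = []
        for step in (-1, 1):
            for dist in range(1, max_range + 1):
                xk = x0 + step * dist
                if 0 <= xk < n_nodes and grid[xk] >= 0:
                    neighbors.append(xk)
                    break

        probs = marginal              # used when no neighbor is found
        if neighbors:
            n = len(neighbors)
            # Eq. (1): p_{i0}^(1-n) times one bivariate term per neighbor.
            weights = np.where(marginal > 0, marginal, 1.0) ** (1 - n)
            weights = weights * (marginal > 0)
            for xk in neighbors:
                lag = int(x0 - xk)
                if lag not in tables:
                    tables[lag] = joint_1d(ti, lag, n_cat)
                weights = weights * tables[lag][grid[xk], :]
            if weights.sum() > 0:     # otherwise keep the marginal (assumed)
                probs = weights / weights.sum()

        # Draw a category; the node then conditions later nodes on the path.
        grid[x0] = rng.choice(n_cat, p=probs)
    return grid


ti = np.repeat([0, 1, 2], 10)         # layered training column, 30 nodes
rng = np.random.default_rng(0)
realization = simulate_column(ti, n_nodes=30, hard={0: 0, 29: 2},
                              n_cat=3, max_range=10, rng=rng)
```

Calling `simulate_column` repeatedly with the same random generator yields an ensemble of realizations that all honor the two hard data.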

A distinctive feature of MCP is its zero-forcing property, which guarantees that if a bivariate probability between two categories is zero, the corresponding category transition over a considered distance is completely restricted in the generated realizations (Allard et al., 2011). This property makes MCP particularly suitable for modeling stratigraphic sequences where ordering constraints must be preserved (Benoit et al., 2018). However, in regions with limited conditioning data, MCP may revert to using marginal probabilities when neighborhood information is insufficient, which can lead to the occurrence of geologically incompatible transitions. To address this, a post-processing correction procedure is introduced in Sect. 2.4 to enforce geological consistency in the final simulations.

2.2 Integration of geophysical data

Geophysical data can provide additional constraints to geostatistical simulations by linking lithological categories with physical properties (Hermans et al., 2015). In the present study, a stochastic resistivity-lithology relationship is established by deriving conditional probabilities from inverted resistivity models. Unlike most previous MPS-based approaches, which typically overlook the spatial degradation of geophysical resolution, our method explicitly incorporates both the smoothing effects of regularized inversion and the loss of resolution with depth into the probabilistic framework (Hermans and Irving, 2017; Hermans et al., 2015; Barfod et al., 2018; Deleersnyder et al., 2023). These probabilities represent the likelihood of observing specific lithological classes given the resistivity values at corresponding locations.

To integrate this information with the categorical constraints from the MCP framework, the permanence-of-ratios principle is applied. This principle assumes that the relative influence of different data sources can be preserved through a ratio-based formulation. The intermediate quantities used in this formulation are defined as follows (Journel, 2002; Isunza Manrique et al., 2023):

\begin{matrix} (3) & \begin{aligned} a = \frac{1 - P (A)}{P (A)}, b = \frac{1 - P (A | B)}{P (A | B)}, \\ c = \frac{1 - P (A | C)}{P (A | C)}, x = \frac{1 - P (A | B, C)}{P (A | B, C)}, \end{aligned} \end{matrix}

where P(A) represents the marginal probability of lithology A, P(A|B) is the conditional probability of lithology A based on MCP constraints B, and P(A|C) reflects the conditional probability of lithology A given the geophysical data C, which is derived from inverted resistivity models through lithology–resistivity calibration. For the synthetic case, this relationship is established using the complete training image, whereas for the real-field application, it is calibrated using borehole lithological data. The final joint probability $P (A | B, C)$ , conditioned on both sources, is obtained using the following proportionality:

\begin{matrix} (4) & \frac{x}{b} = \left( \frac{c}{a} \right)^{\tau} . \end{matrix}

Solving for $P (A | B, C)$ yields:

\begin{matrix} (5) & P (A | B, C) = \frac{a^{\tau}}{a^{\tau} + b \cdot c^{\tau}} . \end{matrix}

Here, the exponent parameter τ controls how strongly the model emphasizes differences between the components a and c. As τ increases, the contrast between these terms becomes more pronounced, allowing the model to more clearly favor the source (prior or geophysical) with the stronger signal. This enables flexible adjustment of the relative influence of geophysical data, depending on the level of confidence in its relationship to the target property.
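Two limiting cases help to interpret τ. They follow directly from Eqs. (3) to (5) and are given here as a worked check, not as a statement from the cited references. For τ=0 the geophysical term drops out and the MCP probability is recovered, whereas τ=1 gives the unweighted permanence-of-ratios update:

\begin{matrix} & \tau = 0 : & x = b & \Rightarrow & P (A | B, C) = \frac{1}{1 + b} = P (A | B), \\ & \tau = 1 : & x = \frac{b \, c}{a} & \Rightarrow & P (A | B, C) = \frac{a}{a + b \, c} . \end{matrix}

Values of τ between 0 and 1 therefore damp the geophysical contribution relative to the unweighted update, and values above 1 amplify it.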

During the sequential MCP simulation, for each unsimulated node, the conditional probability P(A|B) is first computed using Eq. (1), based on the bivariate transition probabilities inferred from neighboring lithological categories. This MCP-based probability is then combined with the geophysically derived conditional probability P(A|C) through the permanence-of-ratios formulation (Eq. 5), yielding the joint probability $P (A | B, C)$ . The resulting distribution $P (A | B, C)$ is subsequently used as the sampling distribution to assign the lithofacies at that node.
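The combination step at a single node can be sketched as follows. The function name `combine_permanence_of_ratios`, the clipping constant `eps`, and the final renormalization across categories are assumptions of this example: Eq. (5) is applied here to each category separately, and the text does not state how the resulting values are normalized.

```python
import numpy as np

def combine_permanence_of_ratios(p_prior, p_mcp, p_geo, tau=1.0, eps=1e-12):
    """Combine P(A), P(A|B) and P(A|C) per category with Eq. (5).

    p_prior : marginal probabilities P(A), one value per category
    p_mcp   : MCP probabilities P(A|B) from Eq. (1)
    p_geo   : geophysics-derived probabilities P(A|C)
    tau     : weight of the geophysical information
    """
    # Clip prior and geophysical terms to avoid division by zero (assumed).
    p_prior = np.clip(np.asarray(p_prior, dtype=float), eps, 1 - eps)
    p_geo = np.clip(np.asarray(p_geo, dtype=float), eps, 1 - eps)
    p_mcp = np.asarray(p_mcp, dtype=float)

    a = (1 - p_prior) / p_prior
    c = (1 - p_geo) / p_geo

    # Categories excluded by MCP (P(A|B) = 0) stay excluded.
    joint = np.zeros_like(p_mcp)
    allowed = p_mcp > 0
    b = (1 - p_mcp[allowed]) / p_mcp[allowed]
    a_tau = a[allowed] ** tau
    joint[allowed] = a_tau / (a_tau + b * c[allowed] ** tau)

    # Renormalize so the categories sum to one (assumption of this sketch).
    total = joint.sum()
    return joint / total if total > 0 else p_mcp


p_prior = np.array([0.5, 0.3, 0.2])
p_mcp = np.array([0.6, 0.4, 0.0])   # category 2 is excluded by MCP
p_geo = np.array([0.2, 0.7, 0.1])   # resistivity favors category 1

no_geophysics = combine_permanence_of_ratios(p_prior, p_mcp, p_geo, tau=0.0)
with_geophysics = combine_permanence_of_ratios(p_prior, p_mcp, p_geo, tau=1.0)
```

With `tau=0.0` the MCP probabilities are returned unchanged, and with `tau=1.0` the probability of category 1 increases because the geophysical data favor it, while category 2 remains excluded.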

Once the joint probabilities are computed for all categories and locations, multiple realizations of the subsurface lithology are generated using the sequential simulation framework of the MCP algorithm. The integration of resistivity based probabilistic information serves to guide the categorical simulation toward geologically plausible outcomes that also honor the spatial patterns suggested by the geophysical data.

To evaluate how the added geophysical constraints affect the uncertainty of our MCP simulations, we computed entropy maps, which measure the diversity of predicted lithology categories at each location across realizations (Pirot et al., 2022). The entropy at each grid location (i,j) was calculated using Shannon's entropy:

\begin{matrix} (6) & H (Z_{i, j}) = - \sum_{l = 1}^{L} p_{l} (i, j) \log (p_{l} (i, j) + \varepsilon), \end{matrix}

where $H (Z_{i, j})$ denotes the entropy at location (i,j), L is the total number of lithology classes, p_l(i,j) is the empirical probability of class l at location (i,j) based on the realizations, and ε is a small constant added to prevent numerical issues when $p_{l} (i, j) = 0$ .
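A minimal sketch of Eq. (6) is given below. The function name `entropy_map`, the array layout (realizations stacked along the first axis), the value of `eps`, and the use of the natural logarithm are assumptions of this example, since the base of the logarithm is not specified in the text.

```python
import numpy as np

def entropy_map(realizations, n_categories, eps=1e-12):
    """Shannon entropy per grid node across an ensemble, following Eq. (6).

    realizations : integer array of shape (n_realizations, ...grid shape...)
    n_categories : total number of lithology classes L
    eps          : small constant that avoids log(0)
    """
    realizations = np.asarray(realizations)
    # probs[l] is the fraction of realizations showing class l at each node.
    probs = np.stack([(realizations == l).mean(axis=0)
                      for l in range(n_categories)])
    return -(probs * np.log(probs + eps)).sum(axis=0)


# Four realizations of a 1 x 2 grid with two classes.
realizations = np.array([[[0, 0]],
                         [[0, 1]],
                         [[0, 0]],
                         [[0, 1]]])
H = entropy_map(realizations, n_categories=2)
```

The first node has the same class in every realization and therefore an entropy close to zero, whereas the second node is split evenly between two classes and reaches the two-class maximum of log 2.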

2.3 General Workflow

The general workflow of the MCP framework incorporating geophysical data constraints is as follows (Fig. 1). Case-specific workflows for the synthetic and real-field applications are presented later to highlight differences in data sources and processing steps.

https://hess.copernicus.org/articles/30/1421/2026/hess-30-1421-2026-f01

Figure 1. Workflow for the MCP framework constrained with geophysical data. Blue boxes represent the operational steps at the current unsimulated node. Green boxes represent the processing of geophysical data and its statistical calibration with lithology. Yellow boxes illustrate the extraction and integration of prior geological information from TIs for MCP calculations. Red boxes indicate the final probabilistic merging and categorical simulation steps.
