Locality-based 3-D multiple-point statistics reconstruction using 2-D geological cross sections

. Multiple-point statistics (MPS) has shown promise in representing complicated subsurface structures. For a practical three-dimensional (3-D) application, however, one of the critical issues is the difﬁculty in obtaining a credible 3-D training image. However, bidimensional (2-D) training images are often available because established work-ﬂows exist to derive 2-D sections from scattered boreholes and/or other samples. In this work, we propose a locality-based MPS approach to reconstruct 3-D geological models on the basis of such 2-D cross sections (3DRCS), making 3-D training images unnecessary. Only several local training subsections closer to the central uninformed node are used in the MPS simulation. The main advantages of this partitioned search strategy are the high computational efﬁciency and a relaxation of the stationarity assumption. We embed this strategy into a standard MPS framework. Two probability aggregation formulas and their combinations are used to assemble the probability density functions (PDFs) from different subsections. Moreover, a novel strategy is adopted to capture more stable PDFs, where the distances between patterns and ﬂexible neighborhoods are integrated on multiple grids. A series of sensitivity analyses demonstrate the stability of the proposed approach. Several hydrogeological 3-D application examples illustrate the applicability of the 3DRCS approach in reproducing complex geological features. The results, in comparison with previous MPS methods, show better performance in portraying anisotropy characteristics and in CPU cost.

1 Introduction 3-D characterization of geological architectures plays a crucial role in the quantification of subsurface water, oil and ore resources (Chen et al., 2017(Chen et al., , 2018;;Foged et al., 2014;Hoffman and Caers, 2007;Jackson et al., 2015;Kessler et al., 2013;Raiber et al., 2012;Wambeke and Benndorf, 2016).Heterogeneity and connectivity of sedimentary reservoirs exert controls on underground fluid transport (Gaud et al., 2004;Renard and Allard, 2013;Weissmann et al., 1999) which is vital to quantify and forecast the formation and distribution of subsurface resources.For a practical 3-D application, however, these attributes are notoriously difficult to characterize and model since the informed data we can acquire are very sparse.Two-point geostatistics (Pyrcz and Deutsch, 2014;Ritzi, 2000) and object-based methods (Deutsch and Tran, 2002;Maharaja, 2008;Pyrcz et al., 2009) are not effective at reproducing anisotropic features and connectivity patterns properly (Heinz et al., 2003;Klise et al., 2009;Knudby and Carrera, 2005;Vassena et al., 2010) due to the lack of high-order statistics and the difficulty in parameterization.To overcome the abovementioned limitations, multiple-point statistics (MPS) was developed over recent years and has shown prospects in modeling subsurface anisotropic structures, such as porous media, hydrofacies, reservoirs and other sedimentary structures (Dell Arciprete et al., 2012;Hajizadeh et al., 2011;Hu and Chugunova, 2008;Oriani et al., 2014;Pirot et al., 2015;Wu et al., 2006).
Published by Copernicus Publications on behalf of the European Geosciences Union.
Q. Chen et al.: Locality-based 3-D multiple-point statistics reconstruction using 2-D geological cross sections The first MPS approach was suggested by Guardiano and Srivastava (1993) and is designed to reproduce heterogeneous geometries by extracting spatial patterns from training images directly rather than through variograms.A training image is a conceptual model derived from observations, and it bears a crucial role in MPS-based stochastic simulation.The first efficient implementation of MPS was developed by Strebelle (2002) on the basis of a tree structure.Several attempts have thereafter focused on improving MPS algorithms (Arpat and Caers, 2007;Caers, 2001;Mariethoz et al., 2010;Straubhaar et al., 2011;Tahmasebi et al., 2012;Wu et al., 2008;Yang et al., 2016;Zhang et al., 2006).With these methods, training images are scanned with a fixed search template and the MPS patterns are stored in a tree or a list data structure.An important difficulty lies in choosing the size of data template to best reproduce large-scale structures (Strebelle, 2002).The larger the size of the data event, the fewer replicates of this data event will be found over the training images for inferring the corresponding conditional probability density function (CPDF).However, when the size of data template is too small, large-scale structures of the training image cannot be reproduced (Mariethoz et al., 2010).In addition, a search template including too many nodes can lead to storing a large number of patterns, increasing CPU cost and memory consumption.The multiple grids concept (Tran, 1994;Strebelle, 2002) mitigates the abovementioned limitations, but they still present due to the rigidity of data templates and multiple grids.A more straightforward MPS method, direct sampling (DS), was proposed in a study by Mariethoz et al. (2010), in which the high order statistics are sampled directly from the training image without storing patterns and without the need for multiple-point grids.One of the main advantages of this approach is that several types of distances between patterns can be considered, making it possible to simulate continuous variables, or even multivariate simulation.As an approximation, pattern distance was used to express the matching degree between the neighborhood of a node and a data event in the training image (Chugunova and Hu, 2008;Mariethoz et al., 2010Mariethoz et al., , 2015)).
No matter which MPS algorithm is used, a suitable training image is the fundamental requirement.Although such algorithms are gaining popularity in hydrogeological applications (Hermans et al., 2015;He et al., 2014;Høyer et al., 2017;Hu and Chugunova, 2008;Huysmans et al., 2014;Jha et al., 2014;Mahmud et al., 2015), they still suffer from one vital limitation: the lack of training images, especially for 3-D situations.Object-based or process-based techniques are one possibility to build 3-D training images (de Marsily et al., 2005;de Vries et al., 2009;Feyen and Caers, 2004;Maharaja, 2008;Pyrcz et al., 2009).Besides inherent limitations in the parameterization of these algorithms, it is also challenging to reproduce the various aspects of geological geometries from a high-resolution outcrop map, or even from insufficient borehole data (Comunian et al., 2014;Pirot et al., 2015).To overcome this difficulty of obtaining 3-D training images, scholars have attempted to use low-dimensional data (e.g., boreholes, cross sections, outcrops and remote sensing and geophysical images) to reconstruct 3-D models directly instead of a training image in the entire 3-D domain (Bayer et al., 2011;Comunian et al., 2011;Hu et al., 2011;Weissmann et al., 2015).In particular, a reconstruction method of partial datasets was proposed by Mariethoz and Renard (2010) by using and adapting the DS algorithm.However, largescale 3-D models contain millions of nodes, and thus a very large number of scan attempts will be carried out for each simulated node by using this method, especially in the early stages of a simulation due to the sparse known data.Therefore, this method still suffers from a severe computational burden for fine 3-D applications.Moreover, it assumes stationarity of the modeled domain, which is not often the case in practice.Comunian et al. (2012) proposed an approach to tackle the lack of a full 3-D training image using sequential 2-D simulations with conditioning data (s2Dcd): a 3-D domain is filled by preserving an overall coherence due to a series of 2-D simulations performed using 2-D training images along orthogonal directions.However, this strategy is not effective at characterizing the connectivity of structures in all directions of a 3-D domain, because each 2-D simulation only considers the high-order statistics in this direction.Moreover, it also suffers from the limitation of nonstationarity of geological phenomena due to the global search in a 2-D plane.To integrate the benefits of both approaches, Gueting et al. (2017) proposed a new combination of the two existing approaches.The combination is achieved by starting with the sequential two-dimensional approach (Comunian et al., 2012), and then switching to the three-dimensional reconstruction approach (Mariethoz and Renard, 2010).However, the abovementioned limitations of the two approaches still remain because this combination is an optimization of the workflow and does not substantially improve the methods.To combine the CPDFs from different directions, several probability aggregation methods were tested and discussed (Allard et al., 2012;Bordley, 1982;Genest and Zidek, 1986;Journel, 2002;Krishnan, 2008;Mariethoz et al., 2009;Stone, 1961).Other 3-D applications to represent geological structures using MPS and partial data include filling in the shadow zone of a 3-D seismic cube (Wu et al., 2008), generating smallscale 3-D models of porous media (Okabe and Blunt, 2007) and building a 3-D training image with digital outcrop data (Pickel et al., 2015).
From another perspective, using very common workflows, geologists can obtain 2-D geological maps or sections from scattered boreholes and/or other samples by studying analogs (Caumon et al., 2009).With increasingly sophisticated data acquisition methods, 2-D high-resolution images can be acquired.For example, large-scale outcrop maps can be captured by using terrestrial lidar (Dai et al., 2005;Heinz et al., 2003;Nichols et al., 2011;Pickel et al., 2015;Zappa et al., 2006), and fine-scale pore images can be derived from progressive imaging techniques (Zhang et al., 2010).Therefore, Hydrol.Earth Syst.Sci., 22, 6547-6566, 2018 www.hydrol-earth-syst-sci.net/22/6547/2018/ there are many ways to acquire low-dimensional data for reconstructing a full 3-D model.In practice, however, these methods using real geological analogs or sections as training images still face significant nonstationarity due to the heterogeneity of geological phenomena and processes (Comunian et al., 2011;de Vries et al., 2009).
To address the insufficient access to a 3-D training image and the challenge of nonstationarity, we present a new strategy to reconstruct 3-D geological heterogeneities using 2-D cross sections (3DRCS) instead of an entire training image.Compared to previous MPS implementations relying on partial data, our proposal is to use only several local subsections closer to the simulated node as training images, rather than full planes perpendicular to the x, y and z directions (Comunian et al., 2012) or searching in the entire 3-D domain (Mariethoz and Renard, 2010).Compared with the filling by a series of 2-D simulations in s2Dcd (Comunian et al., 2012), a random simulation path containing all uninformed locations is used so that multiple-point (MP) statistics in a 3-D domain are captured.The local subsections are able to offer more coherent and reliable statistics since they are spatially closer to the simulated node which is going to be simulated.Moreover, the original cross sections are divided into many subsections according to their spatial relationships, and thus nonstationarity is reduced since it is restricted into a local cube consisting of six or fewer subsections.In principle, our proposal can be applied into any multiple-point stochastic simulation method.In this work, we embed this strategy into a standard MPS framework called ENESIM (Guardiano and Srivastava, 1993).The blocking strategy proposed in this work can significantly reduce the search space of training images, which makes it possible to get a 3-D reconstruction using ENESIM for a reasonable CPU cost.As with DS, in our approach MP statistics are not stored and the neighborhood is flexible.To integrate the patterns from different subsections, two probability aggregation formulas and their combinations are used.As an approximation of the matching degree between neighborhoods and data events, pattern distances are used to enhance the stability of CPDFs.Furthermore, we adapt multiple grids into the 3DRCS approach, where the geometries of data templates are not fixed for grids of different scales.Besides cross sections, any other scattered samples can also be involved into the proposal as conditional data (hard data).
The remainder of this paper is organized as follows.Section 2 gives background information used in the following sections.Section 3 presents the main concepts of the localitybased 3-D MPS reconstruction using 2-D cross sections and the detailed steps of the proposed approach.Section 4 shows a parameter sensitivity analysis and the performance comparison with other MPS algorithms.Section 5 gives a synthetic example in hydrogeology to illustrate the effectiveness of the 3DRCS approach when facing the real geological field data.The final section contains some concluding remarks and ideas for future work.
2 Background information

Pattern distance
A pattern distance d{N X , N Y } is an approximation of the dissimilarity between patterns, which is used to compare the neighborhood of a node currently simulated with a data event in the training image (Mariethoz et al., 2010).Approximate matches are accepted by using a distance threshold t.Namely for a data event N X from the simulation grid, when the condition d{N X , N Y } ≤ t (t ≥ 0) is met, the pattern N Y from the training image will be used to update the current CPDF.For a categorical variable, the distance can be formulated as follows: where For a nonstationary training image from an actual geological phenomenon, repeatability of spatial patterns could be weak so that it is hard to acquire a stable CPDF.Therefore, we adopt a pattern distance with a threshold as an approximation to sample more patterns and get a more stable CPDF.et al. (2012) presented a comprehensive literature review for aggregating probability distributions.These can be divided into additive methods and multiplicative methods according to their mathematical properties.The linear pooling formula (Stone, 1961) is a widely used method (for example, it was used by Okabe and Blunt, 2007) based on the addition of probabilities.It is appealing because of its flexibility and simplicity.Multiplicative methods include Bordley and Tau models and log-linear pooling (based on odd ratios) (Bordley, 1982;Journel, 2002;Genest and Zidek, 1986).

Linear pooling formula
The linear pooling formula, proposed by Stone (1961), is probably the most intuitive way of aggregating the probabilities P 1 , . . ., P n of an event A.
In this formula, w i are positive weights and their sum must equal 1 to obtain a global probability P G ∈ [0, 1].

Log-linear pooling formula
The log-linear pooling formula is a linear operator of the logarithms of the probabilities (Genest and Zidek, 1986).If a prior probability P 0 (A) must be included, it is written as: w i = 1 is needed to verify external Bayesianity.There are no other constraints whatsoever on the weights w i , i = 0, . . ., n.The sum S = n i=1 w i plays an important role in this formula.If S = 1, the prior probability P 0 is filtered out because w 0 = 0. Otherwise, if S > 1, the prior probability has a negative weight and P G is further away from P 0 than other probabilities.Conversely, if S < 1, P G is always closer to P 0 .Therefore, we can adjust the influence of the prior probability P 0 on the aggregated result P G by changing the value of S.

Multidimensional scaling and kernel smoothing
Tan et al. ( 2014) proposed a distance-based approach to evaluate the quality of multiple-point simulation outcomes where the Jensen-Shannon (JS) divergence is used to depict the dissimilarity of MP histograms as a quantitative metric.The information in the dissimilarity of MP histograms can be visualized using multidimensional scaling (MDS) (Caers, 2011).MDS approximates these distances by a lower-dimensional Euclidean distance in Cartesian space, which facilitates the visualization of the dissimilarity of MP histograms.Hermans et al. (2015) used an adaptive kernel smoothing (see Park et al., 2013) to estimate the probability density of the data variable for each kind of realizations f (Ref • |R i ) in the d-dimension space inferred from MDS.This allows the probability density distribution of the realizations around the reference to be estimated.For each kind of realization, its probability relative to the reference P (R i |Ref) can be calcu-lated by using Bayes' rule: 3 Methodology

Local search strategy of 3-D MPS reconstruction
In the abovementioned MPS methods, when using partial data, whether searching an entire 3-D domain or complete sections, any locations of the training images are scanned even if they are far away from the simulated node, so that one spatial pattern will be carried to a distant position.Therefore, the use of these methods is restricted to stationary training images, which are in practice seldom available.In this work, we propose a local search strategy that allows this problem to be palliated, by taking into account the spatial relationships of the real geological cross sections in a given 3-D domain.
As illustrated in Fig. 1, a 3-D domain is segmented into nine small blocks by six cross sections from three orthogonal directions where there are two sections in each direction.Every local block is surrounded by n local subsections (1 ≤ n ≤ 6).It should be noted that, sometimes, local blocks are not closed (i.e., the surrounding subsections are less than six) (Fig. 1b), and it is also possible that sections along some planes are missing; however, at least one section should be provided.For each unknown node in the local block (e.g., the gray cubes in Fig. 1c), the MP statistics are captured from the surrounding subsections rather than from the entire sections.Namely, there are n corresponding training images for each simulated node.These local subsections are the parts of the global cross sections which are closer to the uninformed nodes in the local block; thus, they are more likely to be re-garded as statistically representative.Data events are selected from the informed nodes (hard data) on three planes parallel to the subsections and through the current simulated node in three orthogonal directions by using a flexible neighborhood.
Another important point is related to handling of the search window when scanning a subsection.Here, we allow all locations of a subsection to be visited by the central node of a data event.The neighbor nodes of the data event can be placed in other adjacent subsections when matching with the training images.As shown in Fig. 2, the area inside the blue line is the search window.If only the nodes of the data event are from the subsection itself (case 1 on the figure), the training patterns are seriously reduced.We adopt a search strategy where neighbor nodes can be searched in the neighboring subsections (case 2 on the figure).Its main advantages are the coherence of the spatial patterns in a realization and the larger number of training patterns available.In addition, the size of the data events is constrained by the boundary of the global section, as illustrated in Mariethoz et al. (2010).
If more cross sections are available, a finer spatial subdivision can be used.In this case, the size of each subsection is smaller and the computational cost is reduced significantly.However, extremely small training images cannot offer enough spatial patterns, thus a minimal subsection size has to be considered.In practice, if there are many sections in each direction, a feasible solution is to select several ones as the references and use others as conditioning data only.

Strategy for aggregating the PDFs from local subsections
As an additive aggregation method, the linear pooling formula corresponds to a mixture model, which is related to the union of events and to the logical operator OR (Allard et al., 2012).This method is thus used to unite several independent probabilities into a global term P G .The log-linear pooling formula, based on the multiplication of probabilities, is related to the intersection of events and to the logical operator AND.Therefore, we usually use such a method to aggregate the probabilities with significant correlation to acquire a conjunction probability.
In this study, n PDFs (1 ≤ n ≤ 6) are computed from the surrounding local subsections (Fig. 1).For the illustrative case proposed here, a local 3-D domain is surrounded by six subsections, and six PDFs are being aggregated.There are two parallel subsections (training images) in each direction.An additive aggregation operator is more appropriate to combine the two probability distributions from parallel subsections, since we just expect a larger number of samples and thus more robust PDF by uniting both.Then, three orthogonal PDFs are obtained.We then join these PDFs containing the statistics from different directions with obvious anisotropic features.This scenario needs a multiplicative method to combine the orthogonal PDFs so as to retain the features in all directions.In summary, an optimal probability aggregation strategy is proposed by the procedure described below: 1. Aggregate the PDFs collected along the same direction for parallel subsections using the linear pooling formula described in Sect.2.2.1.
2. Aggregate the orthogonal PDFs from the above step by using the log-linear pooling formula described in Sect.2.2.2.
Of course, the probability aggregation step is not required when for step 1 there is only one subsection along a given plane, and for step 2 the PDFs that are missing along some direction are simply not included in the aggregation process.For the step 1, the weights w 1 and w 2 are related to the distances between the current location and the two parallel subsections d 1 and d 2 and are computed as follows: Such parameterization ensures that within-block trends are accounted for.
For step 2, an influence of the prior probability is desired to tune the other orthogonal PDFs.Thus, we usually use 0 < w 0 < 1, and set w i (i = 1, . . ., n) to be equal, i.e. w i = (1 − w 0 )/n, where n is the number of PDFs to be aggregated.However, the weights w i (i = 1, . . ., n) can also change; for example, they can vary at each simulation step, as described in Comunian et al. (2012)

Flexible search template on multiple grids
When large neighborhoods are considered, it is more difficult to find matching data events in the training image and thus a larger distance threshold t is required to obtain a sufficient number of samples for an acceptable CPDF.This can lead to degrading small-scale features or the removal of categories that have a low proportion.To address this issue, we propose a novel implementation of multiple grids where the search template is flexible and the distance threshold t varies according to the radius of the neighborhood.
As illustrated in Fig. 3, an example of multiple grids with three levels is used to show the relationship between neighborhoods, search radius R and distance threshold t on different grids.A neighborhood is identified by the informed and/or simulated nodes located in the circle with a radius R on the current grid and the current node (the gray nodes in Fig. 3) as a central.The initial radius R 0 and distance threshold t 0 for the first grid are assigned as the input parameters.The radius R linearly reduces to 1 from the first to the last grid, and the threshold t similarly varies from 1 to 0. The neighboring nodes (hard data and previously simulated nodes) around the central node on the current grid are selected to build a data event according to the radius R and the maximum number of points in the neighborhood.Therefore, a large data event is divided into several small parts placed on the different grids, which results in smaller neighborhoods on each grid.An acceptable threshold t is thus assigned to each neighborhood.For the last grid, the radius is reduced to 1 and at most there are eight nodes in a neighborhood.This strategy considers that small data events located on the last grid are much more repetitive (thus easier to find) than the large data events of the first grid.Figure 3 shows the flexible use of multiple grids on one plan through the current node.In the local search strategy proposed in this work, three planes through the current simulated node in three orthogonal directions are considered to obtain the neighborhoods.Thus the same strategy will be applied on other two planes.

Step-by-step algorithm using the local search strategy
Based on the strategies proposed in the above sections, the detailed steps of our simulation algorithm proceed as illustrated in Algorithm 1.
As mentioned above, we capture the MP statistics from several subsections of a local domain.Thus, the corresponding prior proportion should also be computed on the basis of these surrounding subsections (step 2).Comparing to s2Dcd, we use a fully random path on each multiple grid in the 3-D space and not within a specific section.For the current node, however, the MP statistics are only captured from several subsections in three orthogonal directions, because we only have 2-D cross sections to scan and not a 3-D training image.Obviously, step 8 is the most important procedure in our simulation algorithm, and the idea is inspired by EN-ESIM (Guardiano and Srivastava, 1993) and DS (Mariethoz et al., 2010).The main procedure is demonstrated in Algorithm 2.
The fraction of the scanned training image f and the distance threshold t are borrowed from DS and they play the same roles.χ 0 , χ 1 , γ 0 and γ 1 are the indexes of the clos-

Parameter sensitivity
The majority of parameters of 3DRCS are similar to DS.Therefore, only the sensitivity of three parameters specific to 3DRCS are tested against the 3-D reference shown in Fig. 4

Number of cross sections
The number of cross sections N cs is a new parameter in the 3DRCS approach.They are not only regarded as the training images and conditioning data, but also control the computing speed and the quality of the reconstructions.Figure 5 and Table 1 show different reconstructions and their statistical properties by increasing the sections in every direction.
In this test, the number of cross sections N cs in each direction increases from one to six, and other parameters are fixed: maximum search radius = 50, maximum number of points in a neighborhood = 35, distance threshold t 0 = 0.2, fraction of training image to scan f = 0.8, maximum of matched patterns from each training image = 100, number of multiple grids = 3, weights of the probability aggregation w 0 = w 1 = w 2 = w 3 = 0.25.We obtain 20 realizations for each set of cross sections.The main difference between the different settings is the improvement of computational efficiency with the increase in cross sections.The proportion of pores (porosity) is reproduced at a similar level for each group.Also, when increasing the number of cross sections N cs , the number of geobodies gets closer to the reference and the variability is decreased and the connectivity becomes stable, which is caused by the increase of conditioning data (i.e., informed cross sections).On the other hand, using too many cross sections will lead to a number of artifacts since the training subsections for each subblock are very small, resulting in an insufficient number of samples (see the sections extracted from the reconstructions in Fig. 5).
As a consequence, we recommend that several sections can be chosen if there are abundant candidates in one direction, which must ensure that the features of selected ones are diverse and contain enough spatial patterns, but not incurring artifacts.In this test, 3 or 4 sections in each direction are recommended, but it is related to the size of simulation grid in other 3-D application.In general, one informed section for every 50 grid cells in one direction in the simulation grid is recommended.When there are very few or no sections in a direction, a feasible solution has been suggested in a study by Gueting et al. (2017) in which sequential 2-D simulations are performed to obtained some sections first, and then both the original informed data and the obtained sections are used to reconstruct the model of the entire 3-D domain.

Maximum number of matched patterns from each training image
Table 2 shows the statistics of 20 realizations obtained by varying the maximum of matched patterns from each training image N max , which is a novel parameter adopted in this work to avoid the unnecessary searches while obtaining a CPDF from training images.Other parameters are the same as in the former test presented in Fig. 5, except for the sections in each direction which are fixed to 3. We find that the computational cost increases sharply when N max > 160 and then stabilizes.

Concerning the compared statistical properties, low values
Hydrol.Earth Syst.Sci., 22, 6547-6566, 2018 www.hydrol-earth-syst-sci.net/22/6547/2018/ Figure 5. Reconstructions and their statistical properties when increasing the number of sections in each direction.The first section along x direction of a reconstruction for each case is presented here.We only present the connectivity functions computed along the coordinate y since their features are similar in three directions.The black lines represent the corresponding features of the reference models, and the gray lines represent the features of the reconstructions.
of N max result in variabilities because it is almost like sampling the result directly from training images, and the role of CPDFs is lost.For the remaining cases, the statistics are similar except for a decrease in variances with increasing N max (Table 2).In order to better grasp the effect of N max , three cases are selected (N max = 5, 40, 320) and the corresponding realizations are shown in Fig. 6a-c.The connectivity functions vary in a large range for small N max values.Conversely, they become more stable when increasing N max (Fig. 6d).
The variance of variables bears the same tendency when increasing N max (Fig. 6e).Consequently, N max = 40 to 160 is recommended, resulting in a balance between a stable CPDF and computational cost.

Weights of the probability aggregation formulas
In this work, the strategy for aggregating the PDFs from local subsections includes two steps.In the first step the weights of the linear pooling formula for two parallel subsections are selected depending on the distances between the current location and the two subsections in the first step.Therefore, the weights are automatically set and do not need to be set.In the second step, the appropriate weights for the prior probability distribution and three orthogonal CPDFs are to be selected by the user.Figure 7 shows different realizations obtained by varying the four weights w 0 , w 1 , w 2 and w 3 .Here we increase the weight of the prior probability distribution w 0 and www.hydrol-earth-syst-sci.net/22/6547/2018/ Hydrol.Earth Syst.Sci., 22, 6547-6566, 2018  let the other three weights equal, since the CPDFs from three orthogonal directions have the same contribution.Of course, if users think the CPDF of one direction is more important than others, they can be changed, under the constraint that 3 i=0 w i = 1.It can be observed that when w 0 = 0, the spatial structures are well reproduced, but with larger variance (Fig. 7a) since all spatial patterns are inferred from the MP statistics of the surrounding subsections rather than using prior information.When increasing w 0 , the connectivity of the spatial structures is degraded, but the facies proportions are closer to the reference (Fig. 7b).Finally, in the extreme case shown in Fig. 7c the connectivity of spatial structures is lost.Therefore, 0 ≤ w 0 ≤ 0.25 is a recommended range and the other three weights can be determined by the impor-tance (e.g., complexity or variety of patterns) of the sections in each direction.
For the other parameters involved in our algorithm, most of them are similar to the parameterization of DS, which have been tested thoroughly in Meerschman et al. (2013).However, 3DRCS allows larger initial values for the neighborhood size and the distance threshold because multiple grids are used so that these initial values are decreased when increasing the level of multiple grids.

Interaction between t, f , N cs and N max
In this section, we compare the interaction between two important parameters of DS (distance threshold t and fraction of training image to scan f ) and two new parameters presented in 3DRCS (number of cross sections N cs and maximum number of matched patterns from each training image Hydrol.Earth Syst.Sci., 22, 6547-6566, 2018 www.hydrol-earth-syst-sci.net/22/6547/2018/ N max ). Figure 8 shows the interaction between t, f , N cs and N max .Running our algorithm with f = 0.2 and t = 0.4 results in noisy realizations.This is not surprising since any patterns can be accepted even if they bear a large pattern distance d{N X , N Y }.Of course, the algorithm will be very fast under these parameters because the scan for training images will be stopped at the beginning.Meerschman et al. ( 2013) tested thoroughly for the parameterization of DS.In their test, when f = 0.5 and t = 0.2, the realizations are acceptable.However, here the results still contain a lot of noise since the local search strategy reduces the size of the actually scanned training images.As the increase of t, f , N cs and N max , the results become satisfactory.The recommended ranges of N cs and N max have been given in the above sections.In 3DRCS, it is advised to use f ≥ 0.8 and t ≤ 0.1.Compared to DS, more strict restrictions for t and f are adopted due to the local search strategy.Same as the effect of t and f , N cs and N max also control the computational ef-ficiency and the quality of simulations.Therefore, when setting the parameters, we should consider finding a balance between the quality of results and the computational cost.

Comparison of reproducing heterogeneities with existing methods
To verify the validity of the 3DRCS approach for reproducing heterogeneous structures, we compare it with two MPS implementations that use partial data: DS (Mariethoz and Renard, 2010) and s2Dcd (Comunian et al., 2012).As shown in Fig. 9, six cross sections extracted from a 3-D model of folds (180×150×120 voxels) (Mariethoz and Kelly, 2011) are utilized in this test.s2Dcd is a wrapper library that requires an external MPS engine.In order to ensure comparability, here DS is employed as the engine of s2Dcd.The detailed parameters are as follows: maximum search radius = 40, maximum number of points in a neighborhood = 40, distance threshold  Because the implementation of DS is parallel, we use four processors to carry out this test in DS and s2Dcd.Only one processor is used in 3DRCS because our implementation is not parallel.In Fig. 9, one selected realization for each method is presented.From their visual appearance, it looks like s2Dcd and 3DRCS have the similar performance for reproducing the patterns shown in 3-D reference and informed cross sections.Therefore, histograms, variograms and connectivity functions are used to further analyze the performance.Figure 10 shows the comparison of proportions of the facies for the realizations by using three MPS methods.A total of 20 realizations are performed for each method.It can be To further compare the models obtained using the three different MPS approaches, MDS plots are constructed by calculating the distance of MP histograms between all the realizations of the three approaches and a 3-D reference.The resulting MDS map is shown in Fig. 13 and it can be observed that the realizations of 3DRCS are closer to the reference in the MDS map than the results obtained by the other two approaches.In addition, kernel smoothing is used to estimate  the density distribution of the realizations of three different MPS approaches around the reference.The probabilities of the realizations are calculated from kernel density estimation by using Eq. ( 4) described in Sect.2.3.According to the reference model, the three different approaches have quite similar probabilities with 29 %, 33 %, and 38 % for DS, s2Dcd and 3DRCS, respectively.However, the 3DRCS approach still gains the highest probability.
In practice, there is no fully informed 3-D reference and we only have several informed cross sections.Thus, the statistical features of the reconstructions (e.g., variograms, connectivity functions and MDS plots) are close to the reference but no one can surpass it in the above test.However, these comparisons are still able to validate the reproduction of spatial patterns for the different MPS approaches.

Computational performance
Section 4.1.1 and 4.1.2have already analyzed the influence of the number of cross sections N cs and the maximum of matched patterns from each training image N max .Sec-tion 4.1.4tested the interaction between t, f , N cs and N max .The results indicated that the effect of t and f on the computational efficiency in 3DRCS is the same as in DS.The computational performance of other parameters has been assessed clearly by Meerschman et al. (2013).The weights of the probability aggregation formulas do not affect CPU time.
A comparison of computational performance between DS, s2Dcd and 3DRCS is presented in Fig. 14.Because 3DRCS is sensitive to the number of input cross sections, we offer two and four sections in each direction respectively, and the computational efficiencies when increasing the total number of grid cells are shown in Fig. 14a and b.Other parameters are the same as the test in Sect.3.2.Note that a different time axis is used for DS-based reconstruction because it uses much more CPU time than the other two methods, even though four processors are used for DS-based reconstruction.As shown in Fig. 14a, the 3DRCS approach presents better computational performance than DS-based 3-D reconstruction since the MP statistics are captured from a smaller domain composed of several 2-D sections in s2Dcd and 3DRCS.Because four processors are used in DS and s2Dcd, 3DRCS presents a speedup of about 4 compared to s2Dcd and about 120 compared to DS in this test (Fig. 14a).When increasing the number of cross sections, the search space is divided into more subdomains in 3DRCS so as to achieve a much better performance than s2Dcd and DS (see Fig. 14b).

Synthetic example: 3-D reconstruction of hydrofacies
To further demonstrate the applicability of our algorithm, an example from a real geological application is presented in this section.The Descalvado aquifer analog dataset (Fig. 15) depicts the complex hydrofacies of a small area (28 m × 7 m × 5.8 m) in Brazil (Bayer et al., 2015).In the original dataset, there are five cross sections derived from outcrops, which are marked by black lines in Fig. 15a.They are referenced in a 3-D domain consisting of 280 × 70 ×  58 voxels.These sections allow only two parts of subdomains to be created, which is insufficient for an application of 3DRCS.Therefore, we borrow the strategy of Gueting et al. (2017) to insert three additional sections into the yz direction using a sequential 2-D multiple-point simulation approach (s2Dcd).These sections are marked by blue lines in Fig. 15a.Then, all the tests are implemented on the basis of eight cross sections (three in the xz direction and five in the yz direction), which are shown in Fig. 15b.
Figure 16 shows realizations obtained by using three different MPS approaches on the basis of the abovementioned dataset.The white lines indicate the locations of informed sections in each realization.Note that an auxiliary variable along the z coordinate is used in s2Dcd and 3DRCS.It is a continuous variable to control the changing trend of the hydrofacies along the z coordinate and a detailed description is given by Comunian et al. (2012).To further reveal the performance of the different approaches, we use MDS maps to visualize the dissimilarity of MP histograms (Fig. 17), similarly to in Sect.4.2.However, here we use it to reveal the dissimilarity between all the reconstructed sections exacted from the realizations and the eight informed sections along the two directions, rather than different 3-D realizations.Thus, for each realization, 70 sections (67 reconstructed sections and 3 informed sections) from the xz direction and 280 sections (275 reconstructed sections and 5 informed sections) from the yz direction are used to draw the MDS maps along the two directions respectively.MDS is very appropriate to present the dissimilarity for this kind of application because we only have partial cross sections instead of an entire 3-D training image.Therefore, it is necessary to assess the dissimilarity between the reconstructed sections and informed sections.As shown in Fig. 15, the sections from the xz and yz directions are very different, such as the correlation lengths and the complexity of structures.Thus, we draw different MDS maps respectively for the xz and yz directions (Fig. 17a and b).Individual sections from the realizations are compared in Fig. 17c and d.Overall, it can be observed both visually and in the MDS maps that the sections obtained from 3DRCS are closest to the informed sections.3DRCS is able to reduce the nonstationarity effect of real geological data to a certain extent due to the local search strategy.As shown in the above analysis, the patterns in the informed cross sections are very complicated and the distribution of hydrofacies is anisotropic and nonstationary, espe-cially for the facies with a lower proportion.As illustrated in Fig. 18a, a local domain is surrounded by four segments from the informed cross sections.It should be noted that there is no facies 2 in any of the four segments.We extract the local parts from three realizations by using different MPS approaches.Then we check all the segments of the three local models, and we find that facies 2 is reproduced in this local area in the realizations of DS and s2Dcd.Three segments are randomly selected from the three local models, and they are shown in Fig. 18b where the boundaries of facies 2 are marked by red lines.Figure 18c shows the histograms of the four informed segments and the local models of 10 realizations for each MPS method.It can be observed that, although there is no facies 2 in the closest four segments, it is reproduced in this local area by DS and s2Dcd.Conversely, 3DRCS can maintain the distribution of facies well since all the MP statistics are captured from the surrounded subsections.If the surrounding subsections of a local area do not contain an attribute but it exists in other locations, the patterns with this attribute will not be moved to this local area in the 3DRCS approach.This indicates that 3DRCS allows the involvement of the nonstationary geological analogs in the 3-D real applications, and spatial patterns are restricted to a local domain so that they are not carried to faraway locations.
In the real-world applications, the geological sections or other analogs are not always straight or orthogonal.Therefore we need to project them in orthogonal directions.Figure 19 illustrates the process of projecting the tortuous sections to the parallel planes along a given direction.The same strategy can be used to address the issues in other directions.After that the original sections will be used as hard data and the projected sections will only be used as training images.Thus other scattered samples (e.g., boreholes, outcrops) also can be involved as hard data.

Discussion and conclusion
In this paper, we presented a novel method (3DRCS) for reconstructing 3-D complex heterogeneous structures by using partial lower dimensional data.Indeed, this is a very general issue since inferring high-dimensional patterns from low-dimensional data (e.g., boreholes, outcrops and other analogs) is a very common workflow for geologists.In practice, reliable 3-D models of complex geological structures are still difficult to construct due to the heterogeneity of geological phenomena and processes, even though there are many real geological analogs or sections that can be used.3DRCS makes it possible to reconstruct 3-D structures with MPS when no 3-D training image is available.The synthetic experiments and practical applications presented in this paper demonstrate the capacity to reconstruct such heterogeneous structures.
As compared to the previous MPS implementations that use partial data, the proposed method requires several local training subsections surrounding a simulated node, rather than a full section (Comunian et al., 2012) or points in a 3-D domain (Mariethoz and Renard, 2010).The local search strategy proposed in this paper allows more reliable MPS to be computed because it avoids spatial patterns from faraway locations being considered in the simulation of the current node.In this strategy, the original cross sections are divided into many subsections according to their spatial relationships.Therefore, the nonstationarity of real geological analogs is reduced to a certain extent because the training patterns cannot be borrowed from further than a local subdomain.Of course, besides cross sections, other scattered samples also can be included as hard data.
Moreover, 3DRCS increases the computational efficiency compared with existing MPS methods.The local search strategy allows MP statistics from the local subsections to be acquired so that the searches are significantly reduced.Its good computational performance makes it potentially applicable to real 3-D modeling problems such as porous media, hydrofa-cies, reservoirs and other complex sedimentary structures.In addition, a new parameter, the maximum of matched patterns from each training image, is adopted to avoid the unnecessary searches.The experimental results demonstrated that a reasonable choice for this parameter can not only ensure to capture a stable CPDF, but also gain a further performance speed-up.
The method presented here retains many advantages of DS (Mariethoz et al., 2010), such as unnecessary storing for MP statistics, pattern distances and a flexible neighborhood.Nevertheless, we propose an adaptive and flexible implementation of the search template on multiple grids where the radius of the neighborhood, the distance threshold and the size of data events decrease linearly with the rising of levels of multiple grids.As a result, a big data event is divided into several small parts placed on the different grids, which results in a smaller neighborhood on each grid.An acceptable distance threshold is assigned to the first grid to make it easier to obtain a stable CPDF and to capture the large-scale features from the original sparse samples.For the last grid, the radius of neighborhood is reduced to one and the highest criterion is carried out for the threshold (i.e., t = 0), which avoids the small-scale features or lower proportion facies are filtered out.Hence, the simulation of each multiple grid is simulated with different parameters, allowing for flexibility in simulating different structures at different scales.
Another important advantage of 3DRCS is the probability aggregation strategy in which the combinations of two different formulas are used to combine the CPDFs from different subsections.First, an additive aggregation method (linear pooling formula) is used to combine two disjunctive probability distributions from each pair of parallel subsections to obtain a more stable PDF.The weights of this step are related to the distances between the current location and the two parallel subsections.Such parameterization is able to ensure the pattern trend changing from one subsection to another one.And then, we aggregate the orthogonal PDFs and prior probability distribution by using a multiplicative method, the log-linear pooling formula.This step can enhance the capability for reconstructing connectivity of spatial patterns in comparison with the method using a series of 2-D MPS simulations to fill a 3-D domain along given orthogonal directions (Comunian et al., 2012).
The limitations of the 3DRCS method come from the fact that it is not always possible to obtain abundant sections in each direction, and extremely small local blocks cannot offer enough spatial patterns; thus, a minimal subsection size has to be considered.In addition, 3DRCS is not able to perform the simulation of continuous variables.The proposed method can be further improved to overcome these limitations.Another possible direction is to parallelize the proposed MPS implementation and further enhance its computational performance.

Figure 1 .
Figure 1.Local subsections divided by their spatial relationships and the corresponding training images.(a) Six cross sections in a 3-D domain: two sections along each direction; (b) two local domains: a central cube and a corner cube; (c) corresponding subsections (training images).

Figure 3 .
Figure 3.An example of multiple grids and the corresponding neighborhoods, search radius R and distance threshold t.
Hydrol.Earth Syst.Sci., 22, 6547-6566, 2018   www.hydrol-earth-syst-sci.net/22/6547/2018/ est training images in the other two directions and they are used to determine the current subsection (training image).A new parameter, the maximum number of matched patterns from the training image N max is adopted to avoid unnecessary searches.For some small neighborhoods, especially in the last multiple grid, the CPDF will rapidly stabilize with the increasing number of matched patterns.4Parameterization and performance analysisIn this section, we apply 3DRCS on several synthetic cases where the cross sections are extracted from existing 3-D references.Using these examples, we perform a parameter sensitivity analysis and compare it with two widely used meth-ods, DS-based 3-D reconstruction(Mariethoz and Renard,  2010)  and s2Dcd(Comunian et al., 2012).The workflows and algorithms proposed in this work are developed in the C++ programming language.All experiments presented in this paper are implemented on a laptop computer with Intel 4-Core i5-62000U Quad-core CPU, 8 GB RAM and 64 bit Windows 10.

Figure 4 .
Figure 4.A sample of Berea sandstone from Okabe and Blunt (2007) is used as a 3-D reference (100 × 100 × 100 voxels).The crimson color represents pores and the yellow color represents a matrix.The porosity of this model is 20.33 %.

Figure 7 .
Figure 7. Three realizations obtained by varying the weights of the probability aggregation formulas.Three sections in each direction are used and other parameters are same with the test of Fig. 5.

Figure 8 .
Figure 8. Interaction between t, f , N cs and N max .These first sections in three directions of each realization are presented.The porosity, CPU time and the number of geobodies are the average of 10 realizations.

Figure 9 .
Figure 9. Realizations of three different MPS reconstruction methods.

Figure 10 .
Figure 10.Proportions of the facies for 20 reconstructions by using three MPS methods.The black and red horizontal lines represent the proportions of facies in the 3-D reference and the cross sections used as training images respectively.

Figure 12 .
Figure 12.Comparison of the connectivity functions in three directions with three MPS methods.

Figure 13 .
Figure 13.MDS representation for 20 realizations of each MPS method.

Figure 14 .
Figure 14.Comparison of computational performance between DS, s2Dcd and 3DRCS when increasing the size of output grid: (a) two cross sections and (b) four cross sections in each direction.Note that different time axes are used in the two subplots, and four processors are used for DS-based reconstruction and s2Dcd but only one for 3DRCS.

Figure 15 .
Figure 15.Descalvado aquifer analog dataset (Bayer et al., 2015).(a) 3-D presentation of the informed cross sections: three sections in xz direction, and five sections in yz direction; (b) 2-D presentation of these informed sections.

Figure 16 .
Figure 16.Three realizations using three different MPS approaches: (a) DS, (b) s2Dcd with the coordinate z as auxiliary variable and (c) 3DRCS with the coordinate z as auxiliary variable.

Figure 17 .
Figure 17.MDS maps of sections extracted from realizations using three different MPS approach.(a) MDS map of 70 sections for each realization along the xz direction; (b) MDS map of 280 sections for each realization along the yz direction; (c) selected sections for each method according to the JS divergence in the xz direction; (d) selected sections in the yz direction.

Figure 18 .
Figure 18.Comparison of reproduction of nonstationary patterns.(a) A local domain and the four corresponding segments; (b) three selected segments from the realizations obtained using different approaches in the local area; (c) histograms of the four informed segments and the local models of 10 realizations for each MPS method.

Figure 19 .
Figure 19.Process of projecting real-world sections to parallel planes along a given direction: the process in (a) 3-D space and (b) the xy plane.

Table 1 .
Comparison of the performance of the tests in Fig.5.All the statistics are the averages of 20 realizations.

Table 2 .
Comparison of the performance for 20 realizations with three sections in each direction, varying the maximum of matched patterns from each training image N max .Other parameters are fixed and are the same as those in the test in Fig. 5.All the statistical values are the mean of 20 realizations.∞ represents no constraint for N max .