Articles | Volume 28, issue 7
Research article
09 Apr 2024
Research article |  | 09 Apr 2024

Identification of parameter importance for benzene transport in the unsaturated zone using global sensitivity analysis

Meirav Cohen, Nimrod Schwartz, and Ravid Rosenzweig

One of the greatest threats to groundwater is contamination from fuel derivatives. Benzene, a highly mobile and toxic fuel derivative, can easily reach groundwater from fuel sources and lead to extensive groundwater contamination and drinking water disqualification. Modelling benzene transport in the unsaturated zone can quantify the risk for groundwater contamination and provide needed remediation strategies. Yet, characterization of the problem is often complicated, due to typical soil heterogeneity, numerous unknown site and solute parameters, and the difficulty of distinguishing important from non-important parameters. Thus, sensitivity analysis (SA) methods, such as global SA (GSA), are applied to reduce uncertainty and detect key parameters for groundwater contamination and remediation. Nevertheless, studies devoted to identifying the parameters that determine transport of fuel derivatives in the unsaturated zone are scarce. In this study, we performed GSA to assess benzene transport in the unsaturated zone. First, a simple GSA (Morris) screening method was used for a homogenous sandy vadose zone. Then, a more computationally demanding (Sobol) variance-based GSA was run on the most influential parameters. Finally, the Morris method was tested for a heterogeneous medium containing clay layers. To overcome model crashes during GSA, several methods were tested for imputation of missing data. The GSA results indicate that benzene degradation rate (λk) is the utmost influential parameter controlling benzene mobility, followed by aquifer depth (z). The adsorption coefficient (Kd) and the van Genuchten n parameter of the sandy soil (n1) were also highly influential. The study emphasizes the significance of λk and the presence of clay layers in predicting aquifer contamination. The study also indicates the importance of heterogenous media representation in the GSA. Though identical parameters control the transport in the different soil types, in the presence of both sand and clay, parameters directly affecting the solute concentration like λk and Kd have increased influence in clay, whereas n is more influential for sand comprising most of the profile. Overall, GSA is demonstrated here as an important tool for the analysis of transport models. The results also show that in higher dimensionality models, the radial basis function (RBF) is an efficient surrogate model for missing data imputation.

1 Introduction

Petroleum products such as petrol and diesel are among the most abundant chemicals of ecological concern used nowadays. During petroleum exploration, production, transport, and storage, petroleum products often find their way to the environment by accidental leaks and spills (Logeshwaran et al., 2018; López et al., 2008; Nadim et al., 2000). Consequently, groundwater is often polluted by surface sources, posing a substantial potential risk to potable water worldwide (López et al., 2008; Nadim et al., 2000; Logeshwaran et al., 2018; Reshef and Gal, 2017; Kessler, 2022). Since petroleum substances in general, and fuel components in particular, are considered toxic, carcinogenic, and mutagenic (Logeshwaran et al., 2018), strict regulations limit their maximum allowed concentration in groundwater to the parts-per-billion level (U.S. EPA, 2018).

Fuel products are usually comprised of different types of hydrocarbons. Fuel compounds like benzene are among the most commonly found groundwater pollutants (Schmidt et al., 2004; Logeshwaran et al., 2018). Specifically, benzene is highly soluble and thus one of the most mobile fuel constituents in the subsurface (Farhadian et al., 2008). In the US alone, about 9×105kg was reported to be released into the terrestrial and aquatic environment by the petroleum industries between 1987 and 1993 (Fan et al., 2014). In Israel, benzene was detected in 60 % of all sites monitored for fuel contamination (Reshef and Gal, 2017). Benzene's low maximum acceptable concentration of 5 µg L−1 in drinking water (Farhadian et al., 2008; Kessler, 2022) raises great concerns that benzene leakages into groundwater may disqualify extremely large volumes of drinking water. Most fuel-related contamination reaches groundwater from or near the soil surface (Troldborg et al., 2009). Therefore, it is important to assess the risk to groundwater from soil contamination and to understand the fate and transport of fuel components travelling from the soil surface, through the unsaturated zone, and down to the groundwater.

Since actual water flow and contaminant transport in the subsurface are difficult to measure and predict, mathematical models are used to solve such transport problems (Bear and Cheng, 2010). Many studies have been performed to examine the fate and transport of petroleum hydrocarbons, and specifically of benzene, mostly in saturated homogeneous porous media (Lu et al., 1999; Brauner and Widdowson, 2001; Choi et al., 2005, 2009). Few studies have also assessed the movement of benzene in unsaturated porous media (Berlin et al., 2016; Berlin and Suresh, 2019; Troldborg et al., 2009; Ciriello et al., 2017). Benzene transport models typically combine Darcy-type water flow; advective–dispersive transport; and a source/sink term considering various physical, chemical, and biological processes including sorption, dissolution, and biodegradation (Mohamed and Sherif, 2010). Yet, the typical heterogeneity of the subsurface environment and the difficulty in obtaining sufficient relevant physical and bio-geochemical characterizations of the site lead to high uncertainty of these parameters (Ciriello et al., 2017; Tartakovsky, 2007). Soil heterogeneity and layering, for example, have been shown to considerably affect contaminant transport in some studies (Rivett et al., 2011; Chen et al., 2019), while in others their effect seemed negligible (Botros et al., 2012; Akbariyeh et al., 2018). This of course depends on the model scale, type of system (natural or irrigated), and type of contaminant tested. Also, soil and contaminants' parameters can be estimated using various methods, such as laboratory measurements, scaling, and inverse modelling (Botros et al., 2012; Akbariyeh et al., 2018; Berlin et al., 2016; Berlin and Suresh, 2019; Troldborg et al., 2009; Ciriello et al., 2017). This adds another aspect of uncertainty to the model. Thus, sensitivity analysis (SA) is required to determine the contribution of the individual input parameter to the uncertainty of the model output (Song et al., 2015). More specifically, SA can determine which are the non-influential input parameters that are redundant or can be fixed and reveal the order of parameter importance and the magnitude of parameter interactions (Razavi et al., 2021).

Various SA methods are available; they can generally be classified as local sensitivity analysis (LSA) or global sensitivity analysis (GSA) methods. LSA methods are mostly “one-at-a-time” (OAT) methods (Saltelli and Annoni, 2010; Razavi et al., 2021). These methods are based on changing the uncertain input parameter by a specific interval several times around a “local point” in the problem space (Saltelli and Annoni, 2010; Razavi et al., 2021). The difference or the derivative of the output compared with the base-case output is then tested. LSA methods are usually simple and efficient in analysing simple models, but they are less suitable for multiple-parameter, non-linear, and non-additive models. This is because derivatives are informative at the base point where they are computed but do not enable exploration of the rest of the input parameter space (Saltelli and Annoni, 2010). Moreover, OAT methods are efficient in finding the most influential input parameters but cannot rank the influence of the input parameters or measure parameter interactions (Saltelli and Annoni, 2010). In GSA, on the other hand, the input parameters are changed over the entire sampling space, and the variance or probability distribution of the output is tested rather than the derivative. Variance-based GSA methods are most commonly used since they are conceptually simple and easy to implement (Upreti et al., 2020; Khorashadi Zadeh et al., 2017; Jaxa-Rozen et al., 2021; Saltelli et al., 2010, 2004; Sobol, 2001; Song et al., 2015; Nossent et al., 2011; Brunetti et al., 2016, 2017). Yet, when the model output is highly skewed or multi-modal, the variance may not adequately represent output uncertainty (Liu et al., 2006; Borgonovo, 2007). Therefore alternative methods, such as moment-independent (Liu et al., 2006; Borgonovo, 2007) and moment-based (Dell'Oca et al., 2017) methods, were developed using the output probability density function (PDF) to fully characterize the output uncertainty. In some studies, PDF methods were shown to perform better for ranking the importance of parameters, and though highly influential parameters were usually common, the ranking of these parameters was similar (Wang and Solomatine, 2019; Upreti et al., 2020; Khorashadi Zadeh et al., 2017), and in some cases variance-based methods were preferred (Upreti et al., 2020). In addition to finding the most influential input parameters, GSA can rank the parameters' influence and their interactions (Saltelli and Annoni, 2010; Razavi et al., 2021). Yet, the main drawback of GSA methods is their computational cost.

Most hydrological models have numerous parameters resulting in high-dimensional and non-linear problems. Therefore GSA methods are usually recommended in hydrological modelling (Song et al., 2015). Song et al. (2015) recommended GSA application before final modelling to better understand the model and its dominating parameters and as a tool to reduce the model parametric dimensionality. In their review, Song et al. (2015) identified three main “hot spots” in GSA application with hydrological modelling, that are relevant to many other disciplines. The three hot spots are as follows:

  1. Computational cost and subsequent meta-modelling are used instead of running the models multiple times, where the reliability and goodness of fit of meta-models should be explored.

  2. Selecting an appropriate GSA method, monitoring the convergence, and estimating the uncertainty of the GSA results are important for hydrological models.

  3. GSA methods involve many hypotheses or have other limitations, including the independence of input variables, where in practice, the parameters employed by hydrological models usually have interactions or correlations that need to be considered.

So far, LSA/GSA has rarely been applied for contaminant transport in the unsaturated zone (Davis et al., 1994; Gatel et al., 2019; Gribb et al., 2002; Pan et al., 2011; Rockhold, 2018; Ciriello et al., 2017), where most studies have considered a homogenous media. Specifically for benzene, few local sensitivity analyses have indicated the degradation and adsorption coefficients as the most important parameters for benzene transport in the unsaturated zone (Gribb et al., 2002; Zanello et al., 2021). In one GSA performed to assess the risk of benzene contamination of groundwater (Ciriello et al., 2017), the porosity and hydraulic conductivity of the media were found to be the dominant parameters for the model uncertainty. Yet in that study, properties related to benzene itself (such as its degradation rate or adsorption coefficient) were treated as deterministic quantities and were not tested by the GSA (Ciriello et al., 2017). Moreover, we are not aware of any SA performed for benzene transport in layered heterogenous unsaturated media. Hence, more research is needed to understand the key parameters controlling contaminant transport in the subsurface in general, and for benzene, in particular, to improve prediction and mitigation of groundwater contamination.

The objective of this study was to assess the specific impact of each of the multiple parameters that affect benzene transport in the unsaturated zone. For that purpose, a mechanistic model was used to simulate the transport of benzene in an unsaturated zone representing Israel's coastal plain vadose zone. Simulations of both homogenous (sand) and heterogeneous (sand with clay layers) vadose zones were conducted. Two GSA methods were tested for the homogenous media simulations, to analyse the parameter importance: the Morris method (Morris, 1991), a reliable, computationally cheap alternative to variance-based GSA, and the Sobol method (Sobol, 2001), a computationally heavy and variance-based method and probably the most well-established and widely applied type of GSA (Khorashadi Zadeh et al., 2017; Jaxa-Rozen et al., 2021; Saltelli et al., 2010, 2004; Sobol, 2001; Song et al., 2015; Nossent et al., 2011; Brunetti et al., 2016, 2017, and many more) (see Sect. 2.3 for a description of these methods). The heterogeneous media simulation was tested by the Morris method, where the effect of the parameters of both soil types was tested as well as the clay layers' distribution.

A common though usually overlooked problem in GSA application is that some of the model runs do not converge but crash due to numerical instability and the assignment of random sets of parameters of different values (Razavi et al., 2021; Sheikholeslami et al., 2019). Owing to the novelty of GSA in hydrological research, there is not one agreed and established way to deal with these missing data, and the information in the literature is still scarce (Sheikholeslami et al., 2019). Consequently, and as part of the overall analysis done in this study, we tested several methods for missing data imputation in cases where the model does not converge or crashes.

2 Materials and methods

A mechanistic model was generated to investigate the potential transport of benzene in the vadose zone underlain by Israel's coastal plain aquifer. The model was applied for both homogenous and heterogeneous media. For simplicity, initial runs solely included a homogenous sandy soil profile, as sand and sandstone are the main constituents of Israel's coastal plain aquifer (Kurkar Group; Turkeltauub, 2011). Yet, in most natural environments, the soil profile is non-homogenous, containing clay layers and other materials. In a study conducted in Israel's coastal plain vadose zone, Rimon et al. (2007) reported that the occurrence of different soil materials, and specifically clay interbeds, strongly affects flow patterns to the aquifer. Therefore, later model runs included clay interbeds.

2.1 Mechanistic model and input parameters

The one-dimensional mechanistic model included water flow, solute transport, biodegradation, adsorption, and volatilization. Water flow in the unsaturated zone was modelled by a modified form of Richard's one-dimensional equation:

(1) θ t = z K h z + 1 ,

where h is the water matric head [L], θ is the volumetric water content, t is time [T], z is the vertical coordinate [L] (positive upward), and K is the unsaturated hydraulic conductivity function [L T−1] given by

(2) K ( h , z ) = K s ( z ) K r ( h , z ) ,

where Kr is the relative hydraulic conductivity [–] and Ks the saturated hydraulic conductivity [L T−1].

Table 1Constant input parameters for the model.

Download Print Version | Download XLSX

The unsaturated soil hydraulic properties θ(h) and K(h) are described by the van Genuchten–Mualem formulation (Mualem, 1976; van Genuchten, 1980):



(6) m = 1 - 1 / n , n > 1 .

In Eqs. (3)–(6), θs is the saturated water content and θr is the residual water content. α [L−1], n, and m are the van Genuchten fitting parameters; Se is the effective saturation, and l is the pore-connectivity parameter (Eq. 4).

Solute transport was described by the advection–dispersion equation:

(7) θ c t + ρ s t + a v g t = z θ D w c z + z a v τ g D g g z - q c z - λ k θ ,

where c, s, and g are solute concentrations in the liquid [M L−3], solid [MM−1], and gaseous [M L−3] phases, respectively. ρ [M L−3] is the solid-phase bulk density. Dw is the dispersion coefficient in the liquid phase [L2 T−1] given by Bear (1972) as

(8) θ D w = α L q + θ D M τ w ,

where DM is benzene's molecular diffusion coefficient in the aqueous phase [L2 T−1], q is the absolute value of the Darcian fluid flux [L T−1] evaluated using the Darcy–Buckingham law, and q=-Kdhdz+1. αL is the longitudinal dispersivity [L], and τw and τg are tortuosity factors in the liquid and gas phase, respectively [–], evaluated using the relationship described by Millington and Quirk (1961). av is the air content [L3 L−3], Dg is the benzene molecular diffusion coefficient [L2 T−1] in the gas phase, and λk is a first-order rate biodegradation constant for benzene in the liquid phase [T−1] (solid- and gas-phase degradation was assumed negligible).

Benzene adsorption was assumed linear (Wołowiec and Malina, 2015; Baek et al., 2003) of the form s=Kdc, where Kd is the distribution coefficient [L3 M−1] (see Table S2 in the Supplement for literature values).

The gaseous-phase (g) and aqueous-phase (c) concentrations in Eq. (7) are related by a linear expression of the following form:

(9) g = k g c ,

where kg is an empirical constant [–] equal to (KHRuTA)−1 (Stumm and Morgan, 1981), in which KH is Henry's law constant [M T2 M−1 L−2], Ru is the universal gas constant [M L2 T−2 K−1 M−1], and TA is the absolute temperature [K].

The values of θs, θr, l, DM, Dg, KH, and ρ were kept constant in the model and are listed in Table 1. DM, Dg, and KH are constant properties for benzene and were therefore not changed. The pore-connectivity parameter l in the hydraulic conductivity function was estimated to be about 0.5 as an average for many types of soils (Mualem, 1976). The range of θs, θr, and ρ values in the literature was limited (Carsel and Parrish, 1988; Domenico and Schwartz, 1990; Gribb et al., 2002; Schaap et al., 2001). In a local sensitivity analysis of benzene transport in the vadose zone, Gribb et al. (2002) found that ρ is an insignificant parameter, and θr is only significant in pure clayey soils.

Table 2Input parameters and their range used for GSA.

Download Print Version | Download XLSX

The sensitivity of the model to the values of α, n, Ks, αL, λk, and Kd was tested in the GSA. The range of tested values along with the corresponding references can be found in Table 2. Specifically, we found that λk values greatly vary between different studies, mainly due to the differences in experimental conditions and aquifer characteristics (see Table S1 in the Supplement for literature values). Though the highest λk value we encountered in the literature was 174 (d−1) (Lahvis et al., (1999); Table S1), we set the upper limit of λk to 1.5 (d−1) (Table 2). This was done for two main reasons:

  • a.

    From an early stage it was evident that λk is a very influential parameter and high values mostly resulted in output values of zero, thereby lowering the overall sensitivity.

  • b.

    The Morris analysis takes the range and divides it into a given number of levels (four or six, in our case). Since the range of λk included values spanning over 4 orders of magnitude (1 × 10−2–1 × 102 (d−1)), much of the range would have been missed by the analysis.

2.2 Model domain and boundary conditions

The profile depth (z) was set as a variable input parameter in the range of 5–50 m (Table 2). This range was obtained from a dataset of fuel-contaminated sites of Israel's coastal plain (see Fig. S1 in the Supplement) from Israel's Ministry of Environmental Protection and reported by Israel's Water Authority (Reshef and Gal, 2017). In runs that tested the occurrence of clay layers, the thickness (b) and number (N) of clay layers were additionally tested as variable GSA input parameters (Table 2).

An upper atmospheric boundary condition (BC) was set at the top of the profile with average daily precipitation and potential evaporation data from the Beit Dagan meteorological station for 2019 (Fig. 1). Potential pan evaporation data were converted to Penman–Monteith potential evaporation by multiplying the data by monthly coefficients obtained for the Israeli coastal plain (Gal et al., 2012). On days when evaporation data were not available, a monthly averaged evaporation value of the available data for the specific month was used as input. At the bottom boundary, where the aquifer was positioned, a Dirichlet BC of constant matric head (h=0) was set.

Figure 1Daily precipitation and potential evaporation data of Beit Dagan meteorological station for 2019 – set as the upper BC of the model.


For the first 10 years, only water flow was considered, to enable stabilization of the hydraulic conditions in the profile and establish annual periodic conditions. According to our tests, stabilization takes about 4 years. Hence, benzene was introduced following 10 years.

For the solute transport, an upper Dirichlet BC prescribing benzene saturation concentration (1.77 kg m−3 – solubility of benzene in water at 25 °C; Stewart, 2010) was set to mimic a constant fuel lens on the surface. A bottom Neumann BC of zero concentration gradient was set, enabling free drainage to the aquifer.

The model was run using the Hydrus-1D software package (Šimùnek et al., 2013), a finite element model for simulating the one-dimensional movement of water and solutes in variably saturated media. In the homogenous media analysis, the soil profile was divided into 51 equal nodes. Yet when clay layers were introduced, a higher resolution was required to represent the heterogeneity of the profile and the thinner layers, as compared to the whole profile. In these runs, the profile was divided such that the total number of nodes was equal to ((z20)+1). Clay layers were assigned in the profile according to the number of clay layers (N) and their thickness (b), such that they were equally distributed in the profile, generating alternating sand and clay layers. Each of the layers of both clay and sand was divided into (b⋅20) nodes.

Heterogeneous media analysis included clay layers within the sandy soil to obtain a more realistic representation of Israel's coastal plain vadose zone, mostly comprising sandy soil but also including clay layers and interbeds (Ecker, 1999).

Table 3Summary of the distribution of clay layers.

Download Print Version | Download XLSX

To create a representative configuration of the clay layers in the vadose zone above Israel's coastal plain aquifer, we examined the distribution of clay layers in selected fuel-contaminated sites. For that purpose, we constructed a database consisting of records obtained from the Israel Ministry of Environmental Protection from 32 fuel-contaminated sites containing dozens of monitoring boreholes. Each borehole in the database was sampled at multiple depths and characterized for the soil type. We classified these soil types into four main categories: gravel, sand, clayey sand (consisting of 55 % sand and 45 % clay), and clay, according to the soil type name on the database (see Table S9 in the Supplement for the categories). For each site, the percentage of each of the four soil types at each specific depth was extracted (i.e. the number of boreholes having a given soil type at a specific depth divided by the total number of boreholes penetrating that depth). We then recorded the percentage of clay at each depth; layers with a clay percentage higher than 25 % were considered clayey (Whiting et al., 2011). This yielded the number of clay layers and their thicknesses for each site (an example of one site is presented in Table S10 in the Supplement). Based on this methodology, the distribution of clay layers in contaminated sites at Israel's coastal plain vadose zone was calculated (Table 3).

Table 3 shows that only in 1 of the 32 examined sites had no clay layers at all. A proportion of 50 % of the sites only had one clay layer, and most of the sites had one to three layers ( 78 %), whereas almost 20 % had four to six layers. The mean of the layers' thickness ranged from 1–4 m. However, the standard deviation was high, and the actual thickness ranged from 11.7 to 0.1 m. Due to this variance in the distribution of clay layers (Table 3), it was decided to examine the number of clay layers (N) and their thickness (b) as additional input parameters in the sensitivity analysis of the heterogeneous media within the range of values reported in Table 3. The range of tested N and b can be found in Table 2. We are not aware of other studies that tested the distribution of clay layer interbeds in a SA for contaminant transport. However, Dai et al. (2017) tested the spatial distribution uncertainty of other parameters in a GSA, such as the elevations of the contact between aquifer and aquitard, the hourly head boundary conditions, and the hydraulic conductivity field.

The model was run for a total of either 100 years for the Morris analysis or 60 years for the Sobol analysis (in this case, the number of years was reduced to lower the overall high computational cost of the Sobol analysis). At the end of each run, the benzene concentration in the aquifer and the total flux to the aquifer were examined.

2.3 Global sensitivity analysis

2.3.1 The Morris method

The Morris or the elementary effect (EE) method was introduced by Morris (1991) and improved by Campolongo et al. (2007). It can be viewed as an extension of the OAT method, since it randomly generates sets of reference values from the entire parameter space and computes the difference of output (EE) caused by a fixed parameter change, altering only one parameter at a time. However, it can also be viewed as a GSA method, since it averages multiple EEs computed at different points in the parameter space. This method provides qualitative sensitivity measures (i.e. ranking the input parameters in order of importance); however it does not quantify the relative importance of each given parameter (Saltelli et al., 2004).

In the Morris method, each input parameter (xi, where i=1,,k), is assumed to vary across p selected levels in the space of the input parameter. The parameter space is normalized to a uniform distribution in [0,1] and partitioned into (p−1) equal sections. The algorithm starts at a randomly chosen point in the k-dimensional space and creates a trajectory (or a path) through the k-dimensional variable space. Each parameter is randomly chosen from the set (p−1) sections, and a fixed increment Δ (a multiple of 1/(p-1)) is added to each parameter in random order to compute an EE of each parameter, where EE is the difference of output y caused by the change Δ in the respective parameter. The EE for the ith input parameter can be described as

(10) EE i ( x 1 x k ) = Y ( x 1 , x 2 , , x i + Δ i , , x k ) - Y ( x 1 , x 2 , , x i , , x k ) Δ i .

Changing each parameter once from one set of reference values completes one path, which together with the base case requires (k+1) simulations. Conducting simulations over multiple paths produces an ensemble of EEis for each parameter. The number of required runs is then r(k+1), where r is the number of paths or trajectories. All EEi values computed for randomly chosen paths are used to compute two final sensitivity measures μi and σi (Campolongo et al., 2007):

(11) μ i = 1 r j = 1 r | EE i , j | ,

where μi is the mean of absolute values of the EEi. μi can be regarded as a global sensitivity index, since it represents the average effect of each parameter over the parameter space. Thus, it is used to identify influential and non-influential parameters.

The second measure σi is the standard deviation of the EEi:

(12) σ i = 1 r j = 1 r EE j , i - 1 r j = 1 r | EE j , i | 2 .

It is used to identify non-linear and/or interaction effects.

The review by Song et al. (2015) reported that in different studies, the number of paths (r) varies from 20 to 1250 paths, representing a total of 280 to 40 000 numerical simulations, with an average of 500 paths. Both Brunetti et al. (2018) and Turco et al. (2017) combined the Hydrus model (1D and 2D, respectively) with the Morris method. Brunetti et al. (2018) set r=100 for a total of 1700 simulations, and Turco et al. (2017) set r=8 for a total of 40 simulations. In this study we set r=250, considering that the data will be further analysed by the Sobol GSA. This gave us a total number of 3000 and 4000 simulations for the analysis with and without clay layers, respectively.

2.3.2 The Sobol method

While the OAT and the Morris sensitivity methods are difference-based, the Sobol–Saltelli method is variance-based (Saltelli and Annoni, 2010). Variance-based methods are used to quantitatively identify both the importance of individual model parameters and parameter interactions. The Sobol method is based on a decomposition of the total model variance into two main elements: variance of the individual parameter and variance due to interaction with other parameters (Sobol, 2001). Decomposition of the model variance can be written as follows (Saltelli et al., 2004):


where V stands for the total variance of the model output. Vi is the variance of each input parameter xi, and E(Y|xi) represents the mean of the system response Y when the parameter xi is fixed at different values. Vij represents the variance due to interactions between two parameters xi and xj, and V1…k describes the variance among k parameters. These elements, represented by Sobol's sensitivity indices (SIs), provide quantitative information about the variance associated with a single parameter or related to interactions of multiple parameters. The main sensitivity index or the first-order sensitivity index Si quantifies the main effect of parameter xi on the total variance of Y, excluding the interactions with other parameters:

(16) S i = V i V ( Y ) .

The total-order sensitivity index STi of a single parameter xi includes both the parameter's main variance effect and the proportion of the variance due to interactions of xi with the other parameters:

(17) S T i = S i + i j S i j + + S 1 k .

The values of the indices vary from 0 to 1, where 0 indicates no influence and 1 indicates a strong influence on the variance.

Parameter spaces were sampled using the Sobol quasi-random, cross-sampling strategy (Sobol, 2001). Rather than generating random numbers, this technique generates a uniform distribution in the probability space. The distribution appears qualitatively random, but sampling only takes place in regions of the probability function that were not previously sampled.

To assess the accuracy of the Sobol indices, confidence intervals of the indices should be constructed. The analytical procedure for confidence interval calculation involves repeating the r(2k+2) model runs several times, which was too time consuming and computationally demanding in this case. Therefore the bootstrapping approach was used instead (Efron and Tibshirani, 1986). Archer et al. (1997) suggested using bootstrap confidence intervals to produce confidence intervals of complicated data structures. The bootstrapping approach is based on resampling the parameter space of the data that are already available many times with replacement (randomly selecting values and allowing for duplicates) and constructing a distribution of the output (Efron and Tibshirani, 1986). Here, resamples were taken from the existing dataset Y with replacement, and the indices' values were recalculated. The result is an estimate of the mean and variance of each of the indices and allows us to calculate the confidence interval. The method thus relies on computational cost rather than on an analytical cost (running the model again). Here, the samples used for the model evaluation were resampled 1000 times with replacement, and 95 % confidence intervals were constructed (Archer et al., 1997).

Still, confidence intervals for the first-order indices (S1) using the Sobol sampling method gave values of more than 100 %. This was also observed by Brunetti et al. (2016) and Hartmann et al. (2018), who also studied transport in unsaturated media. This may be a result of insufficient sample size, since Sobol's convergence requires a very large sample size (Saltelli et al., 2004). Therefore, we extracted S1 values using the delta method developed by Plischke et al. (2013), by calculating S1 values from given data through emulators and bootstrapping rather than running the model itself multiple times.

More details on the Sobol sample size can be found in the Supplement.

The Morris and the Sobol sensitivity analyses were executed using the Python programming language and, specifically, the Sensitivity Analysis Library (SALib) (Herman and Usher, 2017). SALib uses a Python script that overwrites the input parameters given by the GSA in the relevant Hydrus-1D model input files. The script then executes the model and returns the final aquifer solute concentration and the total solute flux to the aquifer at the end of the simulation. This procedure is repeated for the number of runs set for the GSA, with changes in the input parameters according to the sampling technique. SALib then computes the indices of the two methods: μi and σi and their confidence interval for the Morris method and the Sobol indices and their confidence interval for the Sobol method. Both Morris and Sobol methods have already been applied with the Hydrus software package by Périard et al. (2013; Hydrus-2D/3D), Brunetti et al. (2016, 2018; Hydrus-1D), Turco et al. (2017; Hydrus-2D), Hartmann et al. (2018; Hydrus-2D), and others.

2.4 Treatment of missing output data

Owing to the arbitrary choice of input parameters and multiple model runs, the model sometimes does not converge and crashes. This makes data analysis problematic because sampling order is important for the analysis results of most GSA methods. Since there is not yet an agreed and established way of handling these missing data (Sheikholeslami et al., 2019), we tested the following methods:

Missing data removal. This method consists of removing the missing data points, or by removing full trajectories, so that the order of sampling within trajectories remains undisturbed. However, by removing data, valuable information can be lost. In addition, in most methods, removing data may render the entire sample since it no longer follows the sampling sequence and data structure. Thus, this study only tested value removal for the Morris method where full trajectories can be removed.

Missing data imputation. A missing value is replaced with some other value. The following missing data imputation approaches were tested.

  1. Constant value substitution is an easy and computationally cheap method for imputing missing data. The missing data can be replaced with zeros in cases where the output is typically near zero or with the mean or the median, in cases where the distribution is skewed. Sheikholeslami et al. (2019) for example, used the median substitution technique for a rainfall-runoff model and a land surface hydrology model. A shortcoming of this replacement methods is the potential for reducing the variance and distorting other statistical properties of the output (Sheikholeslami et al., 2019). In this study, both the zero and the median substitutions gave similar final GSA indices, with slightly different confidence intervals. Therefore, only the zero substitution results are presented.

  2. The K nearest-neighbour (KNN) substitution technique uses neighbourhood observations to fill in missing data. The underlying rationale behind the KNN-based techniques is that the sample points closer to xi should provide better information for imputing the failed output, where xi is an input parameter vector for which a simulation model fails to return an outcome. In the KNN method, the failed output is replaced by a response value of a weighted average of the K (the number of samples) nearest neighbours. The KNN algorithm computes the distance of the test observation to every observation in the K nearest neighbours and then imputes the missing value with the average model response of K simulations (Duneja and Puyalnithi, 2017). The computed neighbouring distance between the samples is typically the Euclidean distance (Duneja and Puyalnithi, 2017; Troyanskaya et al., 2001). Lower K values generally result in predictions with high variance and low bias and vice versa for high K values (Hastie et al., 2009). Thus, in this study, we tested both a lower K value of 5, as was used by Shapiro and Day-Lewis (2022) for groundwater hydrology models, and a higher K equal to the square root of the size of the dataset – a rule of thumb in the KNN method reported to correctly distinguish signal from noise (Duneja and Puyalnithi, 2017). The KNN analysis was conducted using the programming language Python with the scikit-learn KNN regressor.

  3. RBF emulation-based substitution – model emulation or surrogate modelling – is a strategy that develops statistical, cheap-to-run surrogates for the output of complex, computationally intensive models (Razavi et al., 2012a). The emulator usually uses a low computational cost function that fits the non-missing response values Ya to predict the values for the missing response Ym. There are various types of model emulations that can be used for hydrological models such as polynomial regressions, kriging, artificial neural networks, RBFs, and support vector machines (Razavi et al., 2012; Zhou et al., 2022). The RBF is one of the most commonly used function approximation techniques because it can provide an accurate emulation of high-dimensional problems for a low computational cost. Sheikholeslami et al. (2019), for example, employed the RBF approximation for crashed model simulation emulation, which performed better than all other methods tested in that study. The RBF approximation is a weighted summation of na number of functions that can approximate the predictive response Y at a point xi. Here na was set to the number of non-missing sample points. Detailed equations of the RBF approximation can be found in the Supplement. The RBF imputation analysis was also conducted with the Python program using the SciPy RBF interpolation package.

Figure 2Morris analysis results for homogenous sandy soil obtained using different methods for missing data imputation/removal for (a) cumulative benzene flux to the aquifer and (b) final benzene concentration in the aquifer. Black bars represent μ confidence intervals.


3 Results and discussion

3.1 Homogenous media analyses

3.1.1 Morris analysis for homogenous sandy soil

In this analysis, the model's sensitivity to seven input parameters (k) was tested (Table 2), considering two model outputs: benzene aquifer concentration at the end of the simulation and benzene cumulative flux to the aquifer. The analysis was conducted for 250 paths (r) and six levels (p; i.e. dividing the parameters' space to five equal segments, where the parameter can be assigned six different values) for 2000 simulations overall (r(k+1)). Out of these simulations, only 42 simulations (2.1 %) did not converge or crashed. To avoid bias in the results, we used the methodology described in Sect. 2.4 to either replace the missing values or remove the trajectories that contain missing values. Three imputation methods are presented: zero substitution, the RBF emulator, and the KNN method with K=5 and K=45 (representing the square root of the sample size). Figure 2 presents μ for these different methods. Detailed values of all indices for the different methods can be found in the Supplement (Tables S3 and S4). Small differences were observed between the different methods for the values of μ, the μ confidence interval, and σ, for each of the input parameters, with the same order of parameter importance. The similarity between the different methods stems from the scarce missing values, hardly affecting the overall results. In all strategies for handling missing data, it is evident that the GSA performed worst for the weakly influential parameters – α, Ks, and αl, exhibiting a high ratio of the μ confidence interval to μ (Fig. 2, Tables S3 and S4). This was also evident in global sensitivity analyses obtained using the Hydrus model for other hydrological problems (Brunetti et al., 2016, 2022; Hartmann et al., 2018; Zhou et al., 2022; Brunetti et al., 2017).

Figure 3Morris analysis results for homogenous sandy soil with RBF imputation for (a) cumulative benzene flux to the aquifer and (b) final benzene concentration in the aquifer.


The effect of the different parameters on the output can also be seen in Fig. 3a and b, where μ versus σ is presented for the Morris analysis conducted with the RBF emulation substitution method used to replace the crashed data. Though minimal differences were observed among all methods (Fig. 2), the RBF results are shown here for consistency purposes, since the RBF method gave the best results in the heterogeneous media case (see Sect. 3.2).

In all cases, the input parameter with the strongest effect on the system was the degradation rate λk, followed by the profile depth z (Figs. 2 and 3). The next two influential parameters are the adsorption coefficient Kd and the van Genuchten n parameter. Both showed a similar effect on the concentration, though the effect of n on the flux to the aquifer was much more pronounced and Kd was slightly more influential for the concentration. Finally, the van Genuchten α parameter, the hydraulic conductivity Ks, and the dispersivity αl showed little effect on the model results.

Zanello et al. (2021) reported similar results in a LSA for a model of BTEX transport in an unsaturated homogenous sandy soil using Hydrus-2D/3D software. They found that the order of the input parameters' influence on BTEX arrival to the aquifer (tested as concentration) was λk>Kd>z>Ks. The stronger influence of Kd compared to z in that study is probably due to the low z values tested there (2.5–4 m), representing a shallow aquifer. In another study, Davis et al. (1994) modelled the constant leakage of benzene in a loamy sand soil to an aquifer beneath a manufacturing facility. Benzene concentrations of  1 mg L−1 were found in the groundwater beneath the source ( 25 mg L−1), though in monitoring wells  100 m from the source no benzene was detected. In their LSA, they too found that λk was the dominant mechanism for benzene attenuation and found Kd to be very influential. Moreover, similarly to our study, their model was insensitive to αl (Davis et al., 1994). Indeed, the great importance of biodegradation for the removal of petrol hydrocarbons in aerobic environments has been recognized and reported in the literature (Lahvis et al., 1999; Berlin et al., 2016; Yadav and Hassanizadeh, 2011; Berlin and Suresh, 2019; Alvarez et al., 1991; Abu Hamed et al., 2004). This result is demonstrated once again in our study.

Generally, the order of influence of the parameters was similar for the cumulative flux to the aquifer and for the final concentration in the aquifer (Fig. 3), except for Kd and n as stated above. For both the flux and the concentration, a correlation between μ and σ is observed, as reflected in their arrangement around the same diagonal line (Fig. 3, red line), indicating that none of the parameters have solely a linear effect (μ being the mean effect). Instead, all parameters exhibit an interaction effect (σ – the standard deviation of the effect), where the interactions increases with the increase in the mean.

3.1.2 Sobol analysis for homogenous sandy soil

The Sobol analysis for homogenous media was conducted for the four most influential parameters of the Morris analysis: λk, z, n, and Kd, to obtain more quantitative information on the parameters' influence and interactions. A total of 5000 sets of parameters were generated, constituting an overall total of 50 000 model runs for r(2k+2). Of the 50 000 model runs, 881 samples did not converge. The same methodology used for the Morris analysis was used for imputation of missing values (Sect. 2.4). However, with the Sobol analysis, it was impossible to remove the missing points, since the order of sampling is significant to the overall analysis, and sampling is not divided into sets of trajectories. Hence, the data removal method was not used.

Figure 4The Sobol total indices (ST) for homogenous sandy soil obtained using different methods for missing data imputation for (a) cumulative benzene flux to the aquifer and (b) final benzene concentration in the aquifer. Black bars represent ST confidence intervals.


Figure 5Sobol first-order indices (S1) for homogenous sandy soil obtained using different methods for missing data imputation for (a) cumulative benzene flux to the aquifer and (b) final benzene concentration in the aquifer. Black bars represent μ confidence intervals.


Figures 4 and 5 show the S1 and ST Sobol indices for the different methods of missing data replacement. Detailed and averaged values of all methods can be found in the Supplement (Tables S5–S8). All missing data imputation methods gave similar results (Figs. 4 and 5 and Tables S5–S8). This is expected given the low dimensionality of the model (four input parameters). Here, the GSA also performed the worst for the weakly influential parameters (n and Kd) exhibiting a high ratio of the confidence interval to indices.

A similar effect of the different parameters' order and magnitude of importance on the two outputs was observed. Like the Morris analysis, λk and z were found to be the most influential parameters with the highest total-order (ST) and first-order (S1) index values (Figs. 4 and 5). ST, unlike S1, often sums to more than 100 % because it is the sum of S1 and all the higher-order Sobol indices involving the parameter (Saltelli et al., 2004). The difference STS1 is a measure of how much parameter xi is involved in interactions with any other input variable (Saltelli et al., 2004). The total index ST (Figs. 4 and 5) demonstrates that most of the variance in both flux and concentration is caused by λk, consisting of the variation of λk itself (S1 of  11.38 % and 13.21 % for the flux and concentration, respectively; Tables S5 and S6) and the interactions with other parameters. It should be noted that z has a relatively low main effect (S1 of 1.85 % and 1.17 % for the flux and concentration, respectively; Tables S5 and S6) but a high total effect of  58 % and 78 % for the flux and concentration, respectively (Tables S7 and S8), indicating that this parameter has a limited direct impact on the variance of the output but a strong interaction effect, most likely with the degradation coefficient. The total Sobol index of an input parameter is the sum of the first-order Sobol index and all the higher-order Sobol' indices involving that parameter. Hence, the sum of the total Sobol sensitivity indices is equal to or greater than 1 (Gatel et al., 2020). If no higher-order interactions are present, the sum of both the first- and total-order Sobol indices is equal to 1. Sums of ST values > 100 % were also reported by Brunetti et al. (2017), Schübl et al. (2022), Zhou et al. (2022), and Nossent et al. (2011).

Ciriello et al. (2017) performed a Sobol analysis for benzene contamination in an unsaturated soil assuming a very deep aquifer that contamination does not reach and in a shallow aquifer. They reported Ks as one of the most important parameters, while α and n were both found to be mostly insignificant. However, it is hard to compare that study and this one because λk, Kd, and z which were found to be highly significant in this study were not tested in that study. In addition, we tested moderate aquifer depths (more than 5 m).

λk is the only parameter with S1 higher than 10 % and hence the only parameter with a strong main effect on the output variance. When the sum of all first-order indices is less than 100 %, the model is non-additive, meaning that it is affected by interactions (Neumann, 2012; Nossent et al., 2011). Here, the sum of all first-order indices is < 15 %, indicating that the model is non-additive and very much affected by interactions. Only 13.7 % and 14.8 % of the variance for the flux and concentration, respectively, are attributable to the first-order effect (Tables S5 and S6; showing the sum of S1 for the flux and concentration, respectively), highlighting the fundamental role of interactions between parameters.

Overall, the Sobol method results agreed with those of the Morris method. Indeed, the Morris method was proposed as an efficient tool to be used prior to variance-based GSA, in order to screen important and unimportant factors and to provide the first inspection of the model's behaviour at a reasonable computational cost (Brunetti et al., 2018; Song et al., 2015; Wainwright et al., 2013). Similarities between Morris and variance-based methods were also observed by Herman et al. (2013) and Sarrazin et al. (2016).

3.2 Heterogeneous media

Morris analysis for heterogeneous media

In this analysis, 12 parameters concerning the soil type were examined, α, n, Ks, αl, λk, and Kd, both for sand and clayey soil (represented below with a subscript of 1 and 2, respectively). Three additional general profile parameters were tested, z, N, and b, comprising 15 parameters overall (Table 2).

Figure 6Morris analysis results for heterogeneous media: μ values for (a) cumulative benzene flux to the aquifer and (b) final benzene concentration in the aquifer.


The analysis was conducted for 250 paths (r) and four levels for an overall total of 4000 simulations (r(k+1)), from which 959 did not converge or crashed. The increase in the ratio of failures compared with the previous analysis (2.1 % for the homogeneous Morris analysis versus  24 % here) can be attributed to the complex transport in the heterogeneous medium and the difficulty in modelling flow between sand and clay layers, as well as to the increase in the number of model parameters (dimensionality of the parameter space) increasing the arbitrary combinations of parameters during GSA (Sheikholeslami et al., 2019). The same methodology was used for missing data imputation or removal, as discussed above. Unlike the previous analyses, the different methods for missing values imputation yielded dissimilar confidence interval levels, as well as dissimilar μ values for some parameters (Fig. 6). Also, an overall increase in the ratio between μ and its confidence interval was observed (Fig. 6). In the Supplement, the ratio between the μ confidence interval and μ is presented in Fig. S2 in the Supplement to clearly illustrate the difference between the different methods and parameters. GSA results with high confidence interval values were also reported by other studies that used Hydrus models (Brunetti et al., 2016, 2022; Hartmann et al., 2018; Zhou et al., 2022; Brunetti et al., 2017). Though the authors do not address this issue, it indicates that more model runs are needed for the indices to converge (Sarrazin et al., 2016). Yet, a clear μ ranking is observed, with an overall consistency with the previous results of the homogeneous case and among the different methods.

Figure 7Morris analysis results for heterogeneous media for (a) cumulative benzene flux to the aquifer and (b) final benzene concentration in the aquifer.


Full trajectory removal performed the poorest for most parameters, while the RBF emulation method performed the best (Figs. 6 and S2). The two KNN methods gave better results than the zero substitution. These results are similar to those reported by Sheikholeslami et al. (2019), in which the RBF-emulation-based substitution performed better than the single NN and than a constant value substitution (the median, in their case). Detailed values of all methods' indices can be found in the Supplement (Tables S11 and S12). A correlation between μ and σ for all input parameters is again demonstrated by their arrangement around one diagonal line (the red line in Fig. 7a and b), indicating that the interactions' effect increases with the increase in the total effect of each parameter.

Figure 7a and b show the effect of the different parameters on the outputs, in terms of μ versus σ, using the RBF method. Like the GSA results of the homogenous media, it is evident that the degradation coefficient λk2, the sorption coefficient Kd2 of the clay soil, the depth z, and the van Genuchten n1 parameter of the sand layers are the most dominant factors controlling benzene transport to the aquifer. The degradation coefficient of the clay layers λk2 was found to be the most dominant parameter, considering both the cumulative flux and the concentration at the end of the simulation. The sorption coefficient of the clay layers Kd2 is second-most influential for the concentration but equally important as the depth z, which comes in third place for the flux, followed by n1 in fourth place. Benzene adsorption for clay minerals is higher than that for sand (Berlin and Suresh, 2019; Zytner, 1994) and was set accordingly in the GSA (Table 1); therefore, its overall increased influence is not surprising, as well as the increased influence of λk2 in these layers, as the solute is attenuated due to sorption. The stronger effect of Kd2 on the concentration as compared with the flux is expected, since sorption directly affects the solute concentration. On the other hand, n1 of the sand was much more influential than n2, probably since it is a parameter affecting the flow, and sand comprises most of the profile. We note that these four parameters were also found to be significant for homogenous soil analysis; thus, they should be carefully examined when modelling the fate and transport of benzene.

A large group of moderately influential parameters follow these four most influential parameters. For the flux, α1 and α2 follow in importance in fifth and sixth place, followed by the sand adsorption coefficient Kd1, which comes in seventh place. For the concentration, α2 is also at the fifth place; however, Kd1 and λk1 are more influential and come in sixth and seventh place, respectively, also showing a greater degree of interaction effect, just before α1 at the eighth place. The parameters Kd1 and λk1 are likely to affect concentration more than flux, since they affect benzene directly, similarly to Kd2 here and Kd1 in the homogenous media analysis. They both lower benzene concentrations but have less effect on the total flux.

Following these influential parameters, the dispersivities αl2 and αl1 of both soil types are less influential but still somewhat close to the middle of the graph, together with clay layers distribution parameters, b and N, and with λk1 for the flux. The least influential parameters are n2 of the clay layer and the hydraulic conductivities of both soil types, Ks1 and Ks2. Interestingly, Gribb et al. (2002), who conducted LSA for a risk assessment model of benzene and naphthalene transport to groundwater through sand, loam, and clay soils, also reported high model sensitivity to λk and Kd for all soils. In their case, the model was also less sensitive to Ks, except for pure loam and clay soils. For other parameters that were not tested here (porosity, bulk density, residual water content, and initial concentration), the model was only sensitive in the case of pure clayey soil. However, their study assumed homogenous media of each soil type, which may have obscured the effect of a specific parameter for different soil types in a single profile. In the present study, it is also evident that λk and Kd are very influential, depending also on the soil type, whereas Ks is the least influential parameter.

The results for both the homogenous and heterogeneous media indicate that the most dominant factor controlling benzene arrival to the aquifer is λk, the rate of benzene removal from the media by biological degradation. This rate can vary greatly from extremely fast to very slow rates, depending on parameters such as initial benzene concentration and soil water content (Table S1). In general, the degradation rate is lower for higher concentrations and lower water content, and vice versa. Since the values of λk vary greatly in the literature, this parameter must be carefully selected in hydrological modelling for benzene transport, and further research to elucidate its value onsite is encouraged.

The parameters z, Kd, and n1 followed λk in importance; Kd usually had a stronger effect on the concentration than on the flux. The aquifer depth is an easy-to-measure parameter, and it should be included in any model for benzene transport. Likewise, n can be evaluated quite easily using tools such as Rosetta to establish soil texture (Schaap et al., 2001). The adsorption coefficient of the clay layers was also found to be highly significant. Therefore, examination and characterization of the onsite soil types are also extremely important. Most studies that tested SA for benzene transport in different soils types used homogenous media representation of each soil type and tested one soil type at a time (Davis et al., 1994; Gatel et al., 2019; Gribb et al., 2002). In this study, we tested the effects of both parameters of the individual soil types and the distribution of the clay layers using b and N. Thus, we offer an improved assessment of the importance of each parameter of each soil type and show that clay layer should be represented and characterized, though the exact layering distribution is only moderately influential. On the other hand, literature values may be used for non-influential parameters, including the hydraulic conductivities Ks1 and Ks2 and n2.

4 Summary and conclusions

This paper explores the effect of different model parameters on benzene transport in the vadose zone of Israel's coastal plain aquifer and its potential arrival to the aquifer below. A physical model was implemented to simulate benzene transport in the unsaturated zone. The model was initially employed for homogenous sandy soil, as sand comprises the vast majority of the vadose zone. Next, the model was set to describe heterogeneous soil containing clay layers representing lithology obtained from data of contaminated sites. Two GSA methods were applied to examine the effect of the model input parameters on benzene concentration in the aquifer at the end of the simulation and on benzene cumulative flux to the aquifer. Additionally, treatment of missing data due to model crashes was demonstrated.

The results for both the homogenous and heterogeneous media showed that the most dominant factor controlling benzene arrival to the aquifer is benzene degradation coefficient (λk). Following λk, the depth (z) of the aquifer was highly significant. The adsorption coefficient Kd and van Genuchten n parameter of the sand soil were also found to be highly significant.

A substantial interaction effect between the parameters was observed; the parameters with the highest individual effect showed a high interaction effect and vice versa. The degree of individual parameter influence on the model was shown to be small (< 15 %) by the Sobol analysis, indicating the great importance of interactions between parameters.

The different methods for missing data handling yielded a similar overall ranking of the influential parameters identified by the GSA. However, the RBF-emulation-based substitution showed better results compared to the KNN and zero substitution techniques, particularly when the transport between layers was considered, and the model dimensionality and subsequent number of failures were high. In that case, the data removal technique performed markedly worse.

Data availability

The dataset is available online at (Cohen, 2023).


The supplement related to this article is available online at:

Author contributions

RR and NS planned the research, MC ran the model and GSA and analysed the data, MC wrote the manuscript draft, and RR and NS reviewed and edited the manuscript.

Competing interests

The contact author has declared that none of the authors has any competing interests.


Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.


This work was funded by the Israel Water Authority.

Financial support

This research has been supported by the Israel Water Authority (grant no. 0399839).

Review statement

This paper was edited by Alberto Guadagnini and reviewed by two anonymous referees.


Abu Hamed, T., Bayraktar, E., Mehmetoglu, Ü., and Mehmetoglu, T.: The biodegradation of benzene, toluene and phenol in a two-phase system, Biochem. Eng. J., 19, 137–146,, 2004. 

Akbariyeh, S., Bartelt-Hunt, S., Snow, D., Li, X., Tang, Z., and Li, Y.: Three-dimensional modeling of nitrate-N transport in vadose zone: Roles of soil heterogeneity and groundwater flux, J. Contam. Hydrol., 211, 15–25,, 2018. 

Alvarez, P. J. J., Anid, P. J., and Vogel, T. M.: Kinetics of aerobic biodegradation of benzene and toluene in sandy aquifer material, Biodegradation, 2, 43–51,, 1991. 

Archer, G. E. B., Saltelli, A., and Sobol, I. M.: Sensitivity measures ANOVA-like techniques and the use of bootstrap, J. Stat. Comput. Sim., 58, 99–120,, 1997. 

Baek, D. S., Kim, S. B., and Kim, D. J.: Irreversible sorption of benzene in sandy aquifer materials, Hydrol. Process., 17, 1239–1251,, 2003. 

Bear, J.: Dynamics of Fluids in Porous Media, Dover Publications, New York, ISBN 0486656756, (last access: 8 April 2024), 1972. 

Bear, J. and Cheng, A.: Modeling Groundwater Flow and Contaminant Transport, Springer Science & Business Media, Berlin,, 2010. 

Berlin, M. and Suresh, K. G.: Numerical Experiments on Fate and Transport of Benzene with Biological Clogging in Vadoze Zone, Environ. Process., 6, 841–858,, 2019. 

Berlin, M., Vasudevan, M., Kumar, G. S., and Nambi, I. M.: Numerical modelling on fate and transport of coupled adsorption and biodegradation of pesticides in an unsaturated porous medium, ISH J. Hydraul. Eng., 22, 236–246,, 2016. 

Borgonovo, E.: A new uncertainty importance measure, Reliab. Eng. Syst. Safe., 92, 771–784, 2007. 

Botros, F. E., Onsoy, Y. S., Ginn, T. R., and Harter, T.: Richards Equation–Based Modeling to Estimate Flow and Nitrate Transport in a Deep Alluvial Vadose Zone, Vadose Zone J., 11, vzj2011.0145,, 2012. 

Brauner, J. S. and Widdowson, M. A.: Numerical Simulation of a Natural Attenuation Experiment with a Petroleum Hydrocarbon NAPL Source, Groundwater, 39, 939–952,, 2001. 

Brunetti, G., Šimùnek, J., and Piro, P.: A comprehensive numerical analysis of the hydraulic behavior of a permeable pavement, J. Hydrol., 540, 1146–1161,, 2016. 

Brunetti, G., Šimùnek, J., Turco, M., and Piro, P.: On the use of surrogate-based modeling for the numerical analysis of Low Impact Development techniques, J. Hydrol., 548, 263–277,, 2017. 

Brunetti, G., Šimùnek, J., Turco, M., and Piro, P.: On the use of global sensitivity analysis for the numerical analysis of permeable pavements, Urban Water J., 15, 269–275,, 2018. 

Brunetti, G., Stumpp, C., and Šimùnek, J.: Balancing exploitation and exploration: A novel hybrid global-local optimization strategy for hydrological model calibration, Environ. Model. Softw., 150, 105341,, 2022. 

Campolongo, F., Cariboni, J., and Saltelli, A.: An effective screening design for sensitivity analysis of large models, Environ. Model. Softw., 22, 1509–1518,, 2007. 

Carsel, R. F. and Parrish, R. S.: Developing joint probability distributions of soil water retention characteristics, Water Resour. Res., 24, 755–769,, 1988.. 

Chen, N., Valdes, D., Marlin, C., Blanchoud, H., Guerin, R., Rouelle, M., and Ribstein, P.: Water, nitrate and atrazine transfer through the unsaturated zone of the Chalk aquifer in northern France, Sci. Total Environ., 652, 927–938,, 2019. 

Choi, J.-W., Ha, H.-C., Kim, S.-B., and Kim, D.-J.: Analysis of benzene transport in a two-dimensional aquifer model, Hydrol. Process., 19, 2481–2489, 2005. 

Choi, N. C., Choi, J. W., Kim, S. B., Park, S. J., and Kim, D. J.: Two-dimensional modelling of benzene transport and biodegradation in a laboratory-scale aquifer, Environ. Technol., 30, 53–62, 2009. 

Ciriello, V., Lauriola, I., Bonvicini, S., Cozzani, V., Di Federico, V., and Tartakovsky, D. M.: Impact of Hydrogeological Uncertainty on Estimation of Environmental Risks Posed by Hydrocarbon Transportation Networks, Water Resour. Res., 53, 8686–8697,, 2017. 

Cohen, M.: Output and results of all GSA, figshare [data set],, 2023. 

Dai, H., Chen, X., Ye, M., Song, X., and Zachara, J. M.: A geostatistics-informed hierarchical sensitivity analysis method for complex groundwater flow and transport modeling, Water Resour. Res., 53, 4327–4343,, 2017. 

Davis, J. W., Klier, N. J., and Carpenter, C. L.: Natural biological attenuation of Benzene in ground water beneath a manufacturing facility, Groundwater, 32, 215–226,, 1994. 

Dell'Oca, A., Riva, M., and Guadagnini, A.: Moment-based metrics for global sensitivity analysis of hydrological systems, Hydrol. Earth Syst. Sci., 21, 6219–6234,, 2017. 

Domenico, P. A. and Schwartz, F. W.: Physical and Chemical Hydrogeology, 2nd edn., John Wiley & Sons, New York,, 1990. 

Du, P., Sagehashi, M., Terada, A., and Hosomi, M.: Numerical modeling of benzene transport in andosol and sand: Adequacy of diffusion and equilibrium adsorption equations, World Acad. Sci. Eng. Technol., 66, 38–42,, 2010. 

Duneja, A. and Puyalnithi, T.: Enhancing Classification Accuracy of K-Nearest Neighbours Algorithm Using Gain Ratio, Int. Res. J. Eng. Technol., 4, 1385–1388, 2017. 

Ecker, A.: Selected geological cross-sections and subsurface maps in the coastal aquifer of Israel, Atlas, Geological Survey of Israel, Jerusalem, 1999. 

Efron, B. and Tibshirani, R.: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy, Stat. Sci., 1, 54–75, 1986. 

Fan, X., He, L., Lu, H. W., and Li, J.: Environmental and health-risk-induced remediation design for benzene-contaminated groundwater under parameter uncertainty: A case study in Western Canada, Chemosphere, 111, 604–612, 2014. 

Farhadian, M., Vachelard, C., Duchez, D., and Larroche, C.: In situ bioremediation of monoaromatic pollutants in groundwater: A review, Bioresource Technol., 99, 5296–5308,, 2008. 

Gal, Y., Azenkot, A., Peres, M., and Levingart, M.: Converting irrigation coefficnets based on Class A pan to calculated evaporation (Penman Montieth FAO 56), Alon Hanotea, 66, 28–32, 2012. 

Gatel, L., Lauvernet, C., Carluer, N., Weill, S., Tournebize, J., and Paniconi, C.: Global evaluation and sensitivity analysis of a physically based flow and reactive transport model on a laboratory experiment, Environ. Model. Softw., 113, 73–83,, 2019. 

Gatel, L., Lauvernet, C., Carluer, N., Weill, S., and Paniconi, C.: Sobol global sensitivity analysis of a coupled surface/subsurface water flow and reactive solute transfer model on a real hillslope, Water (Switzerland), 12, 1–21,, 2020. 

Gribb, M. M., Bene, K. J., and Shrader, A.: Sensitivity analysis of a soil leachability model for petroleum fate and transport in the vadose zone, Adv. Environ. Res., 7, 59–72,, 2002. 

Hartmann, A., Šimùnek, J., Aidoo, M. K., Seidel, S. J., and Lazarovitch, N.: Implementation and Application of a Root Growth Module in HYDRUS, Vadose Zone J., 17, 170040,, 2018. 

Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition, Springer Science & Business Media, New York, ISBN 9780387848846, 2009. 

Herman, J. and Usher, W.: SALib: An open-source Python library for Sensitivity Analysis, J. Open Source Softw., 2, 18155,, 2017. 

Herman, J. D., Kollat, J. B., Reed, P. M., and Wagener, T.: Technical Note: Method of Morris effectively reduces the computational demands of global sensitivity analysis for distributed watershed models, Hydrol. Earth Syst. Sci., 17, 2893–2903,, 2013. 

Jaxa-Rozen, M., Pratiwi, A. S., and Trutnevyte, E.: Variance-based global sensitivity analysis and beyond in life cycle assessment: an application to geothermal heating networks, Int. J. Life Cycle Ass., 26, 1008–1026,, 2021. 

Kessler, N.: Summary of actions to prevent water contamination from fuels in the years of 2020–2021, Israel water authority report, (last access: 8 April 2024), 2022. 

Khorashadi Zadeh, F., Nossent, J., Sarrazin, F., Pianosi, F., van Griensven, A., Wagener, T., and Bauwens, W.: Comparison of variance-based and moment-independent global sensitivity analysis approaches by application to the SWAT model, Environ. Model. Softw., 91, 210–222,, 2017. 

Lahvis, M. A., Baehr, A. L., and Baker, R. J.: Quantification of aerobic biodegradation and volatilization rates of gasoline hydrocarbons near the water table under natural attenuation conditions, Water Resour. Res., 35, 753–765,, 1999. 

Levy, Y.: Observations and Modeling of Nitrate Fluxes to Groundwater under Diverse Agricultural Land-Uses: From the Fields to the Pumping Wells, Master thesis, The Hebrew University of Jerusalem, (last access: 8 April 2024), 2015. 

Liu, H., Chen, W., and Sudjianto, A.: Relative entropy based method for probabilistic sensitivity analysis in engineering design, J. Mech. Design, 182, 326–336, 2006. 

Logeshwaran, P., Megharaj, M., Chadalavada, S., Bowman, M., and Naidu, R.: Petroleum hydrocarbons (PH) in groundwater aquifers: An overview of environmental fate, toxicity, microbial degradation and risk-based remediation approaches, Environ. Technol. Innov., 10, 175–193,, 2018. 

López, E., Schuhmacher, M., and Domingo, J. L.: Human health risks of petroleum-contaminated groundwater, Environ. Sci. Pollut. R., 15, 278–288,, 2008. 

Lu, G., Clement, T. P., Zheng, C., and Wiedemeier, T. H.: Natural Attenuation of BTEX Compounds Model Development and Field-Scale Application, Ground Water, 37, 707–701,, 1999. 

Millington, R. J. and Quirk, J. P.: Permeability of porous solids, T. Faraday Soc., 57, 1200–1207,, 1961. 

Mohamed, M. M. A. and Sherif, N. E. S. M. M.: Modeling in situ benzene bioremediation in the contaminated Liwa aquifer (UAE) using the slow-release oxygen source technique, Environ. Earth Sci., 61, 1385–1389, 2010. 

Moret-Fernández, D., Peña-Sancho, C., Latorre Garcés, B., Pueyo, Y., Sánche, L., and Victoria, M.: Estimating the van Genuchten retention curve parameters of undisturbed soil from a single upward infiltration measurement, Soil Res., 55, 682–691, 2017. 

Morris, M. D.: Factorial Sampling Plans for Preliminary Computational Experiments Max, Technometrics, 33, 161–174, 1991. 

Mualem, Y.: A New Model for Predicting Hydraulic Conductivity of Unsaturated Porous-Media., Water Resour. Res., 12, 513–522,, 1976. 

Nadim, F., Hoag, G. E., Liu, S., Carley, R. J., and Zack, P.: Detection and remediation of soil and aquifer systems contaminated with petroleum products: An overview, J. Petrol. Sci. Eng., 26, 169–178,, 2000. 

Nemes, A., Wösten, J., and Lilly, A.: Development of soil hydraulic pedotransfer functions on a European scale: their usefulness in the assessment of soil quality, Sustaining the Global Farm, Selected papers from the 10th International Soil Conservation Organization Meeting, 24–29 May 1999, West Lafayette, Indiana, USA, Purdue University and the USDA-ARS National Soil Erosion Research Laboratory, edited by: Stott, D. E., Mohtar, R. H., and Steinhardt, G. C., 541–549, (last access: 8 April 2024), 2001. 

Neumann, M. B.: Comparison of sensitivity analysis methods for pollutant degradation modelling: A case study from drinking water treatment, Sci. Total Environ., 433, 530–537,, 2012. 

Nossent, J., Elsen, P., and Bauwens, W.: Sobol' sensitivity analysis of a complex environmental model, Environ. Model. Softw., 26, 1515–1525,, 2011. 

Pan, F., Zhu, J., Ye, M., Pachepsky, Y. A., and Wu, Y. S.: Sensitivity analysis of unsaturated flow and contaminant transport with correlated parameters, J. Hydrol., 397, 238–249,, 2011. 

Périard, Y., Gumiere, S. J., Rousseau, A. N., and Caron, J.: Sensitivity analyses of a colloid-facilitated contaminant transport model for unsaturated heterogeneous soil conditions, EGU General Assembly 2013, 7–12 April 2013, Vienna, Austria, Geophysical Research Abstracts, EGU2013-510, 2013. 

Plischke, E., Borgonovo, E., and Smith, C. L.: Global sensitivity measures from given data, Eur. J. Oper. Res., 226, 536–550,, 2013. 

Rawls, W. J. J., Brakenseik, D. L., Saxton, K. E. E., Brakensiek, C. L., and Saxton, K. E. E.: Estimation of soil water properties, T. ASAE, 25, 1316–1320,, 1982. 

Razavi, S., Tolson, B. A., and Burn, D. H.: Review of surrogate modeling in water resources, Water Resour. Res., 48, W07401,, 2012. 

Razavi, S., Jakeman, A., Saltelli, A., Prieur, C., Iooss, B., Borgonovo, E., Plischke, E., Lo Piano, S., Iwanaga, T., Becker, W., Tarantola, S., Guillaume, J. H. A., Jakeman, J., Gupta, H., Melillo, N., Rabitti, G., Chabridon, V., Duan, Q., Sun, X., Smith, S., Sheikholeslami, R., Hosseini, N., Asadzadeh, M., Puy, A., Kucherenko, S., and Maier, H. R.: The Future of Sensitivity Analysis: An essential discipline for systems modeling and policy support, Environ. Model. Softw., 137, 104954,, 2021. 

Reshef, G. and Gal, H.: Summary of actions to prevent pollution of water sources from fuels 2016, Israel water authority report, (last access: 8 April 2024), 2017. 

Rimon, Y., Dahan, O., Nativ, R., and Geyer, S.: Water percolation through the deep vadose zone and groundwater recharge: Preliminary results based on a new vadose zone monitoring system, Water Resour. Res., 43, 1–12,, 2007. 

Rivett, M. O., Wealthall, G. P., Dearden, R. A., and McAlary, T. A.: Review of unsaturated-zone transport and attenuation of volatile organic compound (VOC) plumes leached from shallow source zones, J. Contam. Hydrol., 123, 130–156,, 2011. 

Rockhold, M., Thorne, P., Song, X., Tartakovsky, G., Tagestad, J., and Chen, X.: Sensitivity Analysis of Contaminant Transport from Vadose Zone Sources to Groundwater Case Study for Hanford Site B-Complex, Richland, Washington,, 2018. 

Saltelli, A. and Annoni, P.: How to avoid a perfunctory sensitivity analysis, Environ. Model. Softw., 25, 1508–1517,, 2010. 

Saltelli, A., Tarantola, S., Campolongo, F., and Ratto, M.: Sensitivity analysis in practice: a guide to assessing scientific models (Google eBook), Wiley, New York, 219 pp., ISBN 0-470-87093-1, 2004. 

Saltelli, A., Annoni, P., Azzini, I., Campolongo, F., Ratto, M., and Tarantola, S.: Variance based sensitivity analysis of model output. Design and estimator for the total sensitivity index, Comput. Phys. Commun., 181, 259–270,, 2010. 

Sarrazin, F., Pianosi, F., and Wagener, T.: Global Sensitivity Analysis of environmental models: Convergence and validation, Environ. Model. Softw., 79, 135–152,, 2016. 

Schaap, M. G., Leij, F. J., and Van Genuchten, M. T.: Rosetta: A computer program for estimating soil hydraulic parameters with hierarchical pedotransfer functions, J. Hydrol., 251, 163–176,, 2001. 

Schmidt, T. C., Schirmer, M., Weiß, H., and Haderlein, S.: Microbial degradation of tert-butyl ether and tert-butyl alcohol in the subsurface., J. Contam. Hydrol., 70, 173–203, 2004. 

Schübl, M., Stumpp, C., and Brunetti, G.: A Bayesian perspective on the information content of soil water measurements for the hydrological characterization of the vadose zone, J. Hydrol., 613, 128429,, 2022. 

Shapiro, A. M. and Day-Lewis, F. D.: Reframing groundwater hydrology as a data-driven science, Groundwater, 1–2,, 2022. 

Sheikholeslami, R., Razavi, S., and Haghnegahdar, A.: What should we do when a model crashes? Recommendations for global sensitivity analysis of Earth and environmental systems models, Geosci. Model Dev., 12, 4275–4296,, 2019. 

Šimùnek, J., Šejna, M., Saito, H., Sakai, M., and van Genuchten, M. T.: The HYDRUS-1D Software Package for Simulating the One-Dimensional Movement of Water, Heat, and Multiple Solutes in Variably-Saturated Media, Department of Environmental Sciences, University of California Riverside, Riverside, CA, USA, (last access: 8 April 2024), 2013. 

Sobol, I. M.: Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates, Math. Comput. Simulat., 55, 271–280,, 2001. 

Song, X., Zhang, J., Zhan, C., Xuan, Y., Ye, M., and Xu, C.: Global sensitivity analysis in hydrological modeling: Review of concepts, methods, theoretical framework, and applications, J. Hydrol., 523, 739–757,, 2015. 

Stewart, O.: Sulfolane Technical Assistance and Evaluation Report, Alaska Department of Environmental Conservation, 61, (last access: 8 April 2024)​​​​​​​, 2010. 

Stumm, W. and Morgan, J. J.: Aquatic Chemistry: An Introduction Emphasizing Chemical Equilibria in Natural Waters, 2nd Edition, John Wiley & Sons Ltd., New York, ISBN 0471048313, ISBN 9780471048312​​​​​​​, 1981. 

Tartakovsky, D. M.: Probabilistic risk analysis in subsurface hydrology, Hydrol. L. Surf. Stud., 34, L05404,, 2007. 

Troldborg, M., Binning, P. J., Nielsen, S., Kjeldsen, P., and Christensen, A. G.: Unsaturated zone leaching models for assessing risk to groundwater of contaminated sites, J. Contam. Hydrol., 105, 28–37,, 2009. 

Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., Botstein, D., and Altman, R. B.: Missing value estimation methods for DNA microarrays, Bioinformatics, 17, 520–525,, 2001. 

Turco, M., Kodešová, R., Brunetti, G., Nikodem, A., Fér, M., and Piro, P.: Unsaturated hydraulic behaviour of a permeable pavement: Laboratory investigation and numerical analysis by using the HYDRUS-2D model, J. Hydrol., 554, 780–791,, 2017. 

Turkeltauub, T.: Monitoring and modeling recharge-water fluxes in the deep vadose-zone under different land uses over the southern coastal aquifer, Israel, Master thesis, Ben-Gurion University of the Negev, 99 pp., (last access: 8 April 2024), 2011.  

U.S. EPA: Edition of the Drinking Water Standards and Health Advisories Tables, EPA's Office of Water, Washington, DC, 822 pp.,​​​​​​​ (last access: 8 April 2024), 2018. 

Upreti, D., Pignatti, S., Pascucci, S., Tolomio, M., Li, Z., Huang, W., and Casa, R.: A Comparison of moment-independent and variance-based global sensitivity analysis approaches for wheat yield estimation with the Aquacrop-OS model, Agronomy, 10, 607,, 2020. 

van Genuchten, M. T. T.: A Closed-form Equation for Predicting the Hydraulic Conductivity of Unsaturated Soils, Soil Sci. Soc. Am. J., 44, 892–898,, 1980. 

Wainwright, H. M., Finsterle, S., Zhou, Q., and Birkholzer, J. T.: Modeling the performance of large-scale CO2 storage systems: A comparison of different sensitivity analysis methods, Int. J. Greenh. Gas Con., 17, 189–205,, 2013. 

Wang, A. and Solomatine, D. P.: Practical experience of sensitivity analysis: Comparing six methods, on three hydrological models, with three performance criteria, Water (Switzerland), 11, 1–26,, 2019. 

Whiting, D., Card, A., Wilson, C., and Reeder, J.: Estimating Soil Texture, Colorado Master Gardener, 1–5, Colorado State University, (last access: 8 April 2024), 2011. 

Wołowiec, K. and Malina, G.: Laboratory studies on benzene sorption processes in clay formations, Geol. Geophys. Environ., 41, 43,, 2015. 

Yadav, B. K. and Hassanizadeh, S. M.: An overview of biodegradation of LNAPLs in coastal (Semi)-arid environment, Water. Air. Soil Poll., 220, 225–239,, 2011. 

Zanello, V., Scherger, L. E., and Lexow, C.: Assessment of groundwater contamination risk by BTEX from residual fuel soil phase, SN Appl. Sci., 3, 1–20,, 2021. 

Zhou, T., Šimùnek, J., Braud, I., Nasta, P., Brunetti, G., and Liu, Y.: The impact of evaporation fractionation on the inverse estimation of soil hydraulic and isotope transport parameters, J. Hydrol., 612,, 2022. 

Zytner, R. G.: Sorption of benzene, toluene, ethylbenzene and xylenes to various media, J. Hazard. Mater., 38, 113–126,, 1994. 

Short summary
Contamination from fuel constituents poses a major threat to groundwater. However, studies devoted to identification of the driving parameters for fuel derivative transport in soils are scarce, and none have dealt with heterogeneous layered media. Here, we performed global sensitivity analysis (GSA) on a model of benzene transport to groundwater. The results identified the parameters controlling benzene transport in soils and showed that GSA is as an important tool for transport model analysis.