Local sensitivity analysis for compositional data with application to soil texture in hydrologic modelling

Compositional data, such as soil texture, are hard to deal with in the geosciences as standard statistical methods are often inappropriate to analyse this type of data. Especially in sensitivity analysis, the closed character of the data is often ignored. To that end, we developed a method to assess the local sensitivity of a model output with resect to a compositional model input. We adapted the finite difference technique such that the different parts of the input are perturbed simultaneously while the closed character of the data is preserved. This method was applied to a hydrologic model and the sensitivity of the simulated soil moisture content to local changes in soil texture was assessed. Based on a high number of model runs, in which the soil texture was varied across the entire texture triangle, we identified zones of high sensitivity in the texture triangle. In such zones, the model output uncertainty induced by the discrepancy between the scale of measurement and the scale of model application, is advised to be reduced through additional data collection. Furthermore, the sensitivity analysis provided more insight into the hydrologic model behaviour as it revealed how the model sensitivity is related to the shape of the soil moisture retention curve.


Introduction
In environmental studies, modellers are sometimes confronted with multivariate data that carry only relative information of which the components represent parts of a whole.Such type of data is called compositional or closed data as the components always sum to a constant, e.g. 1 or 100 %.A typical example is the sedimentary particle size distribution of which the closed character implies that the components are not free to vary independently such that if one of its components (e.g.clay) decreases (increases), at least one of the others (e.g.silt or sand) must increase (decrease).Because of this particular property, the application of standard statistical methods to compositional data is hampered and many of the results are invalid because the methods are inappropriate to analyse this type of data.Problems in the analysis of compositional data have been discussed since the end of the twentieth century by a number of authors (e.g.Aitchison, 1986;Aitchison and Egozcue, 2005).
A frequently performed statistical exercise involves the evaluation of how changes in the model input or parameters affect the model output.This is widely known as sensitivity analysis (SA) and allows for (i) the allocation of the uncertainty in the model output to different sources of uncertainty in the model input (Saltelli et al., 2000), (ii) the prioritisation of additional data collection or research concerning the uncertainties identified as most important (Frey and Patil, 2002) and (iii) the verification or validation of a model (Fraedrich and Goldberg, 2000).
According to the objective of the analysis, the techniques for sensitivity analysis are usually classified into screening, global and local methods.Screening methods aim at identifying the model inputs to which the model output is most sensitive.Global methods calculate the total effect of a model input on the model output across the entire input space, whereas local methods investigate the sensitivity of the model output for a specific input scenario, i.e. at a fixed set of points from the model input domain.The local methods L. Loosvelt et al.: Sensitivity analysis for compositional data are especially important for complex, nonlinear models as the effect of a model input on the model output may be highly localised, which makes the assessment of a global effect inappropriate in this case.Screening methods are often relatively simple and are a particular instance of sampling-based methods.One of the most commonly used screening methods is the elementary effect method (Campolongo et al., 2007).Commonly used global methods are the Sobol method (e.g.Sobol, 1993;Saltelli et al., 2008a), the Fourier amplitude sensitivity test (FAST) (e.g.Saltelli et al., 1999;McRae et al., 1982), the response surface method (RSM) (e.g.Cryer and Havens, 1999;Kleijnen et al., 1992) and Monte Carlo based methods (Hofer, 1999;Gwo et al., 1996).Most of them are variance based, which means that the resulting sensitivity reflects the contribution of the model input to the total variance in the model output.In contrast, local methods are based on first-order second-moment approximations (FOSM) in which it is assumed that the first two moments are sufficient to characterise a variable (Dettinger and Wilson, 1981).Examples of local methods are the Morris method (e.g.Morris, 1991;Francos et al., 2003) and the finite difference method (e.g.Lenhart et al., 2002;Foglia et al., 2009).Depending on the specific SA problem, a screening, global or local method needs to be selected such that the method fits the objective(s) of the analysis.For a review on methods for sensitivity analysis, the reader is referred to Saltelli et al. (2006), Frey and Patil (2002) and Helton and Davis (2003).
In case a SA on multiple model inputs is intended, the inputs can be varied simultaneously based on their underlying probability distribution (e.g.Gwo et al., 1996), or they can be varied individually around a base value while keeping the value of the other model inputs constant (e.g.Ferreira et al., 1995).The latter strategy is known as one at a time sensitivity analysis (OAT-SA) and has been the subject of discussion because it is built on assumptions of model linearity and cannot detect interactions between model inputs (Saltelli and Annoni, 2010).Furthermore, OAT-SA is by definition nonexplorative as only a fraction of the total hyperspace is explored, and is therefore attributed "the curse of dimensionality" (Saltelli and Annoni, 2010).Despite the shortcomings of OAT-SA, a literature review by Saltelli et al. (2006) revealed that most published sensitivity analyses use OAT.In some cases (strong) input correlations were observed (Boateng and Cawlfield, 1999;Zhu et al., 2010) and the assumption of independent inputs was therefore incorrectly adopted.Only in a limited number of SA studies, correlation structures have been incorporated (Pan et al., 2011;Jacques et al., 2006;Gevrey et al., 2006).The reason why OAT is so popular is that the observed effect on the model output is solely due to the fact that one input has been changed, which is consistent with the modeller's way of thinking to systematically evaluate the effect of input variation.In case the model input consists of compositional data, the different components of the input are related through the closure balance, and consequently an OAT-SA on its individual components is not jus-tified, but instead all components should be varied simultaneously in order to preserve the closed character of the data.Despite the need to deal with this type of data in environmental models, limited research on sensitivity analysis involving compositional model inputs has been reported to date.Often, the methods applied do not or only partly respect the characteristic properties of compositional data.For example, Bormann (2007) defined a neighbourhood sensitivity for soil texture by applying a fixed change of 1 % in the portion of clay or silt while keeping the portion of silt, respectively sand fixed, although a simultaneous change in all of its portions would have been expected.
In this study, the main objective is to develop a sensitivity analysis method that allows to quantify the sensitivity of a model output with resect to a specific input scenario in case the model input consists of compositional data.To that end, the finite difference technique has been adopted and modified to deal with the closed character of the inputs.The method comprises the calculation of an omnidirectional local sensitivity index that indicates the average impact on the model output when perturbing the compositional model input in different directions around a given point.Since the results of the derivative-based method depend on the magnitude of perturbation (Breshears et al., 1992), especially in case the model shows strong nonlinear relationships and correlations (Saltelli et al., 2000), the method also includes a procedure to optimise the perturbation factor.Subsequently, the SA method is applied to the hydrologic model TOPLATS and is used to evaluate changes in the simulated soil moisture content with respect to small local changes in soil texture, of which the composition was varied across the entire input domain, defined by the soil texture triangle.On the basis of this generated local sensitivity index, we aim at locating regions in the texture triangle to which the modelled soil moisture is most sensitive.
In addition to constructing and applying the SA method, another objective of this study is to gain more insight into the behaviour of the hydrologic model, and more specifically with regard to the role of soil texture therein.Information on soil texture is essential for the operation of a hydrologic model since it is used to estimate the soil hydraulic parameters from the hydraulic model.Because soil texture is often measured at a number of sparsely distributed locations within the study area, all locations falling into the same soil type as the one of the sampled location (cf.information on the soil map) are attributed the same hydraulic properties within the hydrologic model.This discrepancy between the scale of measurement (spacing, cfr.scale triplet (Blöschl and Sivapalan, 1995)) and the scale of model application (grid resolution) raises doubts about the suitability of the measured input value as the most probable (Barth et al., 2001) since it may give rise to large uncertainties in the model output.In this perspective, the presented sensitivity analysis offers the possibility to reduce this type of model output uncertainty by formulating guidelines for additional data collection as a function of the measured soil texture.

The hydrologic model
The TOPMODEL-based Land-Atmosphere Transfer Scheme (TOPLATS) is a spatially distributed water and energy balance model that is based on a lateral redistribution of water (Famiglietti and Wood, 1994;Peters-Lidard et al., 1997;Pauwels and Wood, 1999), i.e. groundwater gradients induce spatial patterns of soil moisture and are estimated from the local topography and the soil transmissivity (Sivapalan et al., 1987).The original model (Famiglietti and Wood, 1994) was modified in 1997 to correct for deficiencies in the representation of the heat fluxes (e.g.ground heat flux) (Peters-Lidard et al., 1997), and in 1999 to expand the representation of the hydrological processes towards conditions in high latitudes (e.g.frozen ground and snow) (Pauwels and Wood, 1999).
A separate local water and energy balance equation is solved for each pixel to generate for each time step a spatial distribution of the water table depth, the soil moisture content, the surface temperature and the amount of water stored in the canopy.At the pixel scale, the soil column is partitioned into an upper root zone and a lower transmission zone.The soil moisture content in both layers (assumed uniformly with depth) is initialised based on the local water table depth and the assumption of an equilibrium moisture profile after which the soil moisture content is updated using the local soil water balance equations as described in Peters-Lidard et al. (1997).

Model parametrization
In TOPLATS, the soil properties are modelled through the closed-form analytical equations of Brooks and Corey (1964), which express the relationship between the soil moisture content θ [m 3 m −3 ], the hydraulic head ψ [m] and the hydraulic conductivity K [m s −1  ].The soil moisture retention curve (SMRC) and the hydraulic conductivity curve describe how ψ is related to θ and K, respectively.The shape of both curves is determined by the soil hydraulic parameters (SHPs): the residual soil moisture content θ r [m 3 m −3 ], the saturated soil moisture content θ s [m 3 m −3 ], the bubbling pressure ψ c [m], the pore size distribution index λ [−] and the saturated hydraulic conductivity K s [m s −1  ].When field measurements of the SHPs are not available, they are estimated based on soil textural information (soil type or particle size distribution) through application of either class or continuous pedotransfer functions (PTFs).Numerous PTFs have been proposed, reviewed and evaluated over the last decade (e.g.Tietje and Tapkenhinrichs, 1993;Wagner et al., 2001;Nemes et al., 2009), but the accuracy and reliability of the PTFs are highly variable (Loosvelt et al., 2011) and mainly depend on the similarity of the soil and climatic features between the region of PTF development and the region of PTF application.
In this study, the continuous PTFs of Rawls andBrakensiek (1985, 1989) (Table 1) are applied to estimate the SHPs for the Brooks and Corey (1964) model based on the sand content Z [%], the clay content C [%], and the soil porosity P [-].The latter is calculated from the bulk density ρ b [g cm −3 ] and the particle density ρ s [g cm −3 ] following the relationship P = 1 − ρ b /ρ s .The particle density is corrected for the presence of organic matter, for which a content of 3 % (Sleutel et al., 2006) and a density of 1.45 g cm −3 (Kaiser and Guggenberger, 2003;Mayer et al., 2004) is assumed.The bulk density, ρ b , is calculated following the procedure as described by Saxton and Rawls (2006).When applying the PTFs of Rawls andBrakensiek (1985, 1989), one should bare in mind that these PTFs were actually developed for textures with a clay content between 5 % and 60 % and a sand content between 5 % and 70 %.
In addition to the soil parameters (e.g.SHPs, soil resistance, heat capacity), TOPLATS has a large number of other model parameters among which the vegetation parameters (e.g.albedo, leaf area index, stomatal resistance) and the TOPMODEL parameters (e.g.saturated subsurface flow, initial water table depth) are the most important ones (see Sect. 2.1.2).

Data set
The hydrologic model is applied at a point location (with coordinates 50.89 • N and 4.09 • E) in the catchment of the Bellebeek (Belgium) in order to simulate the soil moisture content of the upper soil layer (5 cm) during the period 1 January 2006 to 31 December 2006, using an hourly time step.For the catchment, appropriate values for the parameters to estimate baseflow are taken from the literature (Samain et al., 2011): the subsurface flow at complete saturation is 6.31 m 3 s −1 , the exponential baseflow coefficient is 2.51 [-], and the initial average depth of the groundwater table is 1.51 m.The soil and land cover type registered at the simulation point are loam and bare soil, respectively.The meteorological variables wind speed, relative humidity, net radiation, atmospheric pressure and temperature (dry bulb, wet bulb, dew point) were registered with a temporal resolution of 10 to 60 min at the meteorological station of Liedekerke, which is situated near the outlet of the catchment.Missing data were complemented by measurements from nearby meteorological stations (at Gooik and Denderbelle, respectively 3 km south and 10 km north of the catchment).Measurements of incoming shortwave radiation were not available at the station of Liedekerke, but were calculated from the net radiation based on a regression (with a correlation coefficient of 0.96) between the shortwave and net radiation measured at a nearby meteorological station in Gooik.The meteorological records point out that the weather conditions in the catchment Table 1.Regression equations of the Rawls andBrakensiek (1985, 1989) pedotransfer functions to estimate the soil hydraulic parameters from the Brooks and Corey (1964) hydraulic model.
Notation: θ s is the saturated soil moisture content [m 3 m −3 ], θ r the residual soil moisture content [m 3 m −3 ], K s the saturated hydraulic conductivity [cm s −1 ], λ the pore size distribution [-], φ c the bubbling pressure [cm], C the clay content [%], Z the sand content [%] and P the porosity [-], calculated as 1 − ρ b /ρ s following the procedure as described in Saxton and Rawls (2006).
of the Bellebeek apply to a temperate climate with an annual mean temperature of 11.5 • C and a total annual rainfall of 750 mm.Furthermore, in situ soil moisture measurements (at 2.5 cm depth) taken between 13 May and 30 May 2007 are used to validate the model.

Basic concept and operations
Compositional or closed data are multivariate data, represented by positive real vectors of which the components sum up to a constant κ.The components of the vector show the relative weight or importance of the parts in a total, which means that compositional data carry only relative information.A typical example of compositional data is soil texture, which provides information on the relative portion of sand, clay and silt in a given soil sample, and of which the closed character implies that changing one portion causes the other portions to change as well, such that the sum of the portions remains equal to 100 %.The set of all possible compositions x with D components forms a simplex sample space, denoted as S D , and is defined as where x i is the i-th part of composition x, and κ is the closure constant of which the value is generally 1 (proportions) or 100 (percentage).For the specific problem setting in this study, the sample space is a simplex with κ = 100 and D = 3, as the soil texture encloses three different parts that sum up to 100 %.In the simplex, the composition p 0 with coordinates 100 3 , 100 3 , 100 3 is called the barycenter and can be conceived as the origin of the sample space.
Specific operations and statistical properties (e.g.distributions) for compositional data were introduced by Aitchison (1986) and further developed by Egozcue and Pawlowsky-Glahn (2006).The basic operations on the simplex that are relevant for the sensitivity analysis are summarised below.For a comprehensive description of these and other properties, the reader is referred to Aitchison (1982).
-Vector addition of composition x ∈ S D and composition y ∈ S D (also called perturbation) (Aitchison, 1986): For a detailed discussion on the visualization, the role and the interpretation of addition in the simplex, we refer to Aitchison and Ng (2005) and von Eynatten et al. (2002).
-Scalar multiplication of a composition x ∈ S D by a scalar λ ∈ R (also called power transformation) (Aitchison, 1986):  Aitchison, 1983): The Aitchison distance is a measure for the difference between two compositions x and y (Aitchison, 1992).
If one of the compositions corresponds to the barycenter (e.g. Furthermore, it is worth mentioning that coordinates in the vector space can be transformed into a Cartesian coordinate system.A frequently used transformation is the isometric logratio (ILR) transformation, which preserves all metric properties (Egozcue et al., 2003).Although a coordinate transformation is not required within the presented SA method, it will be used for a better understanding of the operations in the simplex and as an alternative approach for sensitivity analysis in case of high-dimensional compositions (D > 3).

Soil texture in the simplex
The texture of a soil sample x = (C, Z, L) is defined by the distribution of the soil particle sizes C (clay, diameter < 2 µm), Z (sand, diameter > 2 mm) and L (silt, 2 µm < diameter < 2 mm).Because the parts cannot vary independently (there are only two degrees of freedom), it is possible to visualise the soil texture, a 3-D composition, in two dimensions by means of an equivalent representation in the texture triangle (Fig. 1).This is an equilateral triangle with vertices at p 1 = (100, 0, 0), p 2 = (0, 100, 0) and p 3 = (0, 0, 100).The three vertices are defined counter-clockwise and are connected through the segments p 1 p 3 , p 3 p 2 and p 2 p 1 , scaled from 0 to 100.
In the texture triangle, three bisectors are defined as the straight lines through one of the vertices and the barycenter p 0 = 100 3 , 100 3 , 100 3 (see Fig. 1).The bisector through vertex p 1 , p 2 and p 3 is referred to as B 1 , B 2 , and B 3 , respectively, and has the property that the values of two parts of the composition are always equal on this line.

Local sensitivity analysis on compositional data
The aim of a local sensitivity analysis is to measure the effect of perturbing a specific composition x, i.e. inducing small relative changes to the composition, on the model output y.The sensitivity of y with respect to x is expressed as a sensitivity function that is defined as the derivative of y with respect to x and is evaluated at one particular value of x by using the finite difference approximation.Therefore, small changes in x need to be imposed that, considering the closed character of x, imply a change in each of its parts x i (i = 1, 2, ..., D) while maintaining D i=1 x i = κ.The model area where Rawls andBrakensiek [1985,1989] PTFs are valid  Rawls andBrakensiek (1985, 1989).
output y is said to be sensitive to model input x if small changes in x produce large changes in y.On the contrary, y is called insensitive to x if small changes in x have almost no effect on y.

Perturbing in the 2-D euclidean space
The methodology presented in this paper for perturbing compositional data is built by analogy with a perturbation in a two-dimensional Euclidean space.Suppose we want to simultaneously perturb two inputs x * 1 and x * 2 with a factor ξ , four possible outcomes are evident and are given by the Cartesian coordinates These are the intersections of the bisectors of the Cartesian coordinate system and the circle The circle defines all possible perturbations and is therefore further referred to as the perturbation circle.Since it is impossible to evaluate an infinite number of perturbations, only a limited set of perturbed points, e.g. the four points on the bisectors, can be considered.
This idea is adopted for the perturbation of a 3-D composition x = (x 1 , x 2 , x 3 ).Consider a random sample x from S 3 which is defined by its triangular coordinates in a ternary diagram (see Fig. 1).Through ILR transformation, the composition can be represented in the 2-D Euclidean space by L. Loosvelt et al.: Sensitivity analysis for compositional data means of the Cartesian coordinates (Fig. 2a): Likewise, any geometric shape on the ternary diagram can be transformed.Figure 2a shows that after ILR transformation, the bisectors B 1 , B 2 , and B 3 preserve their angles of 60 • (see Sect. 2.2.2) and the barycenter p 0 forms the origin of the Cartesian system in which the bisectors intersect.The perturbation of sample x with factor ξ can now be performed in the Euclidean space, following the methodology described above.First of all, the perturbation circle 2b).As only a perturbation in the directions given by the bisectors (further called perturbation axes) is considered, the directions of the bisectors are transferred to composition x by means of a translation.The perturbed points are then defined by the intersections x i+ and x i− between the translated bisectors B i (i ∈ {1, 2, .., D}) and the perturbation circle (Fig. 2b).Finally, the Cartesian coordinates of the perturbed compositions x i+ and x i− are back transformed to the simplex through an inverse ILR transformation (Egozcue et al., 2003) (Fig. 3, see further).

Perturbing in the simplex
Yet, in order to avoid the roundabout method of coordinate transformations, the operations in the Euclidean space are mimicked by operations in the simplex.This results in the following procedure to perturb a composition x with a constant factor ξ : 1. Perform one of the scalar multiplications x ± = (1±ξ ) x (Eq. 3) in order to rescale the composition x with a factor ξ .The scaling factor ξ determines the magnitude of the perturbation, i.e. the higher the value of ξ , the more the perturbed composition will deviate from the sampled composition x.

Calculate the Aitchison distance d
in order to quantify the difference between the sampled composition x and the rescaled compositions x + and x − .Note that for the same value of ξ , the value of d increases with increasing values of x A .
3. Define a circle with centre p 0 and radius d and determine the intersections between the circle and the perturbation axes, here the bisectors B 1 , B 2 , and B 3 (Fig. 3).For each axis in direction i ∈ {1, 2, .., D}, this problem is solved by searching for the compositions v i+ and v i− on the axis that satisfy the condition v i A = d (see Appendix).The resulting compositions are further called directional vectors because they are necessary to transfer the direction of the perturbation axes to the sampled composition x.
4. Add the directional vectors v i+ and v i− (i ∈ {1, 2, .., D}) to the composition x (Eq.2).This results in three pairs of new compositions {x i+ , x i− } that lie on the perturbation circle (Fig. 3).Since the performed operation preserves the distance in the simplex, the perturbation circle around x has radius d.Although its circular shape is distorted in the simplex (the further from the barycenter, the more distortion; examples of circles are shown in Fig. 3), the definition of a circle remains valid.
In summary, when perturbing composition x in the simplex S D with a fixed factor ξ following the methodology described above, we obtain three pairs of new compositions {x i+ , x i− } with i ∈ {1, 2, .., D} that are a subset of all possible perturbations, defined by the circle with centre x and radius d = d A (x, x i± ).Note that although M ∈ N other perturbation axes could have been chosen by selecting M points on a circle around the barycenter, either at random or such that they form angles of 360 2M degrees, and by connecting each of the selected points with the barycenter.In case of M perturbation axes, steps 4 and 5 from the methodology would respectively result in M pairs of directional vectors {v i+ , v i− } and M pairs of perturbed compositions {x i+ , x i− }.In this study, we selected the bisectors as perturbation axes (hence, M = 3) such that the perturbed compositions (i) define angles of 60 • degrees on the perturbation circle, and (ii) lie on the translated bisectors B i , connecting x with vertex p i (see Fig. 3).The compositional lines B i also illustrate the effect of increasing (or decreasing) the magnitude of perturbation in direction i, as the perturbed compositions always lie on this line but shift towards (or away from) vertex p i .
Note that generalizing the methodology might raise some difficulties (Egozcue, 2012).For M > 3, calculating the directional vectors in the ternary diagram becomes complicated.In this case, it is advised to rely on a simplicial expression that computes the ILR coordinates of the directional vectors as the intersection points of the circle with radius d and center p 0 and M perturbation axes, regularly distributed on the circle.Through inverse ILR transformation, the directional vectors are expressed in the simplex.For D > 3, e.g. when soil texture is described with more than three parts, the perturbation circle is not easily generalised to a sphere or hyper-sphere and the distribution of the perturbation axes might raise difficulties (Egozcue, 2012).

Calculating the sensitivity index
The methodology for perturbing a composition in the simplex allows to approximate the sensitivity function by means of the finite difference technique.The sensitivity function can be further summarised into a sensitivity index in order to express the sensitivity of y to small changes in x by means of a single value.In this section, it is described how the sensitivity index is evaluated at one particular value of x.In the simplex, a composition x is sampled uniformly at random from the sample space.Therefore, a Dirichlet distribution is defined, which is the multivariate generalisation of the beta distribution and is parametrized by a vector α.The density function of the Dirichlet distribution is given by with x = (x 1 , ..., x D ) a sample from S D , α = (α 1 , ..., α D ) the parameter vector and D the dimension of the sample space.In order to guarantee that the composition is sampled uniformly at random, the condition α = 1 should be fulfilled.The sampled composition x is thereupon perturbed in M different directions following the methodology described in Sect.2.3.2.
After sampling and perturbing x, the model output y t is determined at time step t for both the sampled composition x and the perturbed compositions x i+ and x i− .For each direction i given by the perturbation axes, a forward and backward directional sensitivity function, respectively denoted as ∇ v i+ y t (x) and ∇ v i− y t (x), are calculated using the finite difference technique:  4).Averaging both functions leads to a central, directional sensitivity function ∇ i y t (x), which indicates the average change in y caused by opposite changes in x in the direction of perturbation axis i: As the sensitivity function itself is not useful for sensitivity analysis (Saltelli et al., 2008b), it is summarised into an omnidirectional local sensitivity index S (by analogy with the sensitivity index proposed by Hill and Tiedeman ( 2007)): with N the number of time steps in the model output and M the number of perturbation axes (here M = 3).The sensitivity index is hence a single value that reflects the average response of y when x is perturbed with a fixed factor ξ in M different directions, each covering two opposite changes in x.It is calculated as the root mean squared difference in the model output resulting from two oppositely perturbed compositions, averaged over the different perturbation directions.
As such, the sensitivity index can be easily updated when ∇ i y t (x) is calculated for additional perturbation directions, i.e.M > 3.In case the model output y is time-independent, i.e.N = 1, then S reduces to the mean absolute difference in the model output resulting from two oppositely perturbed compositions.An overview of the methodology to calculate the sensitivity index is given in Algorithm 1.
Note that for D > 3, a large number of points on the perturbation (hyper-)sphere will be required to compute the sensitivity index.In this case, computation of the sensitivity function can be simplified by estimating the directional derivatives based on the ILR coordinates of the sampled composition: As such, any directional derivative can be computed if for each of the D-1 orthogonal axis on the ILR coordinate space, the gradient ∂y t /∂x * has been determined (Egozcue, 2012).

Optimizing the perturbation factor
The choice of the perturbation factor ξ determines the quality of the sensitivity function: the smaller its value, the better the (Eq.9) end finite difference scheme approximates the derivative.However, if ξ is taken too small it might give rise to numerical errors.If ξ is taken too large, errors due to model nonlinearities might be introduced in the analysis (De Pauw and Vanrolleghem, 2006).Therefore, an optimization procedure is included in the presented SA framework.The basic idea is to make both types of error as small as possible by minimizing the difference in model sensitivity when inducing opposite changes in the model input, i.e. the difference between the sensitivity functions ∇ v i+ y t (x) and ∇ v i− y t (x) should be as small as possible when perturbing in direction i.Although several measures can be used to quantify this difference, the sum of squared difference between their absolute values is selected in this study and is denoted as C i (x): By taking the absolute value of the sensitivity functions, we allow that opposite changes in x result in non-opposite, but similar model responses.Since the sensitivity analysis explores M directions on the perturbation circle, M values of C i (x) are obtained on which the minimization procedure needs to be carried out.In order to solve this optimization problem, the maximum value over all C i (x) (with i ∈ {1, 2, .., M}), further called C max (x), is used as an objective function.For the SA problem in this study, it is chosen to limit ξ to a set of fixed values, such that ξ ∈ {10 −5 , 10 −4 , 10 −3 , 10 −2 , 10 select the value of ξ for which C max (x) is minimal; end the methodology to optimise the perturbation factor for a sampled composition is given in Algorithm 2.

Results and discussion
The experimental set-up consists of two main steps: in the first step, 100 textures are sampled from the texture triangle to determine the optimal factor ξ for perturbing soil texture.In the second step, 5000 textures are sampled from the texture triangle and are used as input to the hydrologic model to evaluate the response of the simulated soil moisture when texture is perturbed with the optimal perturbation factor.

Identification of the optimal perturbation factor
Hundred compositions are sampled from the texture triangle according to a Dirichlet distribution (with α = 1).For each sampled texture, the perturbation factor ξ is evaluated for the values in {10 −5 , 10 −4 , 10 −3 , 10 −2 , 10 −1 } by applying the methodology as described in Algorithm 2. For each ξ , we obtain 100 values of C max of which the mean, the minimum and the maximum are shown in Fig. 4b.The mean value of C max clearly shows a minimum for ξ = 10 −2 , which suggests that this value is optimal for perturbing a broad range of textures.Although the minimum value of C max is lowest for ξ = 10 −4 , this perturbation factor can be discarded as being optimal since the spread over the different values of C max is very large for this value of ξ .Or stated differently: when ξ = 10 −4 would be selected as optimal, it would result in a very low value of C max for only a limited number of textures, whereas the majority of the samples would be characterised by a larger value of C max .These findings are in correspondence with the frequency distribution of the optimal ξ -values (Fig. 4b), which shows that for the major part (about 65 %) of the samples, C max is minimal when they are perturbed with 10 −2 .For 25 % and 10 % of the samples, the optimal value of ξ is respectively smaller and larger than 10 −2 .The samples from the group with an optimal ξ smaller than 10 −2 show a relatively heterogeneous distribution in the texture triangle (see Fig. 5) with the highest concentration around textures with a clay content above 40 % or a sand content between 20 % and 50 %.The samples from the group with an optimal ξ larger than 10 −2 are mainly located around textures with a sand and clay content of 30 %.
For practical purposes, it is chosen to use a fixed value of 10 −2 for the perturbation factor, as it would unnecessarily increase the complexity of the sensitivity analysis when making ξ dependent of the sampled texture.Consequently, deviations from the optimal value are mainly located within the texture classes clay loam and loam (Fig. 5).

Identification of sensitive regions in the texture triangle
Identifying sensitive regions in the texture triangle with respect to the estimation of soil parameters or with respect to the prediction of soil moisture is useful for the model user as it allows to reduce the uncertainty in the predicted variable.Since standardly available soil information is often limited to a soil map of the study area and a number of sparsely distributed soil texture measurements, the measurement is assumed to be representative for the corresponding soil map unit.Consequently, the model user attributes the same particle size distribution to all locations falling into that soil map unit, whereas the soil texture at a location different from the measurement point, but within the same soil map unit, may (largely) deviate from the sampled texture.Although it is assumed that the spatial variability within a homogeneous soil map unit covers only a minor part of the total variability in texture that is enclosed within the definition of the corresponding soil type, the discrepancy between the scale of measurement and the scale of model application might introduce large uncertainties in the model output.If large uncertainties in either the estimated SHPs or the simulated soil moisture are not acceptable (depending on the objective of the study), the uncertainty about the potential bias in the measured soil texture due to spatial variability should be further reduced through additional data collection.If the pattern in sensitivity is identified, the following rule of thumb to prioritize additional data collection can be applied: "If the sampled texture, which is assumed to be representative for a certain soil map unit, is located within a region of high sensitivity in the texture triangle, then additional texture samples within the area corresponding to this soil map unit, as delineated on the soil map, should be taken".By accounting for the spatial variability, a more accurate estimate of the representative (most probable) texture for the given soil map unit can be formulated and can be used to reduce the uncertainty in the model output.On the contrary, if the sampled texture is located within a region of low sensitivity in the texture triangle, the discrepancy between the scale of measurement and the scale of model application will have a low impact on the predicted variable, and taking additional samples may therefore be discarded, unless a high spatial variability in soil texture exists.

Sensitivity of soil hydraulic parameters
Five thousand compositions T = (C, Z, L) are sampled from the texture triangle according to a Dirichlet distribution (with α = 1) and are perturbed with ξ = 10 −2 .For both the sampled and perturbed compositions, the corresponding SHPs are estimated with the PTFs of Rawls and Brakensiek (1985), on which the sensitivity function ∇ i SHP(T ) with i ∈ {1, 2, 3} and the sensitivity indices S SHP are calculated by applying the methodology as described in Algorithm 1.A contour plot of S SHP (Fig. 6a-e) reveals that the sensitivity pattern highly depends on the parameter under consideration, although the patterns in S θ s and S ψ b show a remarkable resemblance.For these parameters, the hot spot of high sensitivity is located around textures with a clay content of 60-80 % and a sand content of 20-40 %.The sensitivities S θ r and S λ show a pattern that is highly dominated by the clay content: an increase in the clay content causes an increase (decrease) in the sensitivity if the clay content is lower (higher) than 30 %.Although the clay content of the hot spot matches for S θ r and S λ , the corresponding sand content is different: around 0 % for the former and around 70 % for the latter.On the contrary, the pattern in S K s is dominated by the sand content: the higher the sand content, the higher the sensitivity.The order of magnitude of the sensitivity index should be interpreted with respect to the corresponding SHP.Therefore, the mean predicted SHP over the entire texture triangle is given as a reference in Fig. 6.Also note that results outside the validity zone of the PTFs should be interpreted with care (see Sect. 2.1.1 and Fig. 1).This zone is indicated on the contour plots in Fig. 6.In summary, the potential uncertainty in the predicted SHPs due to the discrepancy in scale between measurement and model application highly varies across the texture triangle and among the different SHPs, making it very difficult to formulate general guidelines to reduce the uncertainty in the predicted SHPs.

Sensitivity of soil moisture
After determining the sensitivity of the estimated SHPs to soil texture, the SHPs are used as input to the hydrologic model TOPLATS (run in spatially distributed mode) in order to simulate the daily soil moisture content θ t during the year 2006 at the simulation location (see Sect. 2.1.2) under Belgian weather conditions (see Sect. 2.1.2).The 5000 sampled textures and their perturbed textures are successively attributed to the simulation location on which the corresponding sensitivity functions ∇ i θ t (T ) with i ∈ {1, 2, 3} and sensitivity index S θ are calculated as described in Algorithm 1.
Figure 6f shows a contour plot of S θ as a function of T and reveals a rather simple sensitivity pattern.For textures with a clay content lower than 35 % or higher than 70 %, the sensitivity is strongly determined by the clay content.In case C < 35 %, the textural sensitivity increases with increasing values of the clay content, whereas in case C > 70 % the textural sensitivity decreases with increasing values of the clay content.For soils with a clay content between 35 % and 70 %, S θ is also highly influenced by the percentage of sand in the soil.The hot spot of high sensitivity is located around textures with a clay and sand content of 55 % and 45 %, respectively.This means that for these measured textures, the potential uncertainty in θ that is associated with the scaling issue will be the highest, but can, however, be reduced through additional data collection.

Evaluation of the USDA class as sensitivity region
The objective of this section is to investigate whether a soil map of a region with indication of the USDA soil classes can be used as a rudimentary tool to set up the texture sampling strategy prior to data collection.As such, the discrepancy between the scale of measurement and the scale of model application within that region is optimally managed with respect to the uncertainty in the model prediction.For the prediction in a region that goes together with an USDA soil class that is attributed a high sensitivity towards soil texture, it is important to reduce the uncertainty on the textural variability within that region.Obviously, sufficient samples should be taken to accurately estimate the representative texture.By using the representative texture, a lower model prediction uncertainty is achieved.Otherwise, if the soil map indicates that the prediction will take place in a region where the soil class shows a low sensitivity towards texture, then resources can be saved and data collection can be limited to a single soil sample.However, this strategy is only valid under the assumption that the textural variability within that region is low.
In order to associate regions of high and low sensitivity of the SHP estimation with the commonly used USDA soil classification, S SHP is averaged over the samples falling into the same USDA soil class, further denoted as S SHP (Table 2).For θ s and ψ b , the soil class corresponding to the highest S SHP is clay, whereas for θ r , λ and K s this is respectively silt loam, sandy clay loam and sandy loam, which are also the classes that contain the hot spot of high sensitivity for their respective SHP (Fig. 6).However, the hot spot only covers a part of the entire soil class, such that advising a higher sampling density to formulate a representative texture for the corresponding soil map units will not always be relevant.Based on these results, we may argue that the USDA classification is only useful as a preliminary indication for the sensitivity in SHP prediction because the variation in sensitivity within an USDA class is often high.As a consequence, the USDA soil map is suboptimal when used as a tool to optimise the sampling strategy with respect to the potential uncertainty in the estimated SHPs that is associated with the scaling issue.These findings call for a refinement of the USDA soil classes as they seem to be too rough to accurately describe the soil texture.In addition, the definition of soil texture should evolve towards a representation with more than three grain classes.
By analogy, S θ is averaged over each USDA soil class and the resulting S θ is shown together with its standard deviation in Fig. 7. Based on the total range in S θ , four sensitivity classes are defined: low sensitivity (0 ≤ S θ < 0.04), medium sensitivity (0.04 ≤ S θ < 0.08), high sensitivity (0.08 ≤ S θ < 0.12) and very high sensitivity (0.12 ≤ S θ ).The soil class sandy clay is attributed the highest S θ and falls into the high sensitivity class, which is obvious as this class contains the hot spot of high sensitivity (see Fig. 6).However, the variation in sensitivity within that soil class is very large and    32 Fig. 7. Average sensitivity index S θ for the 12 USDA soil classes with indication of the standard deviation and the sensitivity classes low (0 ≤ S θ < 0.04), medium (0.04 ≤ S θ < 0.08), high (0.08 ≤ S θ < 0.12) and very high (0.12 ≤ S θ ).
ranges from medium to very high.Also other soil classes enclose more than one sensitivity class, e.g.clay and silt, which would require to re(de)fine those soil classes with respect to their sensitivity.On the contrary, some USDA classes completely fall within a single sensitivity class.The classes loamy sand and sand therefore correctly represent a low sensitivity, whereas the classes loam, silty clay loam and silty clay represent a medium sensitivity.This supports the earlier findings to use the USDA soil classification (and hence the USDA soil map) only as a preliminary indication of the model output sensitivity towards textural changes.The dominant sensitivity class that is associated with the USDA class is then an indication of the sampling density needed to formulate a representative texture for the given USDA class.For example the clayey soil classes (e.g.sandy clay, clay loam and clay) will require a higher sampling density.

Identification of the hydrologic model behaviour
The scatterplot in Fig. 8a shows how the model response S θ is related to the average annual soil moisture content θ avg = 1 N N t=1 θ t simulated with TOPLATS, from which it is clear that very high model sensitivities are only expected if the simulated soil moisture has a value between 0.2 and 0.4, with a maximum around 0.3.On the contrary, low sensitivities occur for both very low (θ avg < 0.2) and very high soil moisture contents (θ avg > 0.45).This suggests that the more extreme (dry or wet) the soil moisture content becomes, the less uncertainty in the simulation result is involved when there is a discrepancy between the scale of texture measurement and the scale of model application.Similarly, scatterplots between S θ and the SHPs (Fig. 8b-f) reveal that the sensitivity can only be very high if the SHPs take specific values: θ s , θ r , ψ b , λ and K s should be within the range [0.42, 0.49], [0.1, 0.12], [0.05, 0.6], [0.4,1] and [5 × 10 −8 , 5 × 10 −6 ], respectively.The parameter values for which a maximum in S θ is recorded, are combined to construct the SMRC that involves the highest uncertainty in θ.The so-called "high sensitivity" SMRC shows a rather linear behaviour (Fig. 9) that is characteristic for fine-textured soils with a low effective porosity, i.e. θ s − θ r .For the sake of simplicity, it can be said that this SMRC corresponds to low values of θ s , ψ b and λ, and a high value of θ r .On the contrary, a low sensitivity of the simulated soil moisture is not exclusively related to specific values of the SHPs, since for a broad range of SHP values the corresponding S θ falls into the low sensitivity class.Nevertheless, it is observed that the more the SHP values deviate from the specified range that gives rise to a very high sensitivity, S θ shifts towards the low sensitivity class.This means that the sensitivity of the simulated soil moisture is certainly low in case θ s , ψ b and λ have a high value and θ r has a low value.The so-called "low sensitivity" SMRC that results from this soil parameter combination is characteristic   for coarse-textured soils with a high effective porosity and shows a highly nonlinear behaviour.

Conclusions
Considering the omnipresence of compositional data in the geosciences, we developed a method to perform a local sensitivity analysis on compositional model inputs.As the different parts of the input vary simultaneously, while preserving the closed character of the input, this method allows to abandon incorrect practice of OAT-SA.In the presented SA method, a sensitivity index is calculated based on the finite difference technique to approximate the directional derivatives of the model output with respect to the compositional model input.Local perturbations of the compositions were realised by operations in the simplex (for complex SA problems we suggest to implement the alternative approach using ILR coordinates) and we relied on the assumption that all possible perturbations are defined by a perturbation circle.Additionally, we supplemented the SA method with a procedure to optimise the perturbation factor in order to minimise numerical errors and errors due to model nonlinearities.
The SA method was subsequently applied to a hydrologic model to assess the sensitivity of the simulated soil moisture content to changes in soil texture, for a high number of compositions in the texture triangle.In a first step, we found that the optimal factor to perturb soil texture is 10 −2 .Although this value was found to be optimal in 65 % of the cases, it was chosen to use a fixed value of ξ in order not to unnecessarily complicate the sensitivity analysis.However, one should be aware that in 10 % of the cases this value is too low and might introduce numerical errors in the sensitivity analysis, and that in 25 % of the cases this value is too high, which might result in errors due to the nonlinear behaviour of the hydrologic model.Especially near the borders of the ternary diagram, deviation of the perturbation factor from its optimal value might affect the sensitivity analysis.A perturbation factor of 10 −2 was used to perform a local SA on 5000 different textures, sampled according to a Dirichlet distribution from the texture triangle.The analysed models are the PTFs of Rawls and Brakensiek (1985) and the hydrologic model TOPLATS of which the generated outputs are respectively the soil hydraulic parameters and the soil moisture content.Based on these model applications, the sensitivity index was calculated for both model outputs and was evaluated with respect to the position of the sampled texture in the texture triangle.
The results of the sensitivity analysis were found to be useful (i) to reduce the uncertainty on the modelled output when there is a discrepancy between the scale of measurement and the scale of model application and (ii) to gain more insight into the behaviour of the applied model, and more specifically on how it reacts on changes in the soil texture with respect to its position in the texture triangle.As such, we found that the simulated soil moisture is most sensitive to soil texture when the measured clay content is around 55% and the sand content around 45 %.This means that the potential uncertainty that is involved with the scaling issue will be the highest under these textural conditions.Therefore, when high uncertainties in the modelled output are not acceptable, it is advised to take one or more additional texture samples within the soil map unit that encloses the original sample such that a better estimate of the most probable texture can be formulated.Similarly, we identified zones of high sensitivity for the soil parameters, showing a high variability in their sensitivity pattern.We also investigated whether a soil map with indication of the USDA soil classes can be used as a tool to optimise the texture sampling strategy by reviewing the USDA soil classification with respect to the pattern in model output sensitivity.The results point out that USDA classes are only useful as a rudimentary indication for the sensitivity as they distinguish between high and low sensitivity, but comprise a large within-class-variability of the sensitivity.Especially the clayey soil classes sandy clay, clay loam and clay involve high to very high sensitivities, such that it is advised to apply a high(er) sampling density within these soil map units to calculate the representative texture.Furthermore, we were able to relate S θ to the shape of the soil moisture retention curve and recorded the highest sensitivity when the values of θ s , ψ b and λ are low and the value of θ r is high.The resulting curve is characteristic for fine-textured soils with a low effective porosity and shows a rather linear behaviour.

-
Aitchison distance between composition x ∈ S D and composition y ∈ S D (

)
Representation of the baricenter p 0 , the sampled composition x and the bisectors B 1 , B 2 and B 3 transformation and (b) illustration of perturbation in the 2D Euclidean space with indication of the compositions {x i+ ,x i− } and the translated bisectors B 1 , B 2 and B 3

Fig. 2 .
Fig. 2. (a) Representation of the baricenter p 0 , the sampled composition x and the bisectors B 1 , B after ILR transformation and (b) illustration of perturbation in the 2D Euclidean space with indica perturbed compositions {x i+ ,x i− } and the translated bisectors B 1 , B 2 and B 3 . 27

Fig. 2 .
Fig. 2. (a) Representation of the barycenter p 0 , the sampled composition x and the bisectors B 1 , B 2 and B 3 after ILR transformation and (b) illustration of perturbation in the 2-D Euclidean space with indication of the perturbed compositions {x i+ , x i− } and the translated bisectors B 1 , B 2 and B 3 .

Fig. 3 .
Fig. 3. Perturbation in the simplex of a composition x sampled from the texture triangle with indication of the directional vectors {vi+,vi−}, the perturbed compositions {xi+,xi−}, the bisectors B1, B2 and B3 and the translated bisectors B 1 , B 2 and B 3 ; illustration of compositional circles at different locations in the texture triangle.

Fig. 3 .
Fig. 3. Perturbation in the simplex of a composition x sampled from the texture triangle with indication of the directional vectors {v i+ , v i− }, the perturbed compositions {x i+ , x i− }, the bisectors B 1 , B 2 and B 3 and the translated bisectors B 1 , B 2 and B 3 ; illustration of compositional circles at different locations in the texture triangle.

Fig. 4 . 29 Fig. 4 .Fig. 5 .Fig. 5 .
Fig. 4. The mean (full line), minimum and maximum (dotted lines) values of C max as a function o perturbation factor ξ (a) and the frequency distribution of the optimal perturbation factor (b), for 100 sam compositions.

Fig. 6 .
Fig. 6.Contour plot of the sensitivity index across the texture triangle for the estimated soil hydraulic parameters (a)-(e) θ s (mean is 0.18 m 3 • m −3 ), θ r (mean is 0.03 m 3 • m −3 ), ψ b (0.16 m), λ (mean is 0.49), and log 10 K s (mean is −6.11m • s −1 ) and for the simulated soil moisture content θ (f).Results outside the validity zone of the PTFs (indicated by a white line) and near the borders of the triangle should be interpreted with care.

Fig. 6 .
Fig.6.Contour plot of the sensitivity index across the texture triangle for the estimated soil hydraulic parameters (a-e) θ s (mean is 0.18 m 3 m −3 ), θ r (mean is 0.03 m 3 m −3 ), ψ b (0.16 m), λ (mean is 0.49), and log 10 K s (mean is −6.11 m s −1 ) and for the simulated soil moisture content θ (f).Results outside the validity zone of the PTFs (indicated by a white line) and near the borders of the triangle should be interpreted with care.
moisture retention curve (SMRC) for which the sensitivity of the simulated soil moisture to textural he highest and the lowest, respectively. 34

Fig. 9 .
Fig.9.Soil moisture retention curve (SMRC) for which the sensitivity of the simulated soil moisture to textural changes is the highest and the lowest, respectively.