Systematic comparison of five machine-learning methods in classification and interpolation of soil particle size fractions using different transformed data

Soil texture and soil particle size fractions (PSFs) play an increasing role in physical, chemical and hydrological processes. Digital soil mapping using machine-learning methods has been widely applied to generate more detailed predictions of qualitative or quantitative outputs than traditional soil-mapping methods in soil science. Because soil PSFs are compositional data, interpolation of soil PSFs combined with log ratio approaches was developed to improve the prediction accuracy, and it can also be used to derive soil texture indirectly. However, few reports have systematically analyzed and compared classification and regression, the accuracies of original (untransformed) and log ratio approaches, and the performance of direct and indirect soil texture classification using machine-learning methods. In this study, a total of 45 evaluation models were generated from five different machine-learning models combined with original data and three log ratio approaches, namely additive log ratio, centered log ratio and isometric log ratio (ALR, CLR and ILR, respectively), to evaluate and compare the performance of soil texture classification and soil PSFs interpolation. The results demonstrated that log ratio approaches made the soil sampling data more symmetric, and with respect to soil texture classification, random forest (RF) and extreme gradient boosting (XGB) showed notable results. For soil PSFs interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root mean squared error (RMSE, sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE, sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD, 0.84) and standardized residual sum of squares (STRESS, 0.61), and the highest coefficient of determination (R², sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). STRESS was improved using log ratio approaches, especially CLR and ILR. There is a pronounced improvement (21.3 %) in the kappa coefficient using indirect soil texture classification compared to the direct approach. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.
Abbreviations: PSFs, soil particle-size fractions; HRB, Heihe River Basin; DSM, digital soil mapping; KNN, k-nearest neighbor; MLP, multilayer perceptron neural network; RF, random forest; SVM, support vector machines; XGB, extreme gradient boosting; ALR, additive log-ratio; CLR, centered log-ratio; ILR, isometric log-ratio; ORI, original; ROC, receiver
Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2018-584 Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 11 February 2019 © Author(s) 2019. CC BY 4.0 License.

classification or regression problems. However, few studies systematically analyzed both soil texture classification and soil PSFs interpolation using different machine-learning methods.
The soil PSFs, which can be classified as soil texture, are not only continuous variables but also compositional data: the constant sum (1 or 100 %) should be guaranteed. Soil PSFs data, which include three components, are typical compositional data; the individual variables in the data set are not independent of each other, because they are related by being expressed as percentages (Filzmoser et al., 2009). Because of the spurious correlations between components, different results would occur on different measurement scales, which complicates interpretation (Abdi et al., 2015; Reimann and Filzmoser, 2000).
Indicators and statistical methods defined in Euclidean geometry or based on Euclidean distances could yield misleading or biased results (Butler, 1979). Numerous treatments of compositional data in soil science have been suggested (Gobin et al., 2001; Salazar et al., 2015; Tolosana-Delgado et al., 2019; Hengl et al., 2018), and the most extensively used methods are the log ratio transformations, namely the additive log ratio (ALR) and the centered log ratio (CLR) put forward by Aitchison (1982), as well as the isometric log ratio (ILR) from Egozcue et al. (2003). Soil PSFs can be predicted using models such as multiple linear regression (Huang et al., 2014) and kriging (Wang and Shi, 2018; Zhang et al., 2013) combined with log ratio transformation methods. However, few studies have conducted a systematic comparison of the accuracy, strengths and weaknesses of different machine-learning methods combined with original (untransformed) data and different log ratio transformed data.
Soil texture classification can be predicted by machine-learning methods directly, and can be derived indirectly from soil PSFs. For the direct soil texture classification, tree-based models such as RF and classification tree (CT) performed better than multinomial logistic regression, support vector machines (SVM) and artificial neural network (ANN) (Camera et al., 2017; Wu et al., 2018). For the indirect classification of soil texture, Poggio and Gimona (2017) combined hybrid geostatistical generalized additive models with ALR and modeled soil particle classes at 250 m resolution in Scotland, indicating that vegetation index, morphological features and information about the phenological season were of vital significance as environmental covariates. Considering the particularity of compositional data, the results of soil PSFs classification and regression could be compared through direct and indirect soil texture classification, in terms of the relationship between soil texture and soil PSFs. Nevertheless, few studies have systematically compared different machine-learning methods for both direct and indirect soil texture classification.
In our study, five machine-learning models, namely K-nearest neighbor (KNN), multilayer perceptron neural network (MLP), RF, SVM, and extreme gradient boosting (XGB), were applied for soil texture classification and soil PSFs interpolation.
Furthermore, the original and log ratio transformed data were also combined with these five machine-learning methods for soil PSFs interpolation. Hence, the objectives of this study are (i) to compare the performance of five machine-learning models for soil texture classification and soil PSFs interpolation, (ii) to evaluate the performance of machine-learning models using original and different log ratio transformed data for soil PSFs interpolation, and (iii) to estimate the performance of direct and indirect soil texture classification using these methods.
https://doi.org/10.5194/hess-2019-648 Preprint. Discussion started: 6 January 2020 © Author(s) 2020. CC BY 4.0 License.

Soil sampling
A total of 640 soil sampling points were collected in the HRB from the Science Data Center of Cold and Arid Regions (WestDC) in China (http://westdc.westgis.ac.cn/), involving 392 soil sampling points on the upper reaches and 248 soil sampling points on the middle and lower reaches of the HRB (Fig. 1b). The soil types, vegetation types, distribution of DEM and geomorphology types of the HRB were considered in soil sample collection according to the location and proportion of these types, so that a limited number of soil samples could represent the spatial characteristics of soil PSFs. There were more soil sampling points on the middle and upper reaches of the HRB because of the more complicated soil types and vegetation types in these areas. In contrast, the types on the lower reaches are relatively similar, with more desert in the northwest. Hence, the east of the lower reaches of the HRB contained more soil sampling points. All soil samples had information about soil PSFs measured using a Malvern Mastersizer 2000 laser diffraction particle size analyzer (average measurement error less than 3 %). The global positioning system (GPS) information and related environmental covariates were recorded. Purposive sampling was used as the sampling strategy to collect soil samples and to characterize the spatial variability of soil PSFs, especially at the regional scale of the study area. In this strategy, sample sites were chosen based on the variability of soil formation factors, such as the distribution of climate and categorical maps, which represented the heterogeneity of the soil PSFs in the HRB. To reduce the noise effect of soil samples, the average of 3-5 mixed topsoil (0-20 cm) samples for each soil sample and its parallel sample was used as the final measurement. Subsequently, the samples were dried and analyzed to measure soil PSFs (approximately 30 g of each sample).

Environmental covariates and pre-processing
The environmental covariates, such as topographic variables, remote sensing variables, climate and position variables, soil physicochemical variables and categorical maps, are related to the distributions of soil PSFs. System for Automated Geoscientific Analysis (SAGA) GIS (Conrad et al., 2015) was used to compute the topographic variables from DEM, including slope, aspect, convergence index, general curvature, plane curvature, profile curvature and valley depth. Remote sensing variables, including the normalized difference vegetation index (NDVI) (Huete et al., 2002), the brightness index (BI) (Metternicht and Zinck, 2003), and the soil adjusted vegetation index (SAVI) (Huete, 1988), were derived from Landsat 7 imagery using band operations. We also collected climate variables from the National Meteorological Information Center (NMIC, http://data.cma.cn/), such as the mean annual precipitation and the mean annual temperature. Latitude and longitude were also considered because of the large region of the HRB. The mean annual surface evapotranspiration variable (Wu et al., 2012) was gathered from WestDC (http://westdc.westgis.ac.cn/), as were the soil physicochemical variables: soil organic carbon, saturated water content, field water holding capacity, wilt water content, saturated hydraulic conductivity, and soil thickness (Yi et al., 2015; Song et al., 2016; Yang et al., 2016). Additionally, significant categorical maps, such as geomorphology types, soil types, land use types and vegetation types, were also used (Fig. 1).
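The band operations behind two of these indices can be illustrated with a minimal Python sketch. It assumes surface reflectance values between 0 and 1, the standard NDVI and SAVI formulas, and a soil adjustment factor of 0.5 (the value commonly used with SAVI); the function names and sample values are hypothetical, and the study's exact processing chain is not specified here.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)

def savi(nir, red, soil_factor=0.5):
    """Soil adjusted vegetation index; soil_factor=0.5 is an assumed default."""
    return (1.0 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical reflectances for one pixel (Landsat 7 band 4 = NIR, band 3 = red).
nir_value, red_value = 0.5, 0.1
print(ndvi(nir_value, red_value))
print(savi(nir_value, red_value))
```

SAVI reduces to NDVI when the soil adjustment factor is set to zero, which is a quick consistency check for any implementation.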

K-nearest neighbor
K-nearest neighbor (KNN) is a simple non-parametric classifier that labels unknown instances based on known instances (Cover and Hart, 1967). For each observation in the test set, the k nearest training set vectors were found, and the class with the maximum summed kernel density was selected. Moreover, continuous variables can also be predicted for regression with the average values of the k nearest neighbors. The parameters of KNN are the maximum value of k (kmax), the distance measure of the nearest neighbors (distance) and the type of kernel function (kernel). The KNN model is available in the R package "kknn" (Schliep and Hechenbichler, 2016).
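The regression case can be illustrated with a minimal Python sketch. It assumes plain Euclidean distance and an unweighted average of the neighbors; the kernel weighting applied by "kknn" is deliberately omitted, and the function name and toy data are hypothetical.

```python
import math

def knn_regress(train_x, train_y, query, k=3):
    """Predict a continuous value as the mean target of the k nearest points."""
    # Rank training points by Euclidean distance to the query point.
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    nearest = [target for _, target in ranked[:k]]
    return sum(nearest) / len(nearest)

# Toy covariates (two standardized predictors) and sand contents in %.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [20.0, 30.0, 40.0, 90.0]
print(knn_regress(train_x, train_y, (0.2, 0.2), k=3))
```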

Multilayer perceptron neural network
Multilayer perceptron neural network (MLP), which is currently one of the most commonly used multilayer feedforward backpropagation networks (Zhang et al., 2018), was selected to train artificial neural network (ANN) models in our study because of its rapid operation, small training set requirements and ease of implementation (Subasi, 2007). MLP neurons can perform classification or regression depending on whether the response variable is categorical or continuous. The MLP has three sequential layers: input layer, hidden layer and output layer. The resilient backpropagation algorithm was chosen because its learning rate is adaptive, which avoids oscillations and accelerates the learning process (Behrens and Scholten, 2006). The range of the data set should be standardized because MLPs operate on a scale of 0 to 1. MLP can be run using the R package "RSNNS" (Bergmeir and Benitez, 2012).
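The rescaling to the 0-1 range can be sketched as follows. This is a minimal min-max scaling example in Python, offered as one plausible way to standardize covariates before training; the function name is hypothetical and the exact scaling used in the study is not stated here.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        # A constant covariate carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]

# Hypothetical mean annual precipitation values in mm.
print(min_max_scale([100.0, 250.0, 400.0]))
```

In practice the minimum and maximum would be taken from the training set only and reused for the validation set, so that no information leaks between the two.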

Random forest
Random forest (RF) was developed by Breiman (2001) by combining the bagging method (Breiman, 1996) with random variable selection; the principle is to merge a group of "weak learners" together to form a "strong learner". Bootstrap sampling is used for each tree of RF, and the rules for binary splitting of the data are different for regression and classification problems.
For classification, the Gini index is used to split the data; for regression, minimizing the sum of the squares of the mean deviations can be selected to train each tree model. One benefit of RF is that the ensembles of trees are used without pruning. In addition, RF is relatively robust to overfitting, and standardization or normalization is not necessary because it is insensitive to the range of values. Two parameters should be adjusted for the RF model: the number of trees (ntree) and the number of features randomly sampled at each split (mtry). The RF model is available in the R package "randomForest" (Liaw and Wiener, 2002).
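The Gini criterion used for classification splits can be sketched as follows. This is an illustrative Python version of the impurity of a node and of a candidate binary split, with hypothetical function names and toy labels; it is not the "randomForest" implementation.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_gini(left, right):
    """Size-weighted Gini impurity of the two children of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

node = ["SiLo", "SiLo", "SiLo", "SaLo"]
print(gini(node))                                       # impurity before splitting
print(split_gini(["SiLo", "SiLo", "SiLo"], ["SaLo"]))   # impurity of a pure split
```

A tree chooses, among the mtry candidate features, the split with the lowest weighted impurity.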

Support vector machine
Support vector machine (SVM), proposed by Cortes and Vapnik (1995), is a type of generalized linear classifier that is widely applied for classification and regression problems in soil science (Burges, 1998). The main principle of SVM is to classify different classes by constructing an optimal separating hyperplane in the feature space (so-called "structural risk minimization"). Regression problems can also be solved by minimizing the structural risk using loss functions (Vapnik, 1998) in SVM, which is named support vector regression. An advantage of SVMs is that they are effective in high dimensional spaces.
Linear function was selected for SVM as the kernel function in our study, and two other parameters need to be tuned, i.e., cost and gamma, controlling the tradeoff between the classification accuracy and complexity, and the ranges of radial effect, respectively. The SVM model is available in the R package "e1071" (Meyer et al., 2017).

Extreme gradient boosting
Extreme gradient boosting, put forward by Chen and Guestrin (2016), is an efficient implementation of gradient boosting frameworks, tree learning algorithms, and efficient linear model solvers for both classification and regression problems (Chen et al., 2018). Like boosted regression trees (Elith et al., 2008), it follows the principle of gradient enhancement; however, a more regularized model formalization is applied in XGB to control over-fitting, making it perform better in terms of accuracy assessment. The residuals of the first tree can be fitted by the second tree to enhance the model accuracy, and the sum of the predictions of each tree generates the ultimate prediction. There are seven parameters in XGB: the learning rate (eta), the maximum depth of a tree (max_depth), the maximum number of boosting iterations (nrounds), the subsample ratio of columns (colsample_bytree), the subsample ratio of the training instances (subsample), the minimum loss reduction (gamma) and the minimum sum of instance weight (min_child_weight). The XGB model is available in the R package "xgboost" (Chen et al., 2018).
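The residual-fitting principle can be illustrated with a deliberately simplified Python sketch. It assumes squared-error loss, one-dimensional depth-1 trees (stumps), and only two of the parameters above (a learning rate like eta and a round count like nrounds); the regularization, second-order gradients and subsampling that distinguish XGB are omitted, and all names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a depth-1 regression tree on one feature by minimizing squared error.

    Requires at least two distinct values in xs.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, ys, n_rounds=50, eta=0.3):
    """Each new stump is fitted to the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    trees = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        predictions = [p + eta * tree(x) for p, x in zip(predictions, xs)]
    # The final prediction is the base value plus the shrunken sum of all trees.
    return lambda x: base + eta * sum(tree(x) for tree in trees)

model = boost([1, 2, 3, 4], [10.0, 10.0, 30.0, 30.0])
print(model(1), model(4))
```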

Parameters optimization
The R packages "caret" (Kuhn, 2018) for MLP, SVM and XGB, "randomForest" for RF and "kknn" for KNN were used to tune parameters. The set of parameters with the lowest RMSE for regression and the highest kappa coefficient for classification by cross-validation was selected as the best parameters. There are 11 dependent variables (i.e., "sand, silt, clay, ilr1, ilr2, alr1, alr2, clr1, clr2, clr3" for regression and "class" for classification) trained with environmental covariates (independent variables). The adjusted parameters and equation description of the five machine-learning methods can be found in Supplementary Section S1 and S2 (Table S2.1).

Log ratio transformation methods
For a composition $x = [x_1, \ldots, x_D]$ with $x_i > 0$ for all $i = 1, 2, \ldots, D$ and $\sum_{i=1}^{D} x_i = 1$, where $x_i$ is the $i$th component, the ALR, CLR and ILR transformations follow the definitions of Aitchison (1982) and Egozcue et al. (2003). The inverse transformations for ALR, CLR and ILR were computed with the "compositions" R package (van den Boogaart and Tolosana-Delgado, 2008). For original data, a standardization function was used to ensure that predictions of soil PSFs were between 0 and 100 and that their sum was 100 %; the same standardization was applied to the sand, silt and clay fractions.
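The three transformations and their inverses can be sketched in Python as follows. The sketch assumes the textbook forms: ALR with the last component as denominator, CLR relative to the geometric mean, and ILR with the sequential orthonormal basis $z_i = \sqrt{(D-i)/(D-i+1)}\,\ln\bigl(x_i / (\prod_{j>i} x_j)^{1/(D-i)}\bigr)$. The choice of ALR denominator and ILR basis is an assumption, the function names are hypothetical, and the closure function stands in for the standardization of original-data predictions (it rescales to a fixed sum but does not handle negative predictions).

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def alr(x):
    """Additive log ratio, using the last part as the denominator."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inv(z):
    return closure([math.exp(v) for v in z] + [1.0])

def clr(x):
    """Centered log ratio: log of each part over the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def clr_inv(z):
    return closure([math.exp(v) for v in z])

def ilr(x):
    """Isometric log ratio with the sequential orthonormal basis."""
    z = []
    for i in range(len(x) - 1):
        rest = x[i + 1:]
        r = len(rest)
        geo_mean = math.exp(sum(math.log(v) for v in rest) / r)
        z.append(math.sqrt(r / (r + 1)) * math.log(x[i] / geo_mean))
    return z

def ilr_inv(z):
    """Map ILR coordinates back through the CLR space to the simplex."""
    d = len(z) + 1
    clr_values = [0.0] * d
    for i, z_i in enumerate(z):
        r = d - i - 1  # number of parts after position i
        clr_values[i] += z_i * math.sqrt(r / (r + 1))
        for j in range(i + 1, d):
            clr_values[j] -= z_i / math.sqrt(r * (r + 1))
    return clr_inv(clr_values)

# Mean sand, silt and clay fractions reported for the original data.
psf = closure([30.64, 55.79, 13.57])
print(alr(psf), clr(psf), ilr(psf))
print(closure([31.0, 57.0, 14.0], total=100.0))  # standardized prediction
```

ALR and ILR map three parts to two coordinates, whereas CLR keeps three coordinates that sum to zero, which is why the study trains two models for ALR and ILR and three for CLR.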

Validation method
We used a total of 45 models including five machine-learning methods combined with original (ORI) and three log ratio methods (ALR, CLR, ILR): five machine-learning methods for direct soil texture classification (5 models), and these methods combined with original data and log ratio transformed data for indirect soil texture classification (20 models) and soil PSFs interpolation (20 models) (Table 1). The data were randomly divided into two sets: 448 soil samples (70 %) for training and 192 soil samples (30 %) for validation. This process was repeated 30 times.
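The repeated random split can be sketched in Python as follows. The sketch assumes simple random sampling without stratification and a fixed seed for reproducibility; both are assumptions, and the function name is hypothetical.

```python
import random

def repeated_split(n_samples=640, train_frac=0.7, n_repeats=30, seed=0):
    """Return n_repeats (train, validation) index pairs from random shuffles."""
    rng = random.Random(seed)
    n_train = round(n_samples * train_frac)
    indices = list(range(n_samples))
    splits = []
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        splits.append((shuffled[:n_train], shuffled[n_train:]))
    return splits

splits = repeated_split()
train_idx, valid_idx = splits[0]
print(len(splits), len(train_idx), len(valid_idx))
```

Each validation indicator is then computed once per split and summarized over the 30 repetitions.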

Validation indicators for soil texture classification
We used the overall accuracy, kappa coefficients, area under the precision-recall curve (AUPRC) and abundance index to validate the performance of different models. The first two indicators were selected to evaluate the overall prediction performance of soil texture types, and the last two were applied to evaluate the performance of each soil texture type.
The overall accuracy is the number of samples of soil texture types correctly classified by machine-learning models, divided by the total number of samples of soil texture types used in the validation. The overall accuracy is defined as follows (Brus et al., 2011):
$$\text{Overall accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
where $TP$, $TN$, $FP$ and $FN$ are true positive, true negative, false positive and false negative, respectively.
The kappa coefficient measures the agreement between observed classes and predicted classes and is calculated from the confusion matrix:
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where $p_o$ is the probability of observed agreement (overall accuracy) and $p_e$ is the probability of agreement when two classes are unconditionally independent. The strength of the kappa coefficients is interpreted in the following manner: 0.01-0.20: slight, 0.21-0.40: fair, 0.41-0.60: moderate, 0.61-0.80: substantial, 0.81-1.00: almost perfect (Landis and Koch, 1977). The probabilities of different soil texture types (sum to 1) obtained during the training and predicting processes of machine-learning models were selected to calculate the precision and recall, which indicate the extent of identifying positive cases:
$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$
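The overall accuracy and kappa coefficient can be computed from a confusion matrix as in the following Python sketch. It assumes the standard Cohen's kappa, with rows as observed classes and columns as predicted classes; the function name and the toy two-class matrix are hypothetical.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j]: samples observed as class i and predicted as class j."""
    n_classes = len(confusion)
    n = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    p_o = sum(confusion[i][i] for i in range(n_classes)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [
        sum(confusion[i][j] for i in range(n_classes)) for j in range(n_classes)
    ]
    # Agreement expected if observed and predicted classes were independent.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return p_o, (p_o - p_e) / (1.0 - p_e)

toy_matrix = [[40, 10], [5, 45]]
accuracy, kappa = accuracy_and_kappa(toy_matrix)
print(accuracy, kappa)
```

Because kappa discounts chance agreement, a model that mostly predicts the dominant class can reach a high overall accuracy with a low kappa, which matters for a data set dominated by one texture class.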
The soil texture data are class-imbalanced, with 62.5 % of the samples being silt loam; under these circumstances, the overabundance of majority (negative) examples would cause classifier performance to be overvalued, yielding overly optimistic findings (Davis and Goadrich, 2006). PRCs are informative in dealing with class-imbalanced data (Fu et al., 2017). The R package "precrec" (Saito and Rehmsmeier, 2017) can generate PRCs and compute the AUPRC for each soil texture type. This process was repeated 30 times, and then the average PRCs and AUPRCs were obtained.
Similarly, the confusion index (COI) based on prediction probability was calculated to evaluate the uncertainties of machine-learning classification models (Burrough et al., 1997):
$$COI_i = 1 - (P_{\max,i} - P_{\mathrm{secmax},i})$$
where $P_{\max,i}$ refers to the maximum value of probability of soil sampling point $i$ and $P_{\mathrm{secmax},i}$ represents the second highest value of probability of soil sampling point $i$. A lower COI indicates better performance of the model.
The abundance index was applied to describe the proportion of all soil texture types and well-classified soil texture types in prediction maps; it was defined from the number of all soil texture types in prediction maps and the number of well-classified soil texture type(s) in test sets. All nine soil texture types were involved in the test sets to ensure the balance of the soil texture types, including clay loam (ClLo: 12), loam (Lo: 57), loamy sand (LoSa: 18), sand (Sa: 23), sandy clay loam (SaClLo: 4), sandy loam (SaLo: 58), silt (Si: 31), silty clay loam (SiClLo: 37), and silt loam (SiLo: 400).
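The confusion index can be sketched in Python as follows, assuming the usual form from Burrough et al. (1997), one minus the difference between the highest and second highest class probabilities; the function name and probability vectors are hypothetical.

```python
def confusion_index(probabilities):
    """COI = 1 - (highest probability - second highest probability)."""
    top, second = sorted(probabilities, reverse=True)[:2]
    return 1.0 - (top - second)

# A confident prediction gives a low COI; an ambiguous one gives a high COI.
print(confusion_index([0.90, 0.05, 0.05]))
print(confusion_index([0.40, 0.38, 0.22]))
```

Averaging this value over all validation points gives one COI per model, which is how the per-model values in the results can be read.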
where is the observed value; is the predicted value; is the number of dimensions ( of prediction values and the ranges of 95 % confidence interval (CI) (Streiner, 1996) of indicators derived from running models 30 times to assess model uncertainty.
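The two compositional indicators named in the abstract, AD and STRESS, can be sketched in Python as follows. The sketch assumes the commonly used definitions: AD as the mean Aitchison distance between observed and predicted compositions, and STRESS as the square root of the summed squared differences between observed and predicted pairwise Aitchison distances, divided by the summed squared observed pairwise distances. These forms and all function names are assumptions rather than the equations of this study.

```python
import math
from itertools import combinations

def clr(x):
    """Centered log ratio coordinates of a composition."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]

def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of two compositions."""
    return math.dist(clr(x), clr(y))

def mean_ad(observed, predicted):
    """Mean Aitchison distance over paired observed/predicted compositions."""
    pairs = list(zip(observed, predicted))
    return sum(aitchison_distance(o, p) for o, p in pairs) / len(pairs)

def stress(observed, predicted):
    """How well predictions preserve pairwise distances between samples."""
    numerator = denominator = 0.0
    for i, j in combinations(range(len(observed)), 2):
        d_obs = aitchison_distance(observed[i], observed[j])
        d_pred = aitchison_distance(predicted[i], predicted[j])
        numerator += (d_obs - d_pred) ** 2
        denominator += d_obs ** 2
    return math.sqrt(numerator / denominator)

# Hypothetical observed and predicted (sand, silt, clay) compositions in %.
observed = [(30.0, 55.0, 15.0), (60.0, 30.0, 10.0), (10.0, 70.0, 20.0)]
predicted = [(32.0, 54.0, 14.0), (55.0, 33.0, 12.0), (15.0, 68.0, 17.0)]
print(mean_ad(observed, predicted), stress(observed, predicted))
```

Both indicators depend only on ratios between parts, so they give the same value whether the fractions are expressed in percent or as proportions.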

Statistical analysis for the original and log ratio transformed data
The standard deviation (SD), coefficient of variation (CV), mean, minimum (Min), maximum (Max), median absolute deviation (MAD), skewness (Skew), kurtosis and Kolmogorov-Smirnov (k-s) test (p > 0.05) were employed for descriptive statistical analysis of the original and log ratio transformed data. Furthermore, multivariate medians based on depth measures (Bedall and Zimmermann, 1979; Gower, 1974; Small, 1990) were used because of the sum constraint of compositional soil PSFs data. The arithmetic mean of log ratio transformed data should be back-transformed to the original space. For $x = [x_1, \ldots, x_n]$, the MAD is calculated according to Eq. (22).

3 Results

The descriptive statistics for the original and log ratio transformed soil PSFs data
For the original data of sand content, the mean (30.64 %) was much higher than the median center (26.06 %). In contrast, silt and clay contents showed the opposite pattern, with lower means (silt: 55.79 %, clay: 13.57 %) than median centers (silt: 59.51 %, clay: 14.43 %). For the log ratio transformed data, different log ratio methods delivered the same means for sand, silt and clay.
Additionally, the means of sand (28.69 %) and silt (60.54 %) were closer to the median centers of the original data, except for clay with a mean of 10.78 %. For SD and CV, soil PSFs data in log ratio geometry had more stability and less variability compared with the original data. ILR and CLR had the lowest MAD for the first component (0.66) and the second component (0.43), respectively (Fig. 2). Although the p values of the original and different log ratio transformed data were not significant, log ratios made the data more symmetric according to the skews (Fig. 2). All log ratio methods had lower skews (ALR: 0.77, CLR: 0.88, ILR: -1.20) than the original data (1.24) for the first component. All the kurtoses of log ratio methods were much higher than those generated from original data.

Figure 2.
Descriptive statistical analysis for the original and log ratio transformed soil sampling data of (a) sand, (b) silt, (c) clay, (d) ALR_1, (e) ALR_2, (f) CLR_1, (g) CLR_2, (h) CLR_3, (i) ILR_1 and (j) ILR_2. SD is standard deviation, CV is the coefficient of variation, and the Median is multivariate median based on depth measures. ALR and ILR transformed $S^3$ (the simplex) to $\mathbb{R}^2$ (the real space), and CLR transformed $S^3$ to $\mathbb{R}^3$. Note that the means of log ratio transformed data were back-transformed to the real space. Blue dashed lines showed the multivariate medians of original data.

Comparison of the validation indicators for soil texture classification
The overall accuracy of each model ranged from 0.613 to 0.636 (Fig. 3a). RF had the highest overall accuracy (0.636) among the five models, followed closely by KNN (0.630) and MLP (0.627). The accuracies of SVM (0.618) and XGB (0.613) were relatively lower than those of the other models. The highest kappa coefficient was generated from MLP (0.242), followed by RF (0.238), XGB (0.229), KNN (0.213) and SVM (0.213) (Fig. 3b). For model uncertainties measured by confusion indices (COIs), XGB (0.278) delivered the best performance, and RF (0.501) showed the highest confusion (Fig. 3c). We combined the PRCs of the five machine-learning methods in Fig. 4 to evaluate their performance in predicting each soil texture type using imbalanced data with different numbers of samples of each type. We found that the AUPRCs of types with fewer positive examples were typically small, especially in the case of SaClLo (only 4 samples), because the lack of soil sampling points made models learn poorly during the training process. Hence, the soil texture types (Lo, SaLo, SiLo, SiClLo) with more positive examples delivered superior results to those with fewer positive examples. Moreover, these soil texture types had significant differences in AUPRCs. For example, SiLo, which had the largest number of samples, was predicted most effectively among these nine types. For soil texture classes with more samples, RF and XGB performed better, and for soil texture classes with fewer samples, RF and SVM had better performance according to the AUPRCs.

Comparison of the prediction maps for soil texture classification
Prediction maps of soil texture types in the HRB showed quite different spatial distributions among the machine-learning models (Fig. 5). The abundance indices indicated that SVM predicted all 9 types, KNN and XGB predicted 8 of 9 types, followed closely by RF (7 of 9 types) and MLP (6 of 9 types). The maps predicted by RF, SVM and XGB illustrated that the main soil texture types in the northwest of the lower reaches of the HRB were mostly LoSa, while other prediction models produced SaLo. On the upper reaches of the HRB, soil texture types generated from RF were more abundant and more in accordance with the real environment (Fig. 1).

Comparison of the validation indicators for interpolation of soil PSFs
We compared the performance of each machine-learning model combined with the original and the log ratio transformed data of soil PSFs. The results indicated that the STRESS values of the methods using log ratio transformed data were superior to those of the methods using original data (Table 2). The RMSE, MAE, R² and AD generated from KNN, MLP, RF and XGB using original data outperformed the results using log ratio transformed data. By comparison among different log ratio transformed data of the same machine-learning model, ILR and CLR outperformed ALR in these models. In Table 2, KNN_CLR demonstrated the most remarkable performance, with the highest R² and the lowest RMSE and MAE among KNN using the three log ratios.
Furthermore, RF and SVM using CLR and ILR transformed data generated relatively similar results. XGB_ILR showed the best performance with most of the indicators except for RMSE (6.75 %) and MAE (5.36 %) of clay, and STRESS (0.63). RF had the lowest RMSE and MAE, the highest R², and the lowest AD and STRESS for ALR, CLR and ILR. For original data, RF also outperformed other models.

Comparison of the interpolation prediction maps of soil PSFs
Interpolation prediction maps of soil PSFs using log ratio transformed data (ILR) and original data are presented in Figs. 6, S4.1 and S4.2. The maps generated from models combined with ILR transformed data showed closer ranges to the original soil sampling data in the case of sand (0.98-99.66 %), silt (0.17-95.87 %) and clay (0.03-39.77 %), and the texture features were more suitable for the distributions of the real environment (Figs. 6, S4.1 and S4.2). With respect to different machine-learning models, RF and XGB delivered prediction maps that were closer to the range of the distribution of original data than did KNN, SVM and MLP.

Comparison of the validation indicators for direct and indirect soil texture classification
The overall accuracy and kappa coefficients of indirect classification were improved by using log ratio transformed data, especially for RF and XGB (Fig. 7). For all five machine-learning models, ILR showed the highest overall accuracy among the three log ratio transformation methods, and it also demonstrated the best performance according to kappa coefficients, except for MLP.
We also compared direct classification (Fig. 3) with indirect classification and found that the differences in overall accuracy between direct and indirect classification were negligible. In contrast, the kappa coefficients were greatly improved using indirect classification compared with direct classification, except for MLP; in particular, RF_ILR increased the kappa coefficient to 0.291 (21.3 % improvement) while keeping accuracy stable.

Figure 7.
Overall accuracy and kappa coefficients calculated from soil texture classification by soil PSFs interpolation using five machine-learning models combined with original data and log ratio transformed data. Note that the highest overall accuracy is SVM_ORI (0.638), and the highest kappa coefficient is RF_ILR (0.291).

The prediction performance of soil texture types from different methods
The distributions of soil texture classes using original data and ILR transformed data are illustrated in the USDA soil texture triangle (Fig. 8). The triangle of the original soil PSFs data (Fig. 8a) demonstrated wider ranges of spatial dispersion than the interpolation data using machine-learning models, revealing that the predictions aggregate from the sides toward the center of the triangles.
With respect to these machine-learning models, RF showed the most dispersed feature in accordance with the original soil PSFs data. The distributions predicted from models combined with ILR transformed data were more discrete and more associated with the original soil PSFs data than those resulting from ORI methods. The prediction results showed striking differences in that the error ratios (yellow color) of soil sampling points on types of LoSa, SaLo and Lo (left side of triangles) were significantly higher than those on types of SiLo and Si (the right side of triangles) for most models, especially KNN and MLP. The log ratio methods overestimated the component of silt in the process of transformation (Fig. 2); in this way, these points were biased to the right of the USDA soil texture triangle based on overall contraction (regression smoothing effects), crossing the classification boundary and becoming other soil texture types. RF_ILR (Fig. 8f) delivered the highest right ratio (RR) among these models, and the classification accuracy was enhanced using the ILR method (83.9 %) compared with ORI (81.7 %). In the case of other models, the differences between ORI and ILR were negligible. We also compared the RRs of indirect classification models with those of direct classification, demonstrating all RRs of direct classification were higher

Figure 8.
... (i) SVM_ORI, (j) XGB_ILR, and (k) XGB_ORI. Note that right points (green) mean that the predicted soil texture classes and the classes corresponding to the original data were the same; wrong points (yellow) were the opposite. The predicted right ratios (RRs) of the soil texture classes are given in brackets after the interpolators in the plots.

Comparison of prediction maps of direct and indirect soil texture classification
The soil texture maps predicted using original data were different from those generated from log ratio transformed data, and classification maps of the machine-learning models combined with the log ratio transformed data had more detailed information (Figs. 9 and S5.1). The machine-learning models using the three log ratio transformed data were similar in the number of each predicted type; however, there were significant differences between using original data and log ratio transformed data. All machine-learning models combined with original data predicted more Lo and SaLo, and fewer LoSa and Si (Fig. 9). We also compared the prediction of soil texture classes by direct classification (Fig. 5) with those generated from indirect classification using the same machine-learning model, revealing complete differences between them on the lower reaches of the Heihe River Basin, such as the distribution of LoSa; on the middle and upper reaches of the Heihe River Basin, all the prediction maps were similar, mainly distributed with SiLo.

Comparison of total computing time for each model in soil texture classification and soil PSFs interpolation
The time spent by each model was computed to compare the efficiency of the different machine-learning models in soil texture classification and soil PSF interpolation (Fig. 10). Because the time spent with ORI and with the log ratio methods was similar, the time spent with ILR was selected for soil PSF interpolation. For the different models, RFs required the longest

The systematic comparison of the five machine-learning models
We found that the tree-based machine-learning models (RF and XGB) delivered better performance than KNN, MLP and SVM, which is consistent with the conclusion of Heung et al. (2016). For the total computing time, RF revealed the longest time with respect
https://doi.org/10.5194/hess-2019-648 Preprint. Discussion started: 6 January 2020 © Author(s) 2020. CC BY 4.0 License.
Another, more meaningful multivariate treatment of soil PSFs using the probability density functions of soil particle size curves (PSCs) could be considered in the future, since non-negative values integrating to 1 (or 100 %) can be considered as compositional data with infinitesimal parts (so-called functional compositions) (Menafoglio et al., 2014). Unlike conventional component-wise approaches, functional compositions make it possible to acquire complete and continuous information rather than discrete information. Soil texture and soil PSFs can be extracted from the stochastic simulation of soil PSCs (Menafoglio et al., 2016a), applying jointly to the fractions and exploiting fully the richness of information. Menafoglio

The systematic comparison of the direct and indirect soil texture classification for soil PSFs
Compared with the real soil texture distribution and environment of the HRB, SiLo covered the upper reaches of the HRB, and SaLo and Lo showed a strip distribution in the south of the upper reaches. Moreover, an uncovered area was detected in the northwest of the lower reaches of the HRB, where no prediction could be made because of a lack of input information during model training. The main soil texture classes of the lower reaches of the HRB were SiLo and LoSa, with small amounts of SaLo and Lo, which were distributed in the uncovered area. The main soil texture classes predicted by direct classification using machine-learning models were SaLo and SiLo; RF and XGB predicted much more LoSa than the other direct classification models. However, all these models predicted that the main soil type of the lower reaches of the HRB was SaLo, which did not match the real environment (LoSa). In fact, LoSa and SaLo were clearly the most confused classes, although they are fairly similar to each other (Fig. 8). In addition, because of the limitation of the training sets, direct classification can only predict types that are contained in the training sets. In contrast, indirect classification is not subject to this limitation, and new predicted types arose from the transformation of soil PSFs into soil texture types. Moreover, models that matched the real environment more closely should be considered, such as the MLP and RF models combined with the log ratio methods, KNN_ALR, KNN_ILR and XGB_CLR.
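The contrast between direct and indirect classification described above can be illustrated with a toy sketch. It is not the workflow of this study: the single covariate, the two training samples and the simple neighbour-based predictors are hypothetical, and the boundary rules in `usda_class` follow commonly cited USDA texture triangle limits that should be checked against the official definitions. Class labels other than LoSa, SaLo, Lo, SiLo and Si are abbreviations invented for this example.

```python
def usda_class(sand, silt, clay):
    """Return a USDA texture class label for percentages summing to 100."""
    if silt + 1.5 * clay < 15:
        return "Sa"
    if silt + 2 * clay < 30:
        return "LoSa"
    if (7 <= clay < 20 and sand > 52) or (clay < 7 and silt < 50):
        return "SaLo"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "Lo"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "SiLo"
    if silt >= 80 and clay < 12:
        return "Si"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "SaClLo"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "ClLo"
    if 27 <= clay < 40 and sand <= 20:
        return "SiClLo"
    if clay >= 35 and sand > 45:
        return "SaCl"
    if clay >= 40 and silt >= 40:
        return "SiCl"
    # Remaining cases are treated as clay (clay >= 40, sand <= 45, silt < 40).
    return "Cl"


# Hypothetical training samples: (covariate value, (sand, silt, clay) in %).
train = [(0.0, (70.0, 22.0, 8.0)), (1.0, (20.0, 68.0, 12.0))]
train_labels = [usda_class(*psf) for _, psf in train]


def direct_predict(x):
    """Nearest-neighbour classification: can only return a training label."""
    nearest = min(range(len(train)), key=lambda i: abs(train[i][0] - x))
    return train_labels[nearest]


def indirect_predict(x):
    """Inverse-distance-weighted regression of the PSFs, then classification."""
    weights = [1.0 / (abs(cov - x) + 1e-9) for cov, _ in train]
    total = sum(weights)
    psf = [
        sum(w * p[k] for w, (_, p) in zip(weights, train)) / total
        for k in range(3)
    ]
    return usda_class(*psf)


direct_label = direct_predict(0.4)      # one of the training labels
indirect_label = indirect_predict(0.4)  # may be a class absent from training
```

In this toy setting the training set contains only SaLo and SiLo, so the direct predictor is restricted to these two labels, whereas the interpolated fractions of the indirect route fall inside the Lo region of the triangle. The same smoothing that creates new classes can also push points across a boundary into a wrong class, which is the effect noted for the indirect models.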

Conclusion
We systematically compared a total of 45 models for direct and indirect soil texture classification and for soil PSF interpolation, using five machine-learning methods combined with original data and three different log ratio transformed data sets in the HRB. As flexible and stable models, tree learners such as RF delivered powerful performance in both classification and regression and were superior to the other machine-learning models mentioned above. As a new and sub-optimal machine-learning method in soil science, XGB appeared to be more meaningful and more computationally efficient when dealing with large data sets. RF and XGB are recommended for evaluating the classification capacity on imbalanced data. In addition, the log ratio methods had the advantage of improving STRESS in soil PSF interpolation. Moreover, indirect soil texture classification outperformed direct classification, especially when combined with the log ratio methods, and generated preferable results in terms of both accuracy indicators and prediction maps. More appropriate environmental covariates and interpolation techniques, more efficient soil PSF data transformation methods, different perspectives on compositional data selection (e.g., functional compositions and multivariate treatments) and systematic parameter adjustment algorithms for compositional data are key to improving accuracy in the future.