Status: this preprint was under review for the journal HESS but the revision was not accepted.
Systematic comparison of five machine-learning methods in classification and interpolation of soil particle size fractions using different transformed data
Mo Zhang and Wenjiao Shi
Abstract. Soil texture and soil particle size fractions (psf) play an increasingly important role in physical, chemical and hydrological processes. Digital soil mapping using machine-learning methods has been widely applied to generate more detailed predictions of qualitative or quantitative outputs than traditional soil-mapping methods. Because soil psf are compositional data, their interpolation has been combined with log ratio approaches to improve prediction accuracy, and the interpolated fractions can also be used to derive soil texture indirectly. However, few reports have systematically compared classification and regression, the accuracies of original (untransformed) and log ratio approaches, and the performance of direct and indirect soil texture classification using machine-learning methods. In this study, a total of 45 evaluation models were generated from five different machine-learning models combined with the original data and three log ratio approaches (additive log ratio, centered log ratio and isometric log ratio; ALR, CLR and ILR, respectively) to evaluate and compare the performance of soil texture classification and soil psf interpolation. The results demonstrated that log ratio approaches made the distribution of the soil sampling data more symmetric, and with respect to soil texture classification, random forest (RF) and extreme gradient boosting (XGB) performed notably well. For soil psf interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root mean squared error (RMSE, sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE, sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD, 0.84) and standardized residual sum of squares (STRESS, 0.61), and the highest coefficient of determination (R², sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). STRESS was improved using log ratio approaches, especially CLR and ILR. Indirect soil texture classification improved the kappa coefficient markedly (21.3 %) compared with the direct approach. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.
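To make the three log ratio approaches named in the abstract concrete, the following is a minimal NumPy sketch of ALR, CLR and ILR for a (sand, silt, clay) composition, together with the Aitchison distance. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the ALR denominator is assumed to be the last part, the ILR basis is assumed to be a standard pivot-type orthonormal basis, and zero fractions are not handled.

```python
import numpy as np

def closure(x):
    """Rescale each composition so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log ratio; the last part is the denominator (an assumption)."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def alr_inv(z):
    """Back-transform ALR coordinates to a composition summing to 1."""
    z = np.asarray(z, dtype=float)
    zeros = np.zeros(z.shape[:-1] + (1,))
    return closure(np.exp(np.concatenate([z, zeros], axis=-1)))

def clr(x):
    """Centered log ratio: log parts minus their mean (log geometric mean)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Back-transform CLR coordinates to a composition summing to 1."""
    return closure(np.exp(np.asarray(z, dtype=float)))

def ilr(x):
    """Isometric log ratio using a pivot-type orthonormal basis (an assumption).

    Coordinate i contrasts the geometric mean of the first i parts with part i+1.
    """
    logx = np.log(closure(x))
    n_parts = logx.shape[-1]
    coords = []
    for i in range(1, n_parts):
        log_gmean = logx[..., :i].mean(axis=-1)
        coords.append(np.sqrt(i / (i + 1)) * (log_gmean - logx[..., i]))
    return np.stack(coords, axis=-1)

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between CLR coordinates."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

if __name__ == "__main__":
    # Hypothetical sand/silt/clay percentages, for illustration only.
    observed = np.array([40.0, 40.0, 20.0])
    predicted = np.array([70.0, 20.0, 10.0])
    print("ALR:", alr(observed))
    print("CLR:", clr(observed))
    print("ILR:", ilr(observed))
    print("AD :", aitchison_distance(observed, predicted))
```

In a workflow of this kind, a model would typically be fitted to the transformed coordinates and the predictions back-transformed, so that the predicted sand, silt and clay fractions are positive and sum to 100 %. Because ILR is an isometry, the Euclidean distance between ILR coordinates equals the Aitchison distance computed from CLR coordinates.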
Received: 20 Nov 2018 – Discussion started: 11 Feb 2019
Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.
Mo Zhang
Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China
Wenjiao Shi
Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
We systematically analyzed direct and indirect soil texture classification and the interpolation of soil particle size fractions (psf) using five machine-learning methods combined with untransformed and log ratio transformed data. The results showed that random forest performed notably well for both soil psf interpolation and soil texture classification, and that indirect classification outperformed the direct approach. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.
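The comparison between direct and indirect classification is reported through the kappa coefficient. As a minimal sketch, assuming the statistic meant is the standard (unweighted) Cohen's kappa, it can be computed from observed and predicted texture classes as below. The function name and the class labels are hypothetical, and the sketch does not reproduce how texture classes were derived from predicted fractions in the study.

```python
import numpy as np

def cohen_kappa(observed, predicted):
    """Unweighted Cohen's kappa between observed and predicted class labels."""
    observed = np.asarray(observed)
    predicted = np.asarray(predicted)
    labels = np.unique(np.concatenate([observed, predicted]))
    index = {label: k for k, label in enumerate(labels)}

    # Confusion matrix: rows are observed classes, columns are predicted classes.
    matrix = np.zeros((len(labels), len(labels)))
    for obs, pred in zip(observed, predicted):
        matrix[index[obs], index[pred]] += 1

    total = matrix.sum()
    p_observed = np.trace(matrix) / total
    # Agreement expected by chance, from the row and column marginals.
    p_expected = (matrix.sum(axis=0) * matrix.sum(axis=1)).sum() / total**2
    # Undefined (division by zero) when p_expected equals 1.
    return (p_observed - p_expected) / (1.0 - p_expected)

if __name__ == "__main__":
    # Hypothetical texture classes, for illustration only.
    observed = ["loam", "loam", "clay", "sand"]
    direct = ["loam", "clay", "clay", "sand"]
    print("kappa:", cohen_kappa(observed, direct))
```

Applying the same function to the classes obtained directly from a classifier and to the classes derived indirectly from interpolated psf would give two kappa values that can be compared on the same validation samples.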