https://doi.org/10.5194/hess-2018-584
11 Feb 2019
Status: this preprint was under review for the journal HESS but the revision was not accepted.

Systematic comparison of five machine-learning methods in classification and interpolation of soil particle size fractions using different transformed data

Mo Zhang and Wenjiao Shi

Abstract. Soil texture and soil particle size fractions (psf) play an increasingly important role in physical, chemical and hydrological processes. In soil science, digital soil mapping using machine-learning methods has been widely applied to generate more detailed predictions of qualitative or quantitative outputs than traditional soil-mapping methods. Because soil psf are compositional data, their interpolation combined with log ratio approaches was developed to improve prediction accuracy, and the interpolated fractions can also be used to derive soil texture indirectly. However, few reports have systematically analyzed and compared classification and regression, the accuracies of original (untransformed) and log ratio approaches, and the performance of direct and indirect soil texture classification using machine-learning methods. In this study, a total of 45 evaluation models, generated from five different machine-learning models combined with the original data and three log ratio approaches (additive log ratio, centered log ratio and isometric log ratio; ALR, CLR and ILR, respectively), were used to evaluate and compare the performance of soil texture classification and soil psf interpolation. The results demonstrated that the log ratio approaches made the distribution of the soil sampling data more symmetric, and with respect to soil texture classification, random forest (RF) and extreme gradient boosting (XGB) performed notably well. For soil psf interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root mean squared error (RMSE, sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE, sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD, 0.84) and standardized residual sum of squares (STRESS, 0.61), and the highest coefficient of determination (R², sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). STRESS was improved by the log ratio approaches, especially CLR and ILR. Indirect soil texture classification gave a pronounced improvement (21.3 %) in the kappa coefficient compared to the direct approach. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.
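To make the three log ratio approaches and the Aitchison distance named in the abstract concrete, the following is a minimal NumPy sketch using their textbook definitions. It is not the authors' code: the function names, the choice of the last part (clay) as the ALR denominator, the particular orthonormal basis used for ILR, and the sample values are all assumptions made for illustration. All parts must be strictly positive, so zero fractions would need separate handling.

```python
import numpy as np

def closure(x):
    """Rescale each composition so its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log ratio; the last part is the (assumed) denominator."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def alr_inv(y):
    """Back-transform ALR coordinates to a composition summing to 1."""
    y = np.asarray(y, dtype=float)
    zeros = np.zeros(y.shape[:-1] + (1,))
    return closure(np.exp(np.concatenate([y, zeros], axis=-1)))

def clr(x):
    """Centered log ratio: log parts minus their mean (log geometric mean)."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr(x):
    """Isometric log ratio with one common orthonormal basis (an assumption;
    other orthonormal bases give rotated but equivalent coordinates)."""
    logx = np.log(closure(x))
    n_parts = logx.shape[-1]
    coords = []
    for i in range(1, n_parts):
        mean_log_first_i = logx[..., :i].mean(axis=-1)
        coords.append(np.sqrt(i / (i + 1)) * (mean_log_first_i - logx[..., i]))
    return np.stack(coords, axis=-1)

def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of two compositions."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

# Hypothetical sand, silt, clay percentages (not data from the study).
observed = np.array([40.0, 40.0, 20.0])
predicted = np.array([50.0, 30.0, 20.0])

alr_obs = alr(observed)            # two coordinates for three parts
recovered = alr_inv(alr_obs)       # fractions that sum to 1 again
ad = aitchison_distance(observed, predicted)
```

In a workflow of the kind the abstract describes, a model would be fitted to the transformed coordinates and its predictions back-transformed, which guarantees that the predicted sand, silt and clay fractions are positive and sum to 100 %.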

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this preprint. The responsibility to include appropriate place names lies with the authors.

Short summary
We systematically analyzed direct and indirect soil texture classification and the interpolation of soil particle size fractions (psf) using five machine-learning methods combined with untransformed and log ratio transformed data. The results showed that random forest performed notably well for both soil psf interpolation and soil texture classification, with the indirect approach performing better than the direct one. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.