https://doi.org/10.5194/hess-2018-584

  11 Feb 2019

Review status: this preprint was under review for the journal HESS but the revision was not accepted.

Systematic comparison of five machine-learning methods in classification and interpolation of soil particle size fractions using different transformed data

Mo Zhang1,2 and Wenjiao Shi1,3
  • 1Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • 2School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China
  • 3College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract. Soil texture and soil particle size fractions (psf) play an increasingly important role in physical, chemical and hydrological processes. Digital soil mapping using machine-learning methods has been widely applied in soil science to generate more detailed predictions of qualitative or quantitative outputs than traditional soil-mapping methods. Because soil psf are compositional data, their interpolation has been combined with log ratio approaches to improve prediction accuracy, and the interpolated fractions can also be used to derive soil texture indirectly. However, few reports have systematically analyzed and compared classification and regression, the accuracies of original (untransformed) and log ratio approaches, and the performance of direct and indirect soil texture classification using machine-learning methods. In this study, a total of 45 evaluation models were generated from five different machine-learning models combined with the original data and three log ratio approaches (additive log ratio, centered log ratio and isometric log ratio; ALR, CLR and ILR, respectively) to evaluate and compare the performance of soil texture classification and soil psf interpolation. The results demonstrated that log ratio approaches made the distribution of the soil sampling data more symmetric and that, for soil texture classification, random forest (RF) and extreme gradient boosting (XGB) performed notably well. For soil psf interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root mean squared error (RMSE; sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE; sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD, 0.84) and standardized residual sum of squares (STRESS, 0.61), and the highest coefficient of determination (R2; sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). STRESS was improved by using log ratio approaches, especially CLR and ILR. Indirect soil texture classification produced a pronounced improvement (21.3 %) in the kappa coefficient compared to the direct approach. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.
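
As a rough illustration of the log ratio approaches and the Aitchison distance named in the abstract, the sketch below applies the standard textbook definitions of ALR and CLR to a single sand/silt/clay composition. It is not the authors' code: the function names, the choice of clay as the ALR denominator, and the sample values are assumptions made for illustration, and the functions do not handle zero fractions, whose logarithm is undefined.

```python
import numpy as np


def closure(x):
    """Rescale the parts of each composition so they sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def alr(x):
    """Additive log ratio: log of each part over the last part (assumed here to be clay)."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])


def clr(x):
    """Centered log ratio: log of each part minus the mean of the logs."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def clr_inv(z):
    """Back-transform CLR coordinates to a composition that sums to 1."""
    return closure(np.exp(z))


def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of two compositions."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)


# Hypothetical observed and predicted sand/silt/clay percentages.
observed = np.array([40.0, 40.0, 20.0])
predicted = np.array([45.0, 35.0, 20.0])

print("ALR:", alr(observed))
print("CLR:", clr(observed))
print("Back-transformed:", clr_inv(clr(observed)))
print("Aitchison distance:", aitchison_distance(observed, predicted))
```

Predictions made in log ratio space and back-transformed this way are positive and sum to 100 % by construction, which is the usual reason for transforming compositional data before interpolation.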

Status: closed

Viewed

Total article views: 1,500 (including HTML, PDF, and XML)
  • HTML: 1,113
  • PDF: 372
  • XML: 15
  • Total: 1,500
  • Supplement: 72
  • BibTeX: 24
  • EndNote: 27
Views and downloads (calculated since 11 Feb 2019)

Viewed (geographical distribution)

Total article views: 1,128 (including HTML, PDF, and XML) Thereof 1,111 with geography defined and 17 with unknown origin.

Cited

Latest update: 06 May 2021
Short summary
We systematically analyzed both direct and indirect soil texture classification and the interpolation of soil particle size fractions (psf) using five machine-learning methods combined with untransformed and log ratio transformed data. The results showed that random forest performed notably well for both soil psf interpolation and soil texture classification, with the indirect approach performing better than the direct one. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.