Systematic comparison of five machine-learning methods in
classification and interpolation of soil particle size fractions using
different transformed data

Zhang, Mo; Shi, Wenjiao

doi:https://doi.org/10.5194/hess-2018-584

Preprints

11 Feb 2019

Status: this preprint was under review for the journal HESS but the revision was not accepted.

Abstract. Soil texture and soil particle size fractions (psf) play an increasingly important role in physical, chemical and hydrological processes. In soil science, digital soil mapping using machine-learning methods has been widely applied to generate more detailed predictions of qualitative or quantitative outputs than traditional soil-mapping methods. Because soil psf are compositional data, their interpolation has been combined with log ratio approaches to improve prediction accuracy, and the interpolated psf can also be used to derive soil texture indirectly. However, few reports have systematically analyzed and compared classification and regression, the accuracies of original (untransformed) and log ratio approaches, and the performance of direct and indirect soil texture classification using machine-learning methods. In this study, a total of 45 evaluation models were generated from five different machine-learning models combined with the original data and three log ratio approaches (additive log ratio, centered log ratio and isometric log ratio; ALR, CLR and ILR, respectively) to evaluate and compare the performance of soil texture classification and soil psf interpolation. The results demonstrated that log ratio approaches made the distribution of the soil sampling data more symmetric, and that random forest (RF) and extreme gradient boosting (XGB) performed notably well for soil texture classification. For soil psf interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root mean squared error (RMSE, sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE, sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD, 0.84) and standardized residual sum of squares (STRESS, 0.61), and the highest coefficient of determination (R², sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). STRESS was improved using log ratio approaches, especially CLR and ILR. The kappa coefficient improved markedly (21.3 %) with indirect soil texture classification compared with the direct approach. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.
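The three log ratio transformations and the Aitchison distance named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R code: the function names, the example composition, and the particular ILR basis (a sequential, Helmert-type basis) are assumptions chosen for illustration, and all parts are assumed to be strictly positive, since the logarithm is undefined for zero fractions.

```python
import numpy as np

def closure(x):
    """Rescale a composition so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def alr(x):
    """Additive log ratio: log of each part relative to the last part."""
    x = closure(x)
    return np.log(x[..., :-1] / x[..., -1:])

def clr(x):
    """Centered log ratio: log of each part relative to the geometric mean."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)

def clr_inv(z):
    """Back-transform CLR coordinates to a composition summing to 1."""
    e = np.exp(np.asarray(z, dtype=float))
    return e / e.sum(axis=-1, keepdims=True)

def ilr(x):
    """Isometric log ratio using one possible orthonormal basis (assumed here):
    z_i = sqrt(i / (i + 1)) * ln(geometric_mean(x_1..x_i) / x_{i+1})."""
    logx = np.log(closure(x))
    d = logx.shape[-1]
    coords = []
    for i in range(1, d):
        log_gm = logx[..., :i].mean(axis=-1)  # log of geometric mean of first i parts
        coords.append(np.sqrt(i / (i + 1)) * (log_gm - logx[..., i]))
    return np.stack(coords, axis=-1)

def aitchison_distance(x, y):
    """Euclidean distance between the CLR coordinates of two compositions."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)

# Hypothetical sample: sand, silt, clay in percent.
sample = np.array([40.0, 40.0, 20.0])
print("ALR:", alr(sample))
print("CLR:", clr(sample))
print("ILR:", ilr(sample))
print("Back-transformed:", clr_inv(clr(sample)))
```

In a workflow of the kind the abstract describes, a regression model would be fitted to the transformed coordinates and the predictions back-transformed, so that the predicted sand, silt and clay fractions stay positive and sum to 100 %.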

Received: 20 Nov 2018 – Discussion started: 11 Feb 2019

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2508 KB)

Supplement (446 KB)

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

SC1: 'Could the code / steps used to produce results be made available to reviewers?', Tomislav Hengl, 17 Feb 2019
- AC1: 'All R codes for the results of soil PSF interpolation and soil texture classification are available now', Wenjiao Shi, 18 Feb 2019
  - SC2: 'parameters used in the ML methods matter', Yen-Sen Lu, 13 Mar 2019
    - AC2: 'Adjusted parameters for the ML methods', Wenjiao Shi, 14 Mar 2019
RC1: 'A comment on the uncertainty assessement and the general validity of the work', Anonymous Referee #1, 15 Mar 2019
- AC3: 'Response to the Anonymous Referee #1', Wenjiao Shi, 14 Apr 2019
RC2: 'Questions about sampling and some recommendations to help improve the overall readability', Tomislav Hengl, 23 Apr 2019
- AC4: 'Responses to Tomislav Hengl', Wenjiao Shi, 20 May 2019
RC3: 'Major revision required', Anonymous Referee #3, 25 Apr 2019
- AC5: 'Responses to the Anonymous Referee #3', Wenjiao Shi, 20 May 2019

Supplement

https://doi.org/10.5194/hess-2018-584-supplement

Viewed

Total article views: 3,432 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
2,576	786	70	3,432	211	102	111

Views and downloads (calculated since 11 Feb 2019)

Month	HTML	PDF	XML	Total
Feb 2019	153	43	1	197
Mar 2019	65	17	0	82
Apr 2019	33	20	0	53
May 2019	25	15	0	40
Jun 2019	13	10	1	24
Jul 2019	11	7	0	18
Aug 2019	11	7	0	18
Sep 2019	98	16	1	115
Oct 2019	96	6	0	102
Nov 2019	160	8	0	168
Dec 2019	102	7	0	109
Jan 2020	76	4	0	80
Feb 2020	42	15	0	57
Mar 2020	26	7	0	33
Apr 2020	10	17	0	27
May 2020	8	24	0	32
Jun 2020	9	13	0	22
Jul 2020	63	23	6	92
Aug 2020	14	8	2	24
Sep 2020	9	7	0	16
Oct 2020	17	6	1	24
Nov 2020	10	7	1	18
Dec 2020	9	16	1	26
Jan 2021	8	12	0	20
Feb 2021	20	20	0	40
Mar 2021	15	19	0	34
Apr 2021	8	16	1	25
May 2021	10	17	1	28
Jun 2021	14	5	0	19
Jul 2021	12	9	0	21
Aug 2021	11	11	2	24
Sep 2021	13	5	0	18
Oct 2021	13	21	0	34
Nov 2021	10	15	0	25
Dec 2021	12	5	0	17
Jan 2022	19	5	1	25
Feb 2022	19	11	0	30
Mar 2022	11	8	1	20
Apr 2022	10	2	1	13
May 2022	13	8	1	22
Jun 2022	5	2	2	9
Jul 2022	20	3	0	23
Aug 2022	5	5	0	10
Sep 2022	8	2	0	10
Oct 2022	7	4	2	13
Nov 2022	5	5	0	10
Dec 2022	8	9	0	17
Jan 2023	8	8	1	17
Feb 2023	16	2	0	18
Mar 2023	11	3	1	15
Apr 2023	7	2	0	9
May 2023	6	5	1	12
Jun 2023	11	4	2	17
Jul 2023	39	5	2	46
Aug 2023	30	6	0	36
Sep 2023	18	7	2	27
Oct 2023	33	4	0	37
Nov 2023	19	1	0	20
Dec 2023	10	3	1	14
Jan 2024	13	11	1	25
Feb 2024	14	17	2	33
Mar 2024	34	10	1	45
Apr 2024	39	18	7	64
May 2024	39	6	0	45
Jun 2024	31	10	2	43
Jul 2024	25	4	1	30
Aug 2024	34	7	1	42
Sep 2024	29	4	0	33
Oct 2024	30	11	0	41
Nov 2024	25	1	0	26
Dec 2024	31	2	1	34
Jan 2025	33	3	0	36
Feb 2025	34	3	1	38
Mar 2025	31	7	2	40
Apr 2025	30	10	0	40
May 2025	36	12	1	49
Jun 2025	51	32	3	86
Jul 2025	42	12	1	55
Aug 2025	85	14	5	104
Sep 2025	316	13	3	332
Oct 2025	30	7	2	39

Viewed (geographical distribution)

Total article views: 2,971 (including HTML, PDF, and XML), thereof 2,954 with geography defined and 17 with unknown origin.

Latest update: 23 Oct 2025

Short summary

We systematically analyzed direct and indirect soil texture classification and the interpolation of soil particle size fractions (psf) using five machine-learning methods combined with untransformed and log ratio transformed data. The results showed that random forest performed notably well for both soil psf interpolation and soil texture classification, and that indirect classification performed better than direct classification. Our systematic comparison helps to elucidate the processing and selection of compositional data in spatial simulation.


Cited

4 citations as recorded by Crossref.