Articles | Volume 24, issue 5
https://doi.org/10.5194/hess-24-2505-2020
Research article | 14 May 2020

Systematic comparison of five machine-learning models in classification and interpolation of soil particle size fractions using different transformed data

Mo Zhang, Wenjiao Shi, and Ziwei Xu

Data sets

Soil texture of soil sampling points in Yingke Irrigation District G. Huang and Y. Jiang http://data.tpdc.ac.cn/zh-hans/data/2e9cbc1a-5972-4e29-945d-99a1902cadb7/

Dataset of soil parameters in the midstream of the Heihe River Basin (2012) M. Ma http://data.tpdc.ac.cn/zh-hans/data/371ce545-e8d0-4e96-81e1-e862dbfc3b50/

Data set of soil moisture in the lower reaches of Heihe River (2012) J. Si http://data.tpdc.ac.cn/zh-hans/data/438fc689-ad9e-4370-8961-5b2de53d8b87/

Digital soil mapping dataset of soil texture (soil particle-size fractions) in the Tianlaochi basin (2012-2014) T. Yue and N. Zhao http://data.tpdc.ac.cn/zh-hans/data/737e4d01-c5f8-4940-98d2-3bda306784ad/

Digital soil mapping dataset of soil texture (soil particle-size fractions) in the upstream of the Heihe river basin (2012-2016) T. Yue and N. Zhao http://data.tpdc.ac.cn/zh-hans/data/7f91d36d-8bbd-40d5-8eaf-7c035e742f40/

Soil texture of representative samples in the Heihe River Basin G. Zhang http://data.tpdc.ac.cn/zh-hans/data/b5835154-1e3c-41a4-ba6c-a6ec5c968949/

Digital soil mapping dataset of hydrological parameters in the Heihe River Basin (2012) G. Zhang and X. Song http://data.tpdc.ac.cn/zh-hans/data/e977f5e8-972b-42a5-bffe-cd0195f3b42b/

Digital soil mapping dataset of soil depth in the Heihe River Basin (2012-2014) G. Zhang and X. Song http://data.tpdc.ac.cn/zh-hans/data/fc84083e-8c66-4a42-b729-4f19334d0d67/

Soil physical properties-soil bulk density and mechanical composition dataset of Tianlaochi Watershed in Qilian Mountains C. Zhao and W. Ma http://data.tpdc.ac.cn/zh-hans/data/b8bfbb8b-97e4-4622-acbd-06b5ac466403/

Short summary
We systematically compared 45 models for direct and indirect soil texture classification and for interpolation of soil particle size fractions, built from five machine-learning models and three log-ratio transformation methods. Random forest performed strongly both in classifying imbalanced data and in regression assessment. Extreme gradient boosting is more meaningful and computationally efficient when dealing with large data sets. We recommend indirect classification and the log-ratio methods.
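
The summary does not name the three log-ratio methods or describe the classification rule, so the following is only a minimal sketch under stated assumptions. It uses the additive log-ratio (ALR), one common log-ratio transformation, to show why such transforms suit particle size fractions: any value predicted in ALR space maps back to positive fractions that sum to 1. It also illustrates the indirect route, in which fractions are predicted first and a class is derived from them afterwards. The function names, the sample values, the fixed offset standing in for a fitted model, and the dominant-fraction rule standing in for a texture triangle are all hypothetical.

```python
import math

def alr(fractions):
    """Additive log-ratio: log of each part divided by the last part."""
    *head, last = fractions
    return [math.log(p / last) for p in head]

def alr_inverse(coords):
    """Map ALR coordinates back to positive fractions that sum to 1."""
    exps = [math.exp(c) for c in coords] + [1.0]
    total = sum(exps)
    return [e / total for e in exps]

def dominant_fraction(fractions, names=("sand", "silt", "clay")):
    """Hypothetical stand-in for a texture-triangle lookup."""
    return names[max(range(len(fractions)), key=lambda i: fractions[i])]

# Hypothetical sample: sand, silt, clay as proportions of the fine earth.
sample = [0.60, 0.30, 0.10]

# Round trip: transform, then back-transform, recovers the sample.
coords = alr(sample)
recovered = alr_inverse(coords)

# Stand-in for a regression model: an arbitrary shift in ALR space.
# Any real-valued prediction still back-transforms to a valid composition.
predicted_coords = [c - 1.5 for c in coords]
predicted = alr_inverse(predicted_coords)

# Indirect classification: derive the class from the predicted fractions.
observed_class = dominant_fraction(sample)
predicted_class = dominant_fraction(predicted)
```

In this sketch the shifted coordinates still yield a valid composition, which is the property that a regression on untransformed fractions does not guarantee.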