Articles | Volume 24, issue 5
https://doi.org/10.5194/hess-24-2505-2020
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.
Systematic comparison of five machine-learning models in classification and interpolation of soil particle size fractions using different transformed data
Mo Zhang
Key Laboratory of Land Surface Pattern and Simulation, State Key
Laboratory of Resources and Environmental Information System, Institute of
Geographic Sciences and Natural Resources Research, Chinese Academy of
Sciences, Beijing 100101, China
School of Earth Sciences and Resources, China University of
Geosciences, Beijing 100083, China
Wenjiao Shi
Key Laboratory of Land Surface Pattern and Simulation, State Key
Laboratory of Resources and Environmental Information System, Institute of
Geographic Sciences and Natural Resources Research, Chinese Academy of
Sciences, Beijing 100101, China
College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China
Ziwei Xu
State Key Laboratory of Earth Surface Processes and Resource Ecology, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China
Short summary
We systematically compared 45 models for direct and indirect soil texture classification and soil particle size fraction interpolation, built from five machine-learning models and three log-ratio transformation methods. Random forest performed strongly in both the classification of imbalanced data and the regression assessment. Extreme gradient boosting is more meaningful and computationally efficient when dealing with large data sets. We recommend the indirect classification and log-ratio methods.
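To illustrate what a log-ratio transformation does to particle size fractions, here is a minimal sketch in Python. It assumes the additive log-ratio (alr) as the example transform; the summary does not name the three methods compared, so this choice, the function names, and the sample values are all hypothetical and not taken from the study.

```python
import numpy as np

def alr(psf):
    """Additive log-ratio transform; the last part (e.g. clay) is the divisor."""
    psf = np.asarray(psf, dtype=float)
    return np.log(psf[..., :-1] / psf[..., -1:])

def alr_inverse(y):
    """Map log-ratio coordinates back to fractions that sum to 1."""
    y = np.asarray(y, dtype=float)
    ones = np.ones(y.shape[:-1] + (1,))
    expanded = np.concatenate([np.exp(y), ones], axis=-1)
    return expanded / expanded.sum(axis=-1, keepdims=True)

# Hypothetical sand, silt, clay fractions for two samples (illustrative only)
samples = np.array([[0.60, 0.30, 0.10],
                    [0.20, 0.50, 0.30]])

coords = alr(samples)           # unconstrained coordinates a model can predict
restored = alr_inverse(coords)  # back-transformed fractions, closed to sum 1
```

A regression model would be trained on coordinates such as `coords`, and its predictions passed through the inverse so that the mapped fractions stay positive and sum to one. In an indirect classification, a texture class would then be assigned from those back-transformed fractions.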