https://doi.org/10.5194/hess-2021-261

  30 Jun 2021

Review status: this preprint is currently under review for the journal HESS.

Preprocessing approaches in machine learning-based groundwater potential mapping: an application to the Koulikoro and Bamako regions, Mali

Víctor Gómez-Escalonilla1, Pedro Martínez-Santos1, and Miguel Martín-Loeches2
  • 1UNESCO/UNITWIN Chair Appropriate Technologies for Human Development. Department of Geodynamic, Stratigraphy and Paleontology, Faculty of Geology, Complutense University of Madrid, C/José Antonio Novais 12, 28040 Madrid, Spain
  • 2Department of Geology, Geography and Environment. Geology UD, University of Alcalá, Alcalá de Henares, Madrid, Spain

Abstract. Groundwater is crucial for domestic supplies in the Sahel, where the strategic importance of aquifers can only be expected to increase in the coming years due to climate change. Groundwater potential mapping is gaining recognition as a valuable tool to underpin water management practices in the region, and hence, to improve water access. This paper presents a machine learning method to map groundwater potential and illustrates it through an application to two regions of Mali. A set of explanatory variables for the presence of groundwater is developed first. Several scaling methods (standardization, normalization, maximum absolute value and min-max scaling) are used to avoid the pitfalls associated with the reclassification of explanatory variables. A number of supervised learning classifiers are then trained and tested on a large borehole database (n = 3,345) in order to find meaningful correlations between the presence or absence of groundwater and the explanatory variables. This process identifies noisy, collinear and counterproductive variables and excludes them from the input dataset. Tree-based algorithms, including the AdaBoost, Gradient Boosting, Random Forest, Decision Tree and Extra Trees classifiers, were found to outperform other algorithms on a consistent basis (accuracy > 0.85), whereas maximum absolute value and standardization proved the most efficient methods to scale explanatory variables. Borehole flow rate data are used to calibrate the results beyond standard machine learning metrics, thus adding robustness to the predictions. The southern part of the study area was identified as the better groundwater prospect, which is consistent with the geological and climatic setting.
From a methodological standpoint, the outcomes lead to three major conclusions: (1) because there is no a priori way to know which algorithm will work better on a given dataset, we advocate the use of a large number of machine learning classifiers, out of which the best are subsequently picked for ensembling; (2) standard machine learning metrics may be of limited value when appraising map outcomes, and should be complemented with hydrogeological indicators whenever possible; and (3) the scaling of the variables helps to minimize bias arising from expert judgement and maintains robust predictive capabilities.
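
The four scaling methods named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the sample values and the two example variables (slope and distance to river) are hypothetical. "Normalization" is assumed here to mean rescaling each sample to unit Euclidean norm, which the abstract does not specify.

```python
import math
from statistics import fmean, pstdev


def standardize(column):
    """Standardization: zero mean, unit (population) standard deviation."""
    mu, sigma = fmean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant variable carries no signal
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Min-max scaling: map the observed range onto [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def max_abs(column):
    """Maximum absolute value scaling: divide by the largest magnitude."""
    m = max(abs(x) for x in column)
    if m == 0:
        return [0.0 for _ in column]
    return [x / m for x in column]


def unit_norm(row):
    """Normalization (assumed meaning): scale one sample to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in row))
    if norm == 0:
        return [0.0 for _ in row]
    return [x / norm for x in row]


def scale_columns(rows, scaler):
    """Apply a column-wise scaler to every explanatory variable."""
    columns = [list(col) for col in zip(*rows)]
    scaled = [scaler(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


# Hypothetical borehole samples: [slope in degrees, distance to river in m]
samples = [[2.0, 100.0], [4.0, 300.0], [6.0, 500.0]]

scaled_min_max = scale_columns(samples, min_max)
scaled_standard = scale_columns(samples, standardize)
scaled_max_abs = scale_columns(samples, max_abs)
scaled_unit = [unit_norm(row) for row in samples]  # row-wise, not column-wise
```

Because each scaler is a fixed transformation of the raw values, no expert-defined class boundaries are needed. This matches the abstract's stated motivation for scaling variables instead of reclassifying them.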

Status: final response (author comments only)

Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
  • RC1: 'Comment on hess-2021-261', Anonymous Referee #1, 05 Aug 2021
    • AC1: 'Reply on RC1', Victor Gómez-Escalonilla, 03 Sep 2021
  • RC2: 'Comment on hess-2021-261', Anonymous Referee #2, 09 Aug 2021
    • AC2: 'Reply on RC2', Victor Gómez-Escalonilla, 03 Sep 2021

Viewed

Total article views: 462 (including HTML, PDF, and XML)
  • HTML: 343
  • PDF: 103
  • XML: 16
  • Total: 462
  • BibTeX: 2
  • EndNote: 3

Latest update: 21 Oct 2021
Short summary
Many communities in the Sahel rely solely on groundwater. We develop a machine learning technique to map areas of groundwater potential. Algorithms are trained to detect areas where there is a confluence of factors that facilitate groundwater occurrence. Our contribution focuses on using variable scaling to minimize expert bias and on testing our results beyond standard metrics. This approach is illustrated through its application to two administrative regions of Mali.