Articles | Volume 28, issue 13
https://doi.org/10.5194/hess-28-2949-2024
© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License.
High-resolution long-term average groundwater recharge in Africa estimated using random forest regression and residual interpolation
Download
- Final revised paper (published on 05 Jul 2024)
- Supplement to the final revised paper
- Preprint (discussion started on 06 Oct 2023)
- Supplement to the preprint
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2023-1898', Anonymous Referee #1, 06 Nov 2023
  - AC1: 'Reply on RC1', Anna Pazola, 08 Feb 2024
- RC2: 'Comment on egusphere-2023-1898', Anonymous Referee #2, 21 Nov 2023
  - AC2: 'Reply on RC2', Anna Pazola, 08 Feb 2024
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Publish subject to minor revisions (further review by editor) (09 Feb 2024) by Marnik Vanclooster
AR by Anna Pazola on behalf of the Authors (08 Apr 2024)
ED: Publish as is (14 May 2024) by Marnik Vanclooster
AR by Anna Pazola on behalf of the Authors (20 May 2024)
The authors use a random forest (RF) approach to generate spatially distributed long-term average groundwater recharge for Africa based on 134 recharge values from the literature, and compare their results with the field observations and with a previous publication that used an LMM (linear mixed model). The results are generated and compared for two spatial resolutions. The RF approach is very similar to the LMM but offers higher spatial variability and therefore also shows small-scale trends.
Although the approach is generally OK, the manuscript is very well written, and the workflow and code are available through GitHub (which I really appreciate), I still have some critical points that should be considered and discussed in detail in a revised version.
I'm somewhat unsure about the better spatial resolution of the results. Just because the resolution is better doesn't mean the results are more reliable. There is a very large uncertainty due to the small number of observations and their distribution, but the maps suggest a much better and more robust result, and this is dangerous. What would be the next step with the results, or what can the better spatial resolution be used for? If the data are extracted directly from the maps (for water budget calculations, for example), this can lead to very distorted results, as the simulated recharge values are very uncertain for many areas. I believe the uncertainty as a whole should be better discussed, and the maps must better highlight the uncertainties (maybe with transparent colors, see my comment below).
I wonder why, for example, seasonality in precipitation is not present in the climatic input data. In some regions, precipitation falls in only a few months, and the recharge processes are therefore significantly different from those under conditions where precipitation is distributed throughout the year. Yes, the LMM and RF show a good fit/regression, but certain parameters may compensate for the missing input. Also, of course, the relative importance does not show seasonality as important, but only because it has not been tested in the RF (it was tested in the previous work using the LMM, but that is not directly transferable to the RF approach). The same applies to depth to the groundwater table (or call it unsaturated zone thickness), which is important for recharge processes, rate and timing. How important is this input for the RF algorithm and for the process description? I also wonder why distance to rivers is not included as a (raster) input, perhaps paired with discharge rates. This would help to better capture the important process of groundwater-surface water interaction and bank filtration, which many of the authors know better than I do.
Of course there is a large uncertainty in the precipitation data sets and in the timing of recharge, but wouldn't it be possible to significantly reduce these uncertainties, and also the scaling problem (the regression is dominated by the high recharge values), by using the recharge/precipitation ratio, and thereby obtain more robust results? It would be nice if this could be discussed and tested further.
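The ratio idea can be illustrated with a minimal sketch. The precipitation and recharge values below are hypothetical and chosen only to show the effect on scaling; they are not data from the manuscript, and the helper names `to_ratio`, `from_ratio` and `spread` are introduced here for illustration.

```python
# Hypothetical (precipitation, recharge) pairs in mm/yr.
# Illustrative values only, not observations from the manuscript.
samples = [(250.0, 5.0), (600.0, 45.0), (1200.0, 180.0), (2000.0, 500.0)]


def to_ratio(precip, recharge):
    """Express recharge as a fraction of precipitation."""
    if precip <= 0:
        raise ValueError("precipitation must be positive")
    return recharge / precip


def from_ratio(precip, ratio):
    """Convert a predicted ratio back to recharge in mm/yr."""
    return ratio * precip


def spread(values):
    """Ratio of the largest to the smallest value (a crude measure of scale range)."""
    return max(values) / min(values)


recharges = [r for _, r in samples]
ratios = [to_ratio(p, r) for p, r in samples]

# Raw recharge spans a much wider relative range than the recharge/precipitation
# ratio, so the largest values dominate a least-squares fit less strongly when
# the ratio is used as the regression target.
recharge_spread = spread(recharges)
ratio_spread = spread(ratios)

# A model trained on the ratio would be back-transformed for mapping.
back_transformed = [from_ratio(p, q) for (p, _), q in zip(samples, ratios)]
```

In this toy example the relative range of the target shrinks considerably when the ratio is used. Whether that carries over to the real data set would have to be tested, since the ratio also inherits any error in the precipitation product.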
How does the spatially uneven distribution of the observations affect the results? Wouldn't it make more sense to show only the more robust areas and show the very uncertain ones transparently? Since not all climatic conditions have been covered, would clustering be useful to minimize the spatial discrepancy and influence?
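One possible way to probe the effect of uneven spatial coverage is spatially blocked cross-validation, sketched below. This is a swapped-in illustration, not the authors' procedure and not necessarily the clustering meant above: sites are grouped into coarse latitude/longitude blocks, and each block is held out in turn so that test sites are never close neighbours of training sites. The site coordinates, the 5-degree block size and the function names are assumptions made for the example.

```python
import math
from collections import defaultdict

# Hypothetical observation sites as (longitude, latitude) in degrees.
# Illustrative only, not the 134 sites used in the manuscript.
sites = [
    (-1.5, 12.3), (-1.2, 12.8),
    (31.0, -17.8), (31.4, -17.2),
    (36.8, -1.3),
    (18.5, -33.9),
]


def block_id(lon, lat, size_deg=5.0):
    """Index of the square lat/lon block (size_deg wide) containing a site."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def leave_one_block_out(points, size_deg=5.0):
    """Yield (block, train_indices, test_indices), holding out one block at a time."""
    blocks = defaultdict(list)
    for idx, (lon, lat) in enumerate(points):
        blocks[block_id(lon, lat, size_deg)].append(idx)
    for key in sorted(blocks):
        test_idx = blocks[key]
        held_out = set(test_idx)
        train_idx = [i for i in range(len(points)) if i not in held_out]
        yield key, train_idx, test_idx


splits = list(leave_one_block_out(sites))
```

Comparing the skill obtained with such blocked splits against that from a random split would indicate how much of the apparent performance depends on clustered sites predicting their neighbours.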
Is the correlation of the aridity index with precipitation and ET not a problem for parameter estimation, and more generally for all estimation methods? Aridity is based on P and ET, and I wonder what the advantage of using all three parameters is. Looking at the SI, precipitation and aridity are the most important parameters, and I wonder what the results would look like if only aridity were used. When I see Table S4, I wonder why the results look almost the same for training and test, even if only P is used.
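The collinearity concern can be made concrete with a small sketch. It assumes the common definition of the aridity index as precipitation divided by potential evapotranspiration (AI = P / PET); the manuscript's exact definition may differ. The values are hypothetical and the `pearson` helper is written out here only to keep the example self-contained.

```python
import math

# Hypothetical long-term means in mm/yr along a dry-to-wet gradient.
# Illustrative only, not data from the manuscript.
precip = [100.0, 300.0, 600.0, 1000.0, 1500.0, 2000.0]
pet = [2200.0, 2000.0, 1800.0, 1600.0, 1400.0, 1200.0]

# Assumed definition: aridity index = P / PET.
aridity = [p / e for p, e in zip(precip, pet)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r_precip_aridity = pearson(precip, aridity)
r_pet_aridity = pearson(pet, aridity)
```

When predictors are this strongly correlated, a random forest can still predict well, but the importance assigned to each of P, PET and the aridity index becomes hard to interpret because the trees can use them interchangeably.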
I'm not an expert on RF, but aren't such results usually validated using the ROC curve and sensitivity, specificity and accuracy rather than just the regression? Wouldn't that be more informative about the model results and robustness than using only a regression?
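For context, ROC curves, sensitivity and specificity are defined for classifiers with discrete classes. For a continuous target such as recharge, the analogous checks are error and goodness-of-fit measures computed on held-out data. The sketch below computes three common ones; the observed and predicted values are hypothetical and the function names are introduced here for illustration.

```python
import math

# Hypothetical held-out observations and model predictions in mm/yr.
# Illustrative only, not results from the manuscript.
observed = [10.0, 50.0, 120.0, 300.0]
predicted = [20.0, 40.0, 130.0, 290.0]


def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r_squared(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


scores = {
    "rmse": rmse(observed, predicted),
    "mae": mae(observed, predicted),
    "r2": r_squared(observed, predicted),
}
```

ROC-type measures would only become applicable if recharge were first binned into classes, for example above or below a threshold, which discards information about magnitude.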
Line 451: Process-based models also require careful input selection and quantification of uncertainties in the input datasets.