This work is distributed under the Creative Commons Attribution 4.0 License.
Machine-learning-based downscaling of modelled climate change impacts on groundwater table depth
Julian Koch
Lars Troldborg
Hans Jørgen Henriksen
Simon Stisen
Download
- Final revised paper (published on 23 Nov 2022)
- Preprint (discussion started on 30 May 2022)
Interactive discussion
Status: closed
- RC1: 'Comment on hess-2022-122', Anonymous Referee #1, 13 Jul 2022
The manuscript "Machine learning-based downscaling of modelled climate change impacts on groundwater table depth" by Schneider et al. presents a novel downscaling method which uses hydrological model simulation data at a coarse scale (500 meters) together with ancillary data (e.g. topography and hydrogeologic information) to derive indicators for groundwater changes for future climate scenarios at higher spatial resolution (100 meters). Model simulations at a scale of 100 meters for five selected catchments and five input data sets from different regional climate model simulations are used as training data for the downscaling algorithm which is based on the Random Forest method. Estimates of groundwater changes at high resolution are made by using hydrologic simulations at coarse scale (500 meters) with input from 18 regional climate model simulations. The downscaling method is verified with data from a high resolution (100 meters) simulation for one additional catchment.
The topic of the paper is relevant to the hydrologic community as it describes an interesting possibility to provide stakeholders with high resolution information on potential changes in groundwater resources at an affordable computational cost. Generally, the paper is well written, but there are a few issues that, in my opinion, need to be clarified.
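[Editor's note: the downscaling setup summarized above — an RF regressor trained on fine-scale submodel simulations plus ancillary covariates, then applied to coarse-scale output — can be sketched as follows. This is a minimal illustration with synthetic data and assumed variable names, not the study's actual code.]

```python
# Sketch of RF-based downscaling: train on cells from fine-scale (100 m)
# submodel runs, predict fine-scale change statistics from coarse-scale
# (500 m, interpolated) covariates. All data here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_train, n_covariates = 5000, 6  # e.g. cells from five training submodels

# Covariates: coarse-scale change statistic (interpolated to 100 m) plus
# ancillary data such as topography and hydrogeology (assumed layout).
X_train = rng.normal(size=(n_train, n_covariates))
y_train = 0.8 * X_train[:, 0] + 0.1 * X_train[:, 1]  # synthetic 100 m target

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Apply to cells outside the training catchments (here: new synthetic data).
X_apply = rng.normal(size=(10, n_covariates))
downscaled = rf.predict(X_apply)  # 100 m change estimates
```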
General comments:
- The proposed downscaling method can be seen as a data-driven surrogate model for generating high resolution data out of the simulation results of the 500 meter model. This avoids the computationally expensive direct simulation at the higher resolution, but adds some additional uncertainties and errors. In order to judge the quality and usefulness of these high resolution data, the user would still require some information on how the predictions improve when going from 500 to 100 meter resolution. Currently, the manuscript only provides information on how well the downscaling algorithm works, but it does not describe the practical benefits and improvements of the higher resolution. Hence I would suggest adding a paragraph (e.g. around line 123) that summarizes the main advantages of the high resolution model as inferred from previous comparisons of the low and high resolution models with observation data.
- In section 2.4.3 (lines 248-258) it is mentioned that additional points outside the five 'calibration' catchments were used in the calibration procedure of the algorithm to improve the robustness of the method. Can you explain in more detail what kind of robustness issues you detected? Do you have any explanation why these additional points were necessary although the five chosen 'calibration' catchments closely resembled the statistical properties for the whole of Denmark (Figure 2)? Which additional information did these 'dummy points' provide?
- Additionally, the selection of additional calibration data through the 'dummy points' is not really in line with the argumentation in the rest of the paper, which only refers to a calibration procedure with data from the five subcatchments. I would suggest clearly stating in all relevant parts how the calibration dataset was chosen (i.e. also mentioning the 'dummy points').
- Is it possible to provide guidelines on the size of the training data set? This would be an important information when applying the proposed downscaling method to other regions.
- Some plots are difficult to understand and need to be revised (see specific comments below).
Specific comments:
- Line 150: "...aggregated as described below." Please add the section number you are referring to.
- Line 154: It is not clear how the initial conditions were determined. Did you choose any random simulation time step between 1991 and 2100 as initial conditions or did you e.g. use the mean of this simulation period?
- Equation 1: Please make clear also through the notation that these statistics are calculated individually for each grid cell of the model.
- Line 218: Please provide details on the "...differences between a historic dry and wet period,...".
- Line 222: Why is the 500 meter model output interpolated to 100 meters although this does not provide further information to the downscaling method? Is it a hard requirement of the algorithm to operate on equally sized vectors? Is there any explanation why the algorithm works better with interpolated TBDV data?
- Line 392: Unit missing.
- Figure 4: The scale break in the figure is a bit counterintuitive and misleading. I would suggest to show the different factors on a plot with the same scale (0 to 1) and add an additional plot (either separate or as an inset) with the second scale.
- Figure 5: Legends for the plots in the uppermost row seem to be missing. Generally, it is not readily clear which legend applies to which subplot.
- Figure 6: It is difficult to grasp what part of the verification data is shown in the different subplots (i.e. model input or output of the downscaling method). I would suggest to improve the figure headings and the caption text to guide the reader better through the figure.
- Figure 7: Please clarify the abbreviations in the figure, e.g. nf and ff. This might be guessed from the manuscript text but should also be made clear somewhere in the figure or the figure caption.
Citation: https://doi.org/10.5194/hess-2022-122-RC1
AC1: 'Reply on RC1', Raphael Schneider, 18 Jul 2022
Reply to Review by Anonymous Referee #1
[Reviewer comments in normal font; author replies in italic]
The manuscript "Machine learning-based downscaling of modelled climate change impacts on groundwater table depth" by Schneider et al. presents a novel downscaling method which uses hydrological model simulation data at a coarse scale (500 meters) together with ancillary data (e.g. topography and hydrogeologic information) to derive indicators for groundwater changes for future climate scenarios at higher spatial resolution (100 meters). Model simulations at a scale of 100 meters for five selected catchments and five input data sets from different regional climate model simulations are used as training data for the downscaling algorithm which is based on the Random Forest method. Estimates of groundwater changes at high resolution are made by using hydrologic simulations at coarse scale (500 meters) with input from 18 regional climate model simulations. The downscaling method is verified with data from a high resolution (100 meters) simulation for one additional catchment.
The topic of the paper is relevant to the hydrologic community as it describes an interesting possibility to provide stakeholders with high resolution information on potential changes in groundwater resources at an affordable computational cost. Generally, the paper is well written, but there are a few issues that, in my opinion, need to be clarified.
Reply: We thank the reviewer for their positive and constructive feedback to improve the manuscript. Below, we outline how we intend to respond to the issues pointed out by the reviewer in the revision.
General comments:
- The proposed downscaling method can be seen as a data-driven surrogate model for generating high resolution data out of the simulation results of the 500 meter model. This avoids the computationally expensive direct simulation at the higher resolution, but adds some additional uncertainties and errors. In order to judge the quality and usefulness of these high resolution data, the user would still require some information on how the predictions improve when going from 500 to 100 meter resolution. Currently, the manuscript only provides information on how well the downscaling algorithm works, but it does not describe the practical benefits and improvements of the higher resolution. Hence I would suggest adding a paragraph (e.g. around line 123) that summarizes the main advantages of the high resolution model as inferred from previous comparisons of the low and high resolution models with observation data.
Reply: The reviewer raises a valid point. When originally developing/calibrating the two versions (100m and 500m) of the model, the 100m resolution performed slightly better in terms of groundwater head performance (especially for shallow wells). However, we expect the 100m model to generally be better able to reproduce fine-scale variations of the uppermost groundwater level, as these are controlled largely by topography. Many of the relevant topographic variations are smoothed out at 500m resolution but remain visible at 100m resolution. These variations are hard to show with conventional groundwater observations, for example because some of the relevant areas, such as river valleys, are under-represented. However, we managed to show some of this benefit of the 100m model by comparing satellite land surface temperature products (as a proxy for the shallow groundwater table) with modelled results across river valleys.
Plan for revision: Extend the section mentioned by the reviewer by more clearly pointing out some of the benefits of the 100m vs. 500m model with respect to the representation of the uppermost groundwater level.
- In section 2.4.3 (lines 248-258) it is mentioned that additional points outside the five 'calibration' catchments were used in the calibration procedure of the algorithm to improve the robustness of the method. Can you explain in more detail what kind of robustness issues you detected? Do you have any explanation why these additional points were necessary although the five chosen 'calibration' catchments closely resembled the statistical properties for the whole of Denmark (Figure 2)? Which additional information did these 'dummy points' provide?
Reply: With “robustness”, we mean spatial transferability, i.e. performance on the spatial hold-out. While it is true that the covariate space seems to be adequately covered by the training catchments, a random sampling of all of Denmark still seems to add some covariate values/covariate combinations that inform the Random Forest regressor. (Furthermore, the performance of a Random Forest or similar algorithm is not only determined by covering the covariate space, but also by covering the relevant combinations of the different covariates – a thought that was behind the development of the dissimilarity index by Meyer and Pebesma, 2021.)
Plan for revision: Clarify the idea behind the dummy points along the lines of the above.
- Additionally, the selection of additional calibration data through the 'dummy points' is not really in line with the argumentation in the rest of the paper, which only refers to a calibration procedure with data from the five subcatchments. I would suggest clearly stating in all relevant parts how the calibration dataset was chosen (i.e. also mentioning the 'dummy points').
Reply: That is correct, we will clarify this, e.g. in the end of the introduction and in Figure 3. However, in case this was misunderstood, we want to point out that the dummy points originate from the coarse-scale resolution run of the hydrological model, so they did not require any additional runs of fine-scale hydrological models.
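[Editor's note: the dummy-point selection described in the reply — random cells drawn from the coarse-scale national model domain, outside the training submodels — can be sketched as below. The grid, mask, and point count are illustrative assumptions.]

```python
# Sketch of sampling "dummy points": random cells from a national grid,
# excluding the areas covered by the training submodels. All shapes and
# masks here are toy stand-ins, not the study's actual grids.
import numpy as np

rng = np.random.default_rng(42)
nrow, ncol = 200, 300                     # toy national 500 m grid
inside_denmark = np.ones((nrow, ncol), bool)
training_mask = np.zeros((nrow, ncol), bool)
training_mask[50:100, 80:140] = True      # stand-in for submodels I-V

# Cells eligible as dummy points: inside the domain, outside training areas.
eligible = np.argwhere(inside_denmark & ~training_mask)
idx = rng.choice(len(eligible), size=1000, replace=False)
dummy_points = eligible[idx]              # (row, col) of sampled cells
```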
- Is it possible to provide guidelines on the size of the training data set? This would be an important information when applying the proposed downscaling method to other regions.
Reply: That is a relevant question, but also a difficult one. For a start, the necessary size of the training data depends a lot on the desired application. Are we only interested in (i) predictions within very limited areas/within the training catchments, or are we – as in the manuscript – interested in (ii) an algorithm that can be extrapolated beyond its training data?
In case of (i), much smaller datasets than the one used here might be sufficient. In case of (ii), any possible answer probably is less related to a size of a training dataset, but rather to how well the training dataset covers the covariates (and covariate combinations) of the area to be extrapolated to (as also mentioned in the comment above when discussing “robustness”).
Plan for revision: Add some of these thoughts to the Discussion, section 3.3.
- Some plots are difficult to understand and need to be revised (see specific comments below).
Specific comments:
- Line 150: "...aggregated as described below." Please add the section number you are referring to.
Reply: This refers to section 2.3.1; will be added in revision.
- Line 154: It is not clear how the initial conditions were determined. Did you choose any random simulation time step between 1991 and 2100 as initial conditions or did you e.g. use the mean of this simulation period?
Reply: The initial conditions were taken from a continuous run of the 500m national hydrological model with each of the respective climate models, where we used data from the simulation time step corresponding to the start time of each of the reference, near, and far future periods.
Plan for revision: Clarify.
- Equation 1: Please make clear also through the notation that these statistics are calculated individually for each grid cell of the model.
Reply: Valid point; will be made clear in revised version.
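[Editor's note: the per-cell convention the reviewer asks to be made explicit can be illustrated as follows — each statistic is computed independently for every grid cell, over the time axis. The array shapes and the gamma-distributed depths are synthetic assumptions.]

```python
# Per-grid-cell statistics over time: each statistic is computed cell by
# cell along the time axis (axis 0). Synthetic depth-to-water-table data.
import numpy as np

rng = np.random.default_rng(1)
wtd = rng.gamma(2.0, 1.5, size=(365, 50, 60))  # (time, rows, cols), depth in m

cell_mean = wtd.mean(axis=0)                   # mean depth per cell
cell_q01 = np.quantile(wtd, 0.01, axis=0)      # shallow extreme per cell
cell_q99 = np.quantile(wtd, 0.99, axis=0)      # deep extreme per cell
```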
- Line 218: Please provide details on the "...differences between a historic dry and wet period,...".
Reply: For this, we took the difference between a relatively dry historic period (the 12 consecutive years between 1990 and 2001; average yearly precipitation 817 mm) and a relatively wet historic period (2004 to 2015; average yearly precipitation 852 mm).
Plan for revision: Clarify.
- Line 222: Why is the 500 meter model output interpolated to 100 meters although this does not provide further information to the downscaling method? Is it a hard requirement of the algorithm to operate on equally sized vectors? Is there any explanation why the algorithm works better with interpolated TBDV data?
Reply: Yes, the algorithm expects equally sized vectors, i.e. some kind of resampling from the coarse to the fine resolution has to be performed. Whether an interpolation (a simple bilinear interpolation in this case; not adding any data requirements or computational bottlenecks) is necessary or not remains unclear. However, in initial tests with non-interpolated data we experienced some artefacts from the edges of the 500m data in the 100m downscaled results.
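[Editor's note: the resampling step described in this reply — a simple bilinear interpolation from the 500 m grid onto the 100 m grid so that covariate vectors align cell for cell — can be sketched as below. The array size and the use of `scipy.ndimage.zoom` are illustrative assumptions.]

```python
# Bilinear resampling of a coarse 500 m statistic onto a 100 m grid
# (factor 5). order=1 selects bilinear interpolation.
import numpy as np
from scipy.ndimage import zoom

coarse = np.random.default_rng(2).normal(size=(40, 40))  # toy 500 m field
fine = zoom(coarse, 5, order=1)                          # 100 m grid
```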
Plan for revision: Discuss this.
- Line 392: Unit missing.
Reply: Thanks for noting; will be added (100 m)
- Figure 4: The scale break in the figure is a bit counterintuitive and misleading. I would suggest to show the different factors on a plot with the same scale (0 to 1) and add an additional plot (either separate or as an inset) with the second scale.
Reply: For the revision, we suggest the following: A plot with a scale of 0 to 1 for all covariates, together with an inset with the less sensitive parameters and a scale of 0 to 0.1.
- Figure 5: Legends for the plots in the uppermost row seem to be missing. Generally, it is not readily clear which legend applies to which subplot.
Reply: Correct, legends for the uppermost row are missing. This is on purpose, as the absolute values have no importance in this context; the two maps of relative topography and transmissivity in layer 1 are mostly shown to get an idea of how patterns in covariates influence patterns in the climate change impact.
Plan for revision: We would suggest keeping this as it is, but adding an explanation and making the relation between the existing legends and submaps clearer.
- Figure 6: It is difficult to grasp what part of the verification data is shown in the different subplots (i.e. model input or output of the downscaling method). I would suggest improving the figure headings and the caption text to guide the reader better through the figure.
Reply: Good point, we will try to improve the figure; potentially by adding a “header” for each row
- Figure 7: Please clarify the abbreviations in the figure, e.g. nf and ff. This might be guessed from the manuscript text but should also be made clear somewhere in the figure or the figure caption.
Reply: The abbreviations are explained in section 2.3; however, this is long before Figure 7. We will repeat this in the caption text.
Citation: https://doi.org/10.5194/hess-2022-122-AC1
RC2: 'Comment on hess-2022-122', Anonymous Referee #2, 19 Jul 2022
Schneider et al. proposed an RF-based downscaling method to downscale changes in the simulated water table depth over Denmark from 500 m resolution to 100 m resolution under different future climate scenarios. The method was trained on data from five submodels that cover a wide range of geologic, topographic, and hydrologic variability occurring across Denmark, and validated on data from another submodel (VI). The results obtained by the proposed method outperformed 500m-resolution water table depth and its bilinear interpolation in showing the climate change-induced changes to the shallow groundwater table. The paper would be of interest to the hydrological community. Overall, it is well-written and the related questions are discussed thoroughly. However, I have the following concerns regarding the paper.
General comments:
1. Traditional downscaling techniques downscale a product at a coarse resolution to the same product at a finer resolution. Here the authors used different statistics calculated from the coarse-resolution product (TBDV). Why didn’t the authors directly use the 500m water table depth as a covariate here?
2. Which criteria did the authors use to select their validation submodel (VI. Aarhus Å/Aarhus)? From Fig. 1, the submodel has a very shallow mean water table depth (0.5-2.5m). There are many areas in Denmark having water table depth > 10 m, like submodel V. I wonder if the selection of VI would give a biased conclusion for the RF validation.
3. I really like the idea to study the importance of each covariate (feature) used in RF. I also think that determining the feature importance based on ML model performance is a feasible method. However, the authors may need to check the independence of their covariates before implementing such an approach. If two or more covariates are strongly correlated, perturbing one of them may not impact the ML performance, which leads to wrong results. I would like to know how the authors dealt with this issue.
4. Please improve the quality of the figures.
Specific comments:
1. Line 102, Page 4: “referred to the provided literature”. Which literature? (Abbott et al., 1986; DHI, 2020)? Please specify there.
2. Line 112-114, Page 4: I am not an expert in hydrological model simulation, and I am a bit confused here. The authors mentioned that precipitation, temperature, and potential ET used for historic climate forcing to the DK-model HIP have various resolutions, 10 km or 20 km. However, in Line 105, they mentioned that all input data have a spatial resolution of 100 m. Therefore, did they downscale the historical climate forcing data to 100 m or use them directly?
3. Climate models, Page 5: Can the authors clarify which 17 RCMs they chose and which 5 RCMs are used as a subset?
4. Line 173, Page 6: Why did the authors use changes to the 1m exceedance probability? Can the authors explain the practical meaning of this statistic?
5. Line 235, Page 8: “RF is a supervised ML learning method; that means it requires training data”. This statement is wrong in my opinion. Unsupervised ML methods also require training data. I think here the authors meant supplementary teacher signals that are used to guide the training process. In addition, ML is the abbreviation for machine learning. Please delete the extra “learning” here.
6. Please mark the locations of dummy points used in RF training in Fig.1 if possible.
7. Line 276, Page 9: I believe there should be Table 3.
8. Line 335, Page 11: which statistic does “the climate change-induced changes to the shallow groundwater table” indicate?
9. Fig.7: Please explain the legends (e.g., 500m HM intp) in the caption.
Citation: https://doi.org/10.5194/hess-2022-122-RC2
AC2: 'Reply on RC2', Raphael Schneider, 21 Jul 2022
Reply to Review by Anonymous Referee #2
[Reviewer comments in normal font; author replies in italic]
[Additional figures in attached pdf]
Schneider et al. proposed an RF-based downscaling method to downscale changes in the simulated water table depth over Denmark from 500 m resolution to 100 m resolution under different future climate scenarios. The method was trained on data from five submodels that cover a wide range of geologic, topographic, and hydrologic variability occurring across Denmark, and validated on data from another submodel (VI). The results obtained by the proposed method outperformed 500m-resolution water table depth and its bilinear interpolation in showing the climate change-induced changes to the shallow groundwater table. The paper would be of interest to the hydrological community. Overall, it is well-written and the related questions are discussed thoroughly. However, I have the following concerns regarding the paper.
Reply: We thank the reviewer for their positive and constructive feedback to improve the manuscript. Below, we outline how we consider responding to the concerns raised by the reviewer in a revision of the manuscript.
General comments:
1. Traditional downscaling techniques downscale a product at a coarse resolution to the same product at a finer resolution. Here the authors used different statistics calculated from the coarse-resolution product (TBDV). Why didn’t the authors directly use the 500m water table depth as a covariate here?
Reply: Actually, we use the same statistics (mean, Q01, Q99, 1mex of changes to the groundwater table) at 500m resolution (resampled/interpolated to 100m) as covariates in downscaling to the respective output at 100m (i.e. again mean, Q01, Q99, or 1mex, respectively). Hence, if we understand the reviewer’s comment correctly, our method is in that respect in line with “traditional downscaling techniques”.
On a side note: We also expect the proposed method to work with time-varying groundwater depth maps (as mentioned in the 2nd paragraph of the Conclusions).
2. Which criteria did the authors use to select their validation submodel (VI. Aarhus Å/Aarhus)? From Fig. 1, the submodel has a very shallow mean water table depth (0.5-2.5m). There are many areas in Denmark having water table depth > 10 m, like submodel V. I wonder if the selection of VI would give a biased conclusion for the RF validation.
Reply: The submodels were picked from a set of 10 submodels in total, which were part of the model calibration (see section 2.1.3 and Henriksen et al., 2020a). These ten submodels already were chosen to be representative of hydrologic variations across Denmark. The submodels used in the downscaling training then were further selected from these ten based on their representativeness of relevant covariates (see Figure 2).
Maybe this cannot be seen very well in Figure 1, but as can be seen in the first plot in the attached pdf file (Figure A1), submodel VI actually represents the Danish conditions quite well. The plot shows a histogram of the mean historic depth to the groundwater table, separately for Denmark, the five training submodels I to V and the validation submodel VI; with linear scale y-axis in the top plot, and log-scale in the bottom plot. Hence, while the reviewer correctly noted that there are significant areas with deeper groundwater tables in Denmark, we remain confident that those conditions are well covered by our training models (in particular submodel V) and also our validation submodel VI. (Furthermore, when it comes to vulnerability to climate change impacts, areas with currently shallow groundwater levels (within the first few metres of the surface) are of greatest interest.)
Plan for revision: Extend section 2.3.3 concerning the submodel choice, and potentially include the validation submodel in the histograms in Figure 2 as shown in the attached Figure A1.
3. I really like the idea to study the importance of each covariate (feature) used in RF. I also think that determining the feature importance based on ML model performance is a feasible method. However, the authors may need to check the independence of their covariates before implementing such an approach. If two or more covariates are strongly correlated, perturbing one of them may not impact the ML performance, which leads to wrong results. I would like to know how the authors dealt with this issue.
Reply: Thanks. We agree with the reviewer, covariate correlation can affect feature importances. The covariates we used were already selected with covariate correlation in mind. Hence, covariate correlation is low for most of the covariates – see Figure A2 for a matrix of pairwise covariate correlations (Pearson’s R) in the attached pdf.
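[Editor's note: a covariate-correlation screen like the one this reply describes can be sketched as a pairwise Pearson matrix. The covariates below are synthetic, with one deliberately correlated pair; the study's actual covariate set differs.]

```python
# Pairwise Pearson correlation screen for covariates before a
# permutation-style feature importance analysis. Synthetic covariates.
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 5))          # rows: cells, cols: covariates
X[:, 1] += 0.9 * X[:, 0]                # make two covariates correlated

corr = np.corrcoef(X, rowvar=False)     # 5 x 5 Pearson R matrix
# Flag pairs above an (assumed) threshold; upper triangle excludes the
# diagonal and duplicate pairs.
strongly_correlated = np.argwhere(np.triu(np.abs(corr) > 0.6, k=1))
```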
Plan for revision: Mention the issue of covariate correlation in the manuscript (likely in section 3.1). Potentially extend the feature importance analysis by a version where not only one covariate at a time is perturbed, but a whole group of (correlated) covariates – similar to Figure 4 in Koch et al., 2019a. In our case, that would for example mean perturbing all the (moderately correlated) “kh_mean” covariates at once for the feature importance analysis.
4. Please improve the quality of the figures.
Reply: By that you mean the resolution and compression artefacts? In that case, we assume that the final article will be compiled in a different manner; the current quality issues are due to the pdf compiling.
Specific comments:
1. Line 102, Page 4: “referred to the provided literature”. Which literature? (Abbott et al., 1986; DHI, 2020)? Please specify there.
Reply: Here we mean the various references provided throughout section 2.1 covering different aspects of the DK-model (subsurface parameterization, climate input, MIKE SHE …).
Plan for revision: Clarify this.
2. Line 112-114, Page 4: I am not an expert in hydrological model simulation, and I am a bit confused here. The authors mentioned that precipitation, temperature, and potential ET used for historic climate forcing to the DK-model HIP have various resolutions, 10 km or 20 km. However, in Line 105, they mentioned that all input data have a spatial resolution of 100 m. Therefore, did they downscale the historical climate forcing data to 100 m or use them directly?
Reply: Valid point. The climate forcing (at 10km/20km resolution) is interpolated to the model grid (500m or 100m).
Plan for revision: Clarify this.
3. Climate models, Page 5: Can the authors clarify which 17 RCMs they chose and which 5 RCMs are used as a subset?
Reply: We plan to add a table with the requested information in the revision.
4. Line 173, Page 6: Why did the authors use changes to the 1m exceedance probability? Can the authors explain the practical meaning of this statistic?
Reply: Good question, relevant to be clarified. The threshold of 1m was chosen in connection with stakeholders and users of the data. Water levels closer than a certain threshold to the surface can create various challenges in agriculture, infrastructure and flooding. In this context, a threshold of 1m was considered relevant (also, the widespread tile drains in Danish agriculture are located at around 1m depth). The exceedance probability then indicates how often (during an average year) the respective threshold of 1m is exceeded, and how that probability changes with climate change.
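[Editor's note: the 1 m exceedance probability described in this reply can be computed per cell as the fraction of time steps in which the water table is within 1 m of the surface, i.e. shallower than the threshold. The daily series below is synthetic; the threshold follows the reply.]

```python
# Per-cell exceedance probability of the 1 m threshold: fraction of days
# on which the depth to the water table is shallower than 1 m.
import numpy as np

rng = np.random.default_rng(4)
wtd = rng.gamma(2.0, 1.0, size=(365, 50, 60))  # daily depth to water table, m

exceed_1m = (wtd < 1.0).mean(axis=0)  # probability per cell, in [0, 1]
```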
5. Line 235, Page 8: “RF is a supervised ML learning method; that means it requires training data”. This statement is wrong in my opinion. Unsupervised ML methods also require training data. I think here the authors meant supplementary teacher signals that are used to guide the training process. In addition, ML is the abbreviation for machine learning. Please delete the extra “learning” here.
Reply: The reviewer is correct; we were not precise enough with the choice of our words here. We suggest reformulating to “RF is a supervised ML method, requiring labelled training data. Based on the training dataset, a RF regressor model learns about relationships between a set of covariates and the target (training) data values.”
Thanks for also noting the typo.
6. Please mark the locations of dummy points used in RF training in Fig. 1 if possible.
Reply: Due to the large number of dummy points (20,000), we think it is difficult to show them on the map. They are sampled randomly in space, from all of Denmark except for the areas covered by the training submodels.
7. Line 276, Page 9: I believe there should be Table 3.
Reply: Thanks for spotting that mistake; will be corrected.
8. Line 335, Page 11: which statistic does “the climate change-induced changes to the shallow groundwater table” indicate?
Reply: Figure 7 gives an overview of all eight TBDVs (i.e. mean, Q01, Q99, and 1mex for both near and far future).
Plan for revision: Clarify this.
9. Fig.7: Please explain the legends (e.g., 500m HM intp) in the caption.
Reply: Valid point, will be added in the revision.