the Creative Commons Attribution 4.0 License.
Estimation of groundwater age distributions from hydrochemistry: Comparison of two metamodelling algorithms in the Heretaunga Plains aquifer system, New Zealand
Christopher J. Daughney
Sapthala Karalliyadda
Brioch Hemmings
Uwe Morgenstern
Catherine Moore
Abstract. Groundwater age or residence time is important for identifying flow and contaminant pathways through groundwater systems. Typically, groundwater age and age distributions are inferred via lumped parameter models based on measured age tracer concentrations. However, due to cost and time constraints, age tracers are usually only sampled at a small percentage of the wells in a catchment. This paper describes and compares two methods to increase the number of groundwater age data points and assist with validating age distributions inferred from lumped parameter models. Two machine learning techniques with different strengths were applied to develop two independent metamodels that each aim to establish relationships between the hydrochemical parameters and the modelled groundwater age distributions in one test catchment. Ensemble medians from the best model realisations per age distribution percentile were used for comparison with the results from traditional lumped parameter models based on age tracers. Results show that both metamodelling techniques generally work well for predicting groundwater age distributions from hydrochemistry. Therefore, these techniques can be used to assist with the interpretation of lumped parameter models where age tracers have been sampled, and they can also be applied to predict groundwater age distributions for wells that have hydrochemistry data available, but no age tracer data.
Conny Tschritter et al.
Status: final response (author comments only)
RC1: 'Comment on hess-2022-258', Scott Wilson, 23 Sep 2022
This paper makes an excellent and novel contribution to predicting transit times using a wider dataset than isotopic age tracers. I found the text enjoyable and easy to read, and the perspective is fairly balanced. I have two main comments on the approach taken, which have some bearing on the conclusions that can be derived from this work.
The main drawback of the approach taken is that the chemically based models use the lumped model age estimates as a response variable, i.e. training a model on another model which is acknowledged as having shortcomings, although this is the motivation for the paper. The difficulty is that this creates an ambiguity as to whether mismatches in the trained models are due to poor lumped model estimates, or poor local performance of the trained models, or both, or something else (e.g. parameter or model selection). As a suggestion, an alternative or complementary approach would be to first train models to predict the isotopic tracer concentration. This would provide some prior information on the mismatch between the lumped model predictions and the ensemble predictions. This approach could perhaps inform how the lumped parameter estimates could be improved, which was suggested in line 553. This step is not necessary for this paper, but is perhaps something that could be carried out in future work.
The modelling approach applied here is to generate a global model from a subset of individual models, and it is assumed that the input data are spatially and temporally independent. However, the hydrochemical clustering results do indicate that the chemistry data have a predictable spatial and temporal variability. Some evidence of this influence is apparent in the results (e.g. Fig 6, T2_34), and hence in the applications section some spatial and temporal discrepancies are acknowledged. To overcome this, it may have been beneficial to train models on each hydrochemical cluster, although it has to be acknowledged that there is little data available for clusters 4 and 5. An alternative approach would be to introduce some additional predictive parameters in the model to account for spatial and temporal variability, e.g. elevation, depth, position, hydrochemical cluster. Some of these parameters have been used for validation, but they could also have been training parameters, or tested to see if they do inform model predictions. In doing so, one could have more confidence in the application of the models to areas with no age data.
The paper would also benefit from some corrections and the clarification of some points listed below.
Title and line 60: SR and GBR methods are not metamodels per se. They are applied in this paper as metamodels because they are trained on the LPMs rather than raw observation data.
97: Should be Heretaunga Plain not Plains (also elsewhere)
125: There are red lines on Fig 1 which are unreferenced. Are these flow barriers? It seems odd that there is a flow path towards a flow barrier (centre top)
154: The clustering detailed in the hydrochemistry section provides some background context, but is not used in the modelling or subsequent analysis.
219: It’s good practice to state that this is the response variable for the statistical modelling, and the hydrochemistry data are the predictor variables
247-249: How much error is the distance to these input signal datasets likely to introduce to the age estimates, and how would that compare to the error introduced by the EPM?
269: Were not was
273: The primary aim of tuning is to improve model performance, not assist convergence
278: The terms ‘chained’ and ‘unchained’ models are unorthodox, and perhaps not an apt description of what the models represent. Perhaps the unchained models would be better referred to as ‘independent’ (see line 276) or ‘individual models’, and the chained models as ‘ensemble models’.
286: Why do the train/test splits differ for the two models? This approach doesn’t enable a clear comparison of modelling performance between the two models to be made
286: As a comment, a 10/90 split is quite heavy-handed and could lead to overfitting. The unchained GBR R2 values are very high, although this is also true for the SR R2 values
290: There seems to be an error in the Pearson formulas
375: Last Glacial (is a noun)
399: The third value is 1.7 (ie >1)
405: Perhaps the models could achieve good age distributions with substantially fewer parameters?
410: It might have been more informative to plot the cluster results here rather than the ensemble weights, since the most informative parameters are already described in the text. As a reader, I’m intrigued by the relationship between the model performance and the clusters.
434: Perhaps water chemistry reflects some influence of the source rock, which wouldn’t necessarily be reflected in the age estimates.
517: It’s ambiguous how these parameters were treated. Were their values set to the detection limit?
522: I think this claim is a bit of a stretch since there are no spatial aspects to this study. The model is aspatial, and global, and appears to generalise well to most, but not all the data. The model has the potential to be applied to other areas with confidence if the successful or unsuccessful predictions could be identified as having an association with something eg a particular cluster. NB this comment also applies to the last sentence of the abstract.
543-547: I don’t think these statements are valid, particularly in light of the preceding sentences. There is no spatial aspect to this modelling approach; it only uses age and chemistry data.
578: Which of these models would you have the most confidence to apply elsewhere?
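For the comments on lines 286 and 290, a minimal Python sketch may help fix ideas. It shows the standard sample Pearson correlation coefficient and a single seeded train/test split that both metamodels could share so that their scores are directly comparable. The function names, the seed and the 20 % test fraction are illustrative assumptions, not the authors' settings.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation: cov(x, y) / (std(x) * std(y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / denom


def shared_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) to be reused by every model compared."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is reproducible
    n_test = max(1, round(n_samples * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

Passing the same index lists to both the SR and the GBR training routines would remove the split itself as a source of difference between the two sets of scores.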
Citation: https://doi.org/10.5194/hess-2022-258-RC1 -
AC1: 'Reply on RC1', Conny Tschritter, 05 Feb 2023
We have combined the review comments, and our responses, from all reviewers into one document (see attached) and introduced numbered subheadings in the remainder of this document so that we can more easily cross reference our responses between the different reviewers who raised similar concerns.
RC2: 'Comment on hess-2022-258', Camille Bouchez, 06 Jan 2023
This work explores the use of metamodelling techniques to predict groundwater age distributions from hydrochemistry. It is a novel and interesting contribution aiming at increasing the availability of groundwater age information from easily available hydrochemical data in catchments. The knowledge gap is convincing and the paper is nicely written. However, I have some comments that should be addressed before publication.
My main concern comes from considering the LPM-derived age distributions as the true representation of groundwater age distribution, which is later used as the metamodel prediction target. I understand the interest of this choice, but I think it is a strong assumption that should be further discussed in the paper. In particular, the following points are missing:
- Where are the age tracer data? They are not in the Supplementary Material as indicated l. 219, and I could not easily find them in Morgenstern et al. 2018. There is an extensive description of how these data were acquired (l. 228-238) and how they are used to fit the LPMs (l. 238-264), but the results are never presented in the paper although they are very important. Age tracer data fitted by the LPM must appear in the Supplementary Material, to evaluate the confidence in the LPM predictions later used.
- Without this, it is hard to evaluate uncertainties associated with the LPM-derived age distributions. Would it be possible to estimate the uncertainties? How sensitive are the trained models to the LPM? Could uncertainties in LPMs explain part of the errors?
My second main concern comes from the relationships obtained between hydrochemical data and groundwater age distribution and the processes that could explain them.
- Based on which argument and figure can you tell that “NH3-N, Fe and Mn all tend to increase with groundwater age, whereas concentrations of DO and NO3-N tend to decrease” (l. 424)? This assertion is not clearly supported by Figure 8, nor by the correlation matrix in Figure 2.
- I found it interesting to try to quantify the consumption of DO in the catchment, by assuming that the organic matter oxidation is only related to DO. However, no explanation is given of how the average rate constant was derived, and additional information is required. First-order kinetics on the DOM concentration were considered, therefore not accounting for the DO concentrations (if I understood correctly from the reference given). Is this correct? It should be specified. Which groundwater age percentile was considered for the calculation? How were the DOM concentrations averaged?
- The inverse relationship between age and temperature is unexpected, since older groundwater would normally show higher temperature. But this relationship is really strong and I think this paper would highly benefit from a close look at this relationship and clarifications in the explanations given. I do not understand the calculation of the activation energy made and I doubt the interpretation that is made from it. First, it somehow considers an aggregation of all reaction types. Secondly, where does the k1/k2=0.8 come from? Here, the age ratio is 0.8. But why would the kinetic rate ratio be equal to the age ratio? I agree that an increase in T would increase the reaction rates. However, how do you relate this to the effect of T on modelled age? Please clarify the process that is presented here to explain the inverse relationship between age and temperature. I would be more convinced by a hydrological explanation. The paper would benefit from a more convincing explanation of the relationship obtained between temperature and age.
- Relationships between Ca, Mg, Na, K and SiO2 and age would highly depend on the aquifer lithology. Would these elements be better predictors of groundwater age if an a priori classification based on the rock lithology was made?
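To make the activation energy question above concrete, the following Python sketch solves the two-temperature Arrhenius relation, ln(k1/k2) = -(Ea/R)(1/T1 - 1/T2), for Ea. The ratio of 0.8 is the value queried in the comment; the two temperatures (10 and 15 °C) are purely hypothetical placeholders, and nothing in the sketch justifies equating the rate ratio with the age ratio, which is the point under discussion.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J mol^-1 K^-1


def activation_energy(k1_over_k2, temp1_k, temp2_k):
    """Solve ln(k1/k2) = -(Ea/R) * (1/T1 - 1/T2) for Ea, in J/mol."""
    if temp1_k <= 0 or temp2_k <= 0 or temp1_k == temp2_k:
        raise ValueError("temperatures must be positive and distinct (kelvin)")
    if k1_over_k2 <= 0:
        raise ValueError("rate ratio k1/k2 must be positive")
    return -R_GAS * math.log(k1_over_k2) / (1.0 / temp1_k - 1.0 / temp2_k)


def arrhenius_ratio(ea_j_per_mol, temp1_k, temp2_k):
    """Forward relation: k1/k2 implied by a given Ea and two temperatures."""
    return math.exp(-(ea_j_per_mol / R_GAS) * (1.0 / temp1_k - 1.0 / temp2_k))


# Hypothetical temperatures (10 and 15 degrees C); 0.8 is the ratio under review.
ea = activation_energy(0.8, 283.15, 288.15)
print(f"Implied apparent activation energy: {ea / 1000:.1f} kJ/mol")
```

The result depends strongly on the temperature pair chosen, which is one reason the derivation in the manuscript would need to state its inputs explicitly.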
Minor comments :
Fig.1: What are the red lines?
l. 136: it would be interesting to give the value of the recharge rate of the area
l. 140: what is the confined aquifer zone near the coast? Maybe worth showing on the map?
l. 195: there is a confusion between the text and Fig. 3: one refers to mean residence time and the other to the 50th percentile. Please correct.
Figure 6: At least for the example given in Figure 6, the lumped parameter model should be described in the main text (singular or binary EPMs? Which values of the parameters?)
l. 337 : MAE : Mean Absolute Error?
Figure 8: change DRP for PO4-P as this is how it is referred to in the main text
l. 540: I wonder about the generalization of the approach and the application of the trained model elsewhere. The obtained hydrochemistry-age relationships are not easy to explain (at least for temperature), and therefore it is difficult to tell if they are applicable elsewhere or if they are only related to some local effects. Would other predictive parameters such as depth, distance to the river, or elevation inform water age predictions?
The authors acknowledge that the work might be only applicable to the selected catchment. Is there another similar catchment, where age data are available and where the models could be applied to determine groundwater age distributions from hydrochemistry, in order to validate the method?
Citation: https://doi.org/10.5194/hess-2022-258-RC2 -
AC2: 'Reply on RC2', Conny Tschritter, 05 Feb 2023
We have combined the review comments, and our responses, from all reviewers into one document (see attached) and introduced numbered subheadings in the remainder of this document so that we can more easily cross reference our responses between the different reviewers who raised similar concerns.
RC3: 'Comment on hess-2022-258', Anonymous Referee #3, 10 Jan 2023
Overall, this is an interesting metamodeling application using water quality information to emulate a lumped-parameter model and make forecasts of groundwater age. Two methods were used (gradient boosted regression and symbolic regression) with advantages to each and with generally similar performance. The authors also make a detailed interpretation of the parameter and model behavior.
This is a fine contribution and I have just a few minor comments to consider.
- Line 61: There is some ambiguity to how the model is described here. It’s not really trained on data, but rather is trained on the LPM model that, in turn, is trained on data. Being super clear here is important, particularly for readers less familiar with metamodeling
- Figure 1 and in the text: The clusters from previous work are both identified on the figure and in the text, but no context is provided beyond a reference to previous work. A sentence or two would be key to explain this.
- Figure 2 and elsewhere: Many of these water quality constituents are obviously identified by their chemical formulae, but some are not defined. Even if it’s in supplemental material, a table defining the quantities would be helpful.
- Line 290: There seems to be a formatting glitch here – hard to understand what the equation is meaning to explain.
- Line 327: more formatting glitches
- Lines 359-362: This is a great point and I appreciate the context because it’s true that the extrema of the distribution would be of interest to many users.
Citation: https://doi.org/10.5194/hess-2022-258-RC3 -
AC3: 'Reply on RC3', Conny Tschritter, 05 Feb 2023
We have combined the review comments, and our responses, from all reviewers into one document (see attached) and introduced numbered subheadings in the remainder of this document so that we can more easily cross reference our responses between the different reviewers who raised similar concerns.
RC4: 'Comment on hess-2022-258', Anonymous Referee #4, 10 Jan 2023
This manuscript aims at assessing the validity of using two machine learning techniques to extrapolate beyond available groundwater age data and infer the lumped RTD from hydrochemistry.
This contribution is novel and appears quite appealing as a complement to tracer datasets, which are costly and time-consuming to acquire.
The manuscript is nicely written and easy to follow.
Major comments:
I have reservations about the choice of the LPM models as calibration targets. I understand that the study is closer to the reality in which the age distribution is unknown. Still, I consider that it would have been much stronger to test the validity of the methodology on a purely synthetic case controlling every aspect of the problem: data and associated uncertainty, full shape of the age distribution, etc. An important aspect as well is that, without a priori information about the age distribution, several LPMs differing in their hydrogeological conceptual representation can fit equally well. My point is that it is difficult to evaluate the validity of a calibration or inference methodology on real, largely under-constrained cases. One way to tackle this would be, I think, to highlight the fact that the system studied here is a “not so complex” system (a textbook system?) and has been widely studied, so that the target LPM is a more than reasonable estimation (see my minor comment below).
I have reservations as well about the independence of the percentiles and the further chaining approach. It appears to me that it goes against physics and flow mechanics to consider percentiles as separate entities, and not the age distribution as a whole. My point is that an LPM or numerically generated distribution rests on a hydrogeological conceptual representation which describes the functioning of the system. It has been shown (Leray et al, 2019: https://doi.org/10.1016/j.jhydrol.2019.04.032) that local modification of the system properties affects not only local flow lines and mass balance locally but the overall response and functioning of the system and consequently the age distribution. So it is confusing to me that the distribution is considered in parts (even if the chaining approach intends to reconstruct the puzzle).
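One practical symptom of treating percentiles as separate prediction targets is that the predicted ages need not increase with percentile. The Python sketch below, which is an illustration under stated assumptions and not the authors' chaining method, flags such crossings and applies the simplest possible repair (a running maximum). Both function names are hypothetical.

```python
def crossing_percentiles(ages):
    """Return index pairs (i, i + 1) where a higher percentile has a lower age."""
    return [(i, i + 1) for i in range(len(ages) - 1) if ages[i + 1] < ages[i]]


def enforce_monotonic(ages):
    """Running maximum: a crude repair that yields a non-decreasing curve."""
    repaired = []
    current = float("-inf")
    for age in ages:
        current = max(current, age)
        repaired.append(current)
    return repaired


# Made-up ages (years) for four increasing percentiles, with one crossing.
predicted = [2.0, 5.0, 4.0, 9.0]
print(crossing_percentiles(predicted))
print(enforce_monotonic(predicted))
```

A repair of this kind only restores a valid cumulative curve; it does not address the deeper concern that the distribution should follow from a single conceptual model of the flow system.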
Minor comments:
Line 26: I would write the age as plural (“understanding the ages of water”) to reinforce the fact that natural groundwater systems are made of a wide variety of flow paths and consequently of residence times (or ages). If it is correct grammatically of course.
Line 53: “most such previous studies”. Revise
Line 218: I am not an expert but should it not be half of the detection limit?
Lines 252 to 254: It is argued that the EPM provided good matches for a wide range of New Zealand systems. A fault-bounded, local, relatively homogeneous and thick system with uniform recharge rate upstream and zero recharge rate downstream looks like an EPM to me. So, I think the validity of the EPM should be argued considering specific aspects of the system (that may be quite similar to other sites in New Zealand)
Line 278: to differentiate.
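Regarding the comment on line 218, substituting half of the detection limit for censored values is a common convention. A minimal Python sketch follows; it assumes censored results are reported as strings such as "<0.01", which may not match the format of the authors' dataset, and the function name is hypothetical.

```python
def substitute_censored(value, fraction=0.5):
    """Convert a reported value such as '<0.01' to fraction * detection limit.

    Uncensored values are returned unchanged as floats.
    """
    text = str(value).strip()
    if text.startswith("<"):
        return fraction * float(text[1:])
    return float(text)


# Made-up example values: one censored result and one measured result.
print(substitute_censored("<0.01"))
print(substitute_censored("0.3"))
```

Setting `fraction=1.0` reproduces substitution at the detection limit itself, so the two conventions can be compared with the same routine.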
Citation: https://doi.org/10.5194/hess-2022-258-RC4 -
AC4: 'Reply on RC4', Conny Tschritter, 05 Feb 2023
We have combined the review comments, and our responses, from all reviewers into one document (see attached) and introduced numbered subheadings in the remainder of this document so that we can more easily cross reference our responses between the different reviewers who raised similar concerns.