the Creative Commons Attribution 4.0 License.
Data-driven modeling of hydraulic head time series: results and lessons learned from the 2022 groundwater modeling challenge
Abstract. This paper presents the results of the 2022 groundwater time series modeling challenge, where 15 teams from different institutes applied various data-driven models to simulate hydraulic head time series at four monitoring wells. Three of the wells were located in Europe and one in the USA, in different hydrogeological settings but all in temperate or continental climates. Participants were provided with approximately 15 years of measured heads at (almost) regular time intervals and daily measurements of weather data starting some 10 years prior to the first head measurements and extending around 5 years after the last head measurement. The participants were asked to simulate the measured heads (the calibration period), provide a forecast for around 5 years after the last measurement (the validation period for which weather data was provided but not head measurements), and to include an uncertainty estimate. Three different groups of models were identified among the submissions: lumped-parameter models (3 teams), machine learning models (4 teams), and deep learning models (8 teams). Lumped-parameter models apply relatively simple response functions with few parameters, while the artificial intelligence models varied in complexity, generally with more parameters and more inputs, including inputs engineered from the provided data (e.g., multi-day averages).
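As an illustration of the two ingredients named above, the following minimal sketch shows a response-function model and an engineered multi-day average. It is not taken from any team's submission: the exponential response function, the function names `simulate_head` and `trailing_mean`, and all parameter names are assumptions chosen for illustration only.

```python
import numpy as np


def simulate_head(recharge, gain, decay, base):
    """Hypothetical lumped-parameter sketch: head = base + recharge * response.

    Assumes an exponential response function, normalized so that a
    unit of sustained recharge eventually raises the head by `gain`.
    """
    recharge = np.asarray(recharge, dtype=float)
    n = recharge.size
    t = np.arange(n)
    weights = np.exp(-t / decay)           # exponential decay of the response
    weights = gain * weights / weights.sum()
    # Discrete convolution, truncated to the length of the input series.
    return base + np.convolve(recharge, weights)[:n]


def trailing_mean(x, window):
    """Engineered input: mean of the current and previous window-1 values.

    The first window-1 entries are NaN because the window is incomplete.
    """
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out


if __name__ == "__main__":
    rain = np.array([0.0, 3.0, 6.0, 0.0, 3.0])
    print(trailing_mean(rain, window=3))
    print(simulate_head(np.ones(50), gain=2.0, decay=5.0, base=10.0)[-1])
```

The lumped-parameter sketch has only three parameters (`gain`, `decay`, `base`), whereas a machine learning model would typically receive several such engineered series as separate inputs.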
The models were evaluated on their ability to simulate the heads in the calibration period and the validation period. Different metrics were used to assess performance, including metrics for average relative fit, average absolute fit, fit of extreme (high or low) heads, and the coverage of the uncertainty interval. For all wells, reasonable performance was obtained by at least one team from each of the three groups. However, the performance was not consistent across submissions within each group, which implies that application of each method to individual sites requires significant effort and experience. Estimates of the uncertainty interval in particular varied widely between teams, although some teams submitted confidence intervals rather than prediction intervals. There was not one team, let alone one method, that performed best for all wells and all performance metrics. Lumped-parameter models generally performed as well as artificial intelligence models, except for the well in the USA, where the lumped-parameter models did not use (or did not fully exploit) the provided river stage data, which was crucial for obtaining a good model. In conclusion, the challenge was a successful initiative to compare different models and learn from each other. Future challenges are needed to investigate, e.g., the performance of models in more variable climatic settings, to simulate head series with significant gaps, or to estimate the effect of drought periods.
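The four kinds of metric mentioned above can be sketched as follows. The abstract does not name the specific metrics, so the choices here are assumptions: the Nash-Sutcliffe efficiency stands in for relative fit, the mean absolute error for absolute fit, a mean absolute error restricted to the lowest or highest observed heads for the fit of extremes (the 20 % threshold is arbitrary), and the fraction of observations inside the interval for coverage. All function names are hypothetical.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def mae(obs, sim):
    """Mean absolute error, in the units of the head series."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.mean(np.abs(obs - sim)))


def mae_extremes(obs, sim, q=0.2, side="low"):
    """MAE over only the lowest (or highest) fraction q of observed heads."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    if side == "low":
        mask = obs <= np.quantile(obs, q)
    else:
        mask = obs >= np.quantile(obs, 1.0 - q)
    return mae(obs[mask], sim[mask])


def coverage(obs, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    obs = np.asarray(obs, float)
    inside = (obs >= np.asarray(lower, float)) & (obs <= np.asarray(upper, float))
    return float(np.mean(inside))


if __name__ == "__main__":
    obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    sim = np.array([1.1, 1.9, 3.2, 3.8, 5.0])
    print(nse(obs, sim), mae(obs, sim), mae_extremes(obs, sim))
    print(coverage(obs, sim - 0.15, sim + 0.15))
```

A 95 % prediction interval should give a coverage near 0.95 on held-out data. A confidence interval on the simulated mean is narrower and will usually cover far fewer observations, which is one reason the two are not directly comparable.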
Status: open (until 09 Jul 2024)
RC1: 'Comment on hess-2024-111', Anonymous Referee #1, 25 Jun 2024
The paper presents a modeling challenge performed by 15 teams from different institutions to reproduce the temporal evolution of hydraulic heads at four monitoring wells, based on provided meteorological data and a calibration time window with previously observed heads. The teams adopt different methods, with a large predominance of methods based on artificial intelligence (AI). I find this experiment of much interest for the hydrology community, and particularly timely considering the increasing and widespread use of AI. For this reason, I think that the paper fits the quality standard of HESS and I recommend it for publication. I only have a few minor comments for the authors.
Comments to authors
The method descriptions in Sections 3.1.1 to 3.1.3 could be slightly expanded to better highlight the differences between the methods within the same category.
It is not sufficiently clear if any information about geology and setting (e.g., well depth) is provided to the teams.
Minor issues:
L1 and L51: “2022 groundwater time series modeling challenge”. I suggest putting this in italics or in quotation marks.
L70: “not allowed to use the observed head data itself as an explanatory variable.” Could the authors develop more on this point? What kind of modeling would violate this rule?
L86: At this point of the reading, calibration and validation period have not been defined yet (except in the abstract), which might complicate the understanding of the sentence “calibrate the model without head measurements in the validation period”. I suggest to reformulate this sentence.
L96: Are the descriptions in lines 97 to 121 the same as those provided to the participants?
Figure 1: Authors should provide references for the head time series in the text or in the figure.
Table 3: The team name indicates neither the geographical provenance nor the participants. A connection between participant names and group names should be provided. The acronyms ML and DL should be defined in the caption.
Fig. 2 and 3: Do the box plots use quartiles or 20%-80% quantiles?
Citation: https://doi.org/10.5194/hess-2024-111-RC1
RC2: 'Comment on hess-2024-111', Anonymous Referee #2, 25 Jun 2024
This manuscript is of interest because it presents the results from the 2022 groundwater modeling challenge. The provided data and the evaluation of model results are well described. However, I suggest rejecting the manuscript for the following reasons:
(1) This manuscript is more like a summary (or technical report) of the groundwater modeling challenge. Scientific results and discussion are limited.
(2) The novelty of this work is limited from the point of view of both groundwater hydrology and modeling. The machine learning and deep learning models used in the manuscript are all classic algorithms. Furthermore, numerical models are not included.
(3) The details of the models (lumped-parameter, machine learning, deep learning) are not described, which may be because they are all classic ones.
Citation: https://doi.org/10.5194/hess-2024-111-RC2
RC3: 'Comment on hess-2024-111', Anonymous Referee #3, 26 Jun 2024
The paper presents a collaboration effort between 15 Teams to compare the performance of different types of models to simulate groundwater heads at four boreholes. The paper is clearly written, and I think it is of interest for hydrogeological modelers. I recommend the publication of this paper after addressing the following points:
- In the introduction, the authors argue that modelling will increase our understanding of groundwater systems. They also mention that AI may result in new knowledge that may be used to improve empirical and process-based groundwater models. Unfortunately, the modelling outcome is not discussed within this context, and the hydro-geological characteristics of the aquifer systems hosting these boreholes are neither inferred from these models nor discussed in the paper.
- Information regarding the structures of the models and how these reflect the hydro-geological settings should be included. I expect the lumped model structures and parameters to reflect the hydro-geological characteristics. If ML models are black boxes and nothing can be inferred from them, this should be explicitly mentioned in the paper and included in the discussion. It would be good to know the opinion of the Teams regarding the use of these models, as it poses a philosophical question regarding their use, especially in prediction mode.
- Please revise the text describing the data used for calibration and validation. Also, validation and prediction terms are used interchangeably. It is stated that 10 years of data including groundwater levels are provided for calibration, and five years without GWLs are provided for validation. Is this meant to be for prediction? But later it is clear that GWLs for the five years are used for validation. Is the validation done by someone other than the Teams after the submission of model output? Please clarify.
- For the USA site, it is mentioned that the nearest surface water is approximately 6.8 km away. Later, it is found that the river plays an important role in improving the performance of the models at this site. What is the magnitude of the river stage fluctuations? Do the hydraulic characteristics of the porous medium justify river control of GWLs at a borehole that is approximately 7 km away?
- If possible, can you please explain why additional engineered input data are needed for ML and DL models and why these models are not able to self-adjust to avoid the need for this additional input?
- Can you please rewrite or simplify the statement regarding the AI models and the lumped models in Section 5.1, Lines 302 and 306, as I find it confusing.
Citation: https://doi.org/10.5194/hess-2024-111-RC3
Data sets
Data and Code from the 2022 Groundwater time series Modeling Challenge Organisers https://zenodo.org/records/10438290