<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-30-2373-2026</article-id><title-group><article-title>Never Train a Deep Learning Model on a Single Well? Revisiting Training Strategies for Groundwater Level Prediction</article-title><alt-title>Never Train a Deep Learning Model on a Single Well?</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Ohmer</surname><given-names>Marc</given-names></name>
          <email>marc.ohmer@kit.edu</email>
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-2322-335X">https://orcid.org/0000-0002-2322-335X</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Liesch</surname><given-names>Tanja</given-names></name>
          
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-8648-5333">https://orcid.org/0000-0001-8648-5333</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Institute for Applied Geosciences (AGW), Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Marc Ohmer (marc.ohmer@kit.edu)</corresp></author-notes><pub-date><day>27</day><month>April</month><year>2026</year></pub-date>
      
      <volume>30</volume>
      <issue>8</issue>
      <fpage>2373</fpage><lpage>2394</lpage>
      <history>
        <date date-type="received"><day>19</day><month>August</month><year>2025</year></date>
           <date date-type="rev-request"><day>6</day><month>October</month><year>2025</year></date>
           <date date-type="rev-recd"><day>9</day><month>March</month><year>2026</year></date>
           <date date-type="accepted"><day>5</day><month>April</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Marc Ohmer and Tanja Liesch</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026.html">This article is available from https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e86">Deep learning (DL) models are increasingly used for hydrological forecasting, with a growing shift from site-specific to globally trained architectures. This study tests whether the widely held assumption that global models consistently outperform local ones also applies to groundwater systems, which differ substantially from surface water due to slow response dynamics, data scarcity, and strong site heterogeneity. Using a benchmark dataset of nearly 3000 monitoring wells across Germany, we systematically compare global Long Short-Term Memory (LSTM) models with locally trained single-well models in terms of overall performance, training data characteristics, prediction of extremes, and spatial generalization.</p>

      <p id="d2e89">For groundwater level prediction, we find that global models provide no systematic accuracy advantage over local models. Local models more often capture site-specific behavior, while global models yield more robust but less specialized predictions across diverse wells. Performance gains arise primarily from dynamically coherent training data, whereas random data reduction has little effect, indicating that similarity matters more than quantity in this setting. Both model types struggle with extreme groundwater conditions, and global models generalize reliably only to wells with comparable dynamics.</p>

      <p id="d2e92">These findings qualify the assumption of global model superiority and highlight the need to align modeling strategies with groundwater-specific constraints and application goals.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

      
<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e106">In recent years, deep learning (DL) has transformed hydrological forecasting, and global models often outperform site-specific approaches for streamflow prediction. In the surface-water domain, this progress is underpinned by large-sample benchmarks and systematic assessments of cross-catchment transfer <xref ref-type="bibr" rid="bib1.bibx15 bib1.bibx18 bib1.bibx26" id="paren.1"><named-content content-type="pre">e.g.,</named-content></xref>.</p>
      <p id="d2e114">It is still an open question whether the success of global DL models in streamflow prediction carries over to groundwater. Compared to surface waters, groundwater dynamics are often slower, more heterogeneous, and monitored by much sparser observations. With large-sample benchmarks and community intercomparisons for groundwater head forecasting only now emerging <xref ref-type="bibr" rid="bib1.bibx9 bib1.bibx30" id="paren.2"/>, it remains unclear whether global training yields consistent performance gains, or whether single-well models are better suited to groundwater-specific behavior.</p>
      <p id="d2e120">Traditionally, hydrological predictions have relied on physically based, process-oriented models. While powerful, these models demand extensive domain expertise, high-quality input data, and often face considerable implementation challenges, inherent uncertainties, and limited transferability across regions <xref ref-type="bibr" rid="bib1.bibx25" id="paren.3"/>. For groundwater, additional hurdles arise from geological complexity and the need for long observation periods supported by costly monitoring networks. Further, groundwater head dynamics are often strongly affected by pumping that is rarely available in global datasets, and aquifer boundaries are more difficult to delineate than surface-water catchments, complicating spatial transfer <xref ref-type="bibr" rid="bib1.bibx5" id="paren.4"/>.</p>
      <p id="d2e129">Against this backdrop, data-driven methods, particularly DL, offer a compelling alternative. These models can learn hydrological relationships directly from data, reducing the need for detailed local information <xref ref-type="bibr" rid="bib1.bibx12 bib1.bibx10" id="paren.5"/>, and efficiently capture nonlinear, time-lagged dependencies that characterize systems with strong storage effects such as groundwater, soil moisture, or snowmelt processes <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx7" id="paren.6"/>. Common DL architectures include recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), as well as convolutional neural networks (CNNs).</p>
      <p id="d2e139">Driven by the increasing availability of hydrological data and advances in machine learning, modeling strategies have shifted from locally calibrated, site-specific approaches toward regional and global architectures trained on data from many catchments simultaneously, with the aim of extracting generalizable patterns from distributed time series <xref ref-type="bibr" rid="bib1.bibx27 bib1.bibx18" id="paren.7"/>. Analogous multi-well training strategies are increasingly explored for groundwater head prediction <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx5 bib1.bibx13 bib1.bibx19" id="paren.8"/>.</p>
      <p id="d2e148">Current DL strategies in groundwater hydrogeology can thus be broadly categorized into two types: local (single-well) models and global models, the latter sometimes further refined into partitioned subsets motivated by spatial or dynamic similarity.</p>
<sec id="Ch1.S1.SS1">
  <label>1.1</label><title>Local Models (Single-Well Models)</title>
      <p id="d2e158">Single-well  models (also referred to as local or single-station models in the groundwater literature) are trained individually for each monitoring well, based on the assumption that each time series originates from its own data-generating process <xref ref-type="bibr" rid="bib1.bibx8" id="paren.9"/>. These models enable a detailed representation of local hydrogeological characteristics and dynamic changes <xref ref-type="bibr" rid="bib1.bibx36 bib1.bibx7 bib1.bibx33" id="paren.10"/>, offering high interpretability due to their sensitivity to site-specific input features and time windows. However, their applicability is primarily limited by a tendency toward overfitting <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx8" id="paren.11"/> and a lack of generalizability to other locations, as models must be trained separately for each well. Consequently, local models cannot exploit spatial variability or regional dynamics present in different time series across the monitoring network <xref ref-type="bibr" rid="bib1.bibx3" id="paren.12"/>.</p>
</sec>
<sec id="Ch1.S1.SS2">
  <label>1.2</label><title>Global Models</title>
      <p id="d2e181">Global models (also referred to as regional or multi-well/multi-site models) are trained on combined data from multiple monitoring wells and can generate predictions for all locations included in the training set <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx19 bib1.bibx13" id="paren.13"/>. This approach enables efficient use of large datasets and facilitates information sharing across the entire network <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx5" id="paren.14"/>. Owing to their architecture, global LSTM models can identify and generalize co-occurring patterns across time series <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx13" id="paren.15"/>. A key motivation is spatial transfer, i.e., applying a model trained on many wells to entirely withheld wells (spatial out-of-sample sites) when suitable static descriptors are available; in streamflow hydrology, this is closely related to the PUB paradigm (Prediction in Ungauged Basins). Streamflow studies have shown that global models can improve generalization across basins, including transferability to ungauged or data-sparse basins and unseen periods <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx21 bib1.bibx38" id="paren.16"/>. For groundwater head prediction, however, systematic large-sample evidence for spatial out-of-sample generalization remains limited, and recent work suggests that spatial transfer can be challenging <xref ref-type="bibr" rid="bib1.bibx13" id="paren.17"/>, partly because static attributes may act primarily as identifiers rather than enabling transferable representations <xref ref-type="bibr" rid="bib1.bibx13" id="paren.18"/>. Additional benefits include the ability to model long-memory patterns, robustness to data gaps, and potentially higher computational efficiency compared to training many local models individually <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx5" id="paren.19"/>. In surface-water benchmarking studies, global models have frequently outperformed conventional, locally calibrated hydrological models <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx38" id="paren.20"/>.</p>
      <p id="d2e209">However, the often-assumed superiority of global models is increasingly questioned. In a large-scale evaluation, <xref ref-type="bibr" rid="bib1.bibx32" id="text.21"/> found that a Google-developed global streamflow model (trained on 5680 basins) underperformed locally trained single-basin models in 46 % of 609 catchments and substantially underestimated high flows (95th–99th percentiles) by an average of 45 %. Consistently, their meta-analysis of 123 studies showed that single-basin models frequently attain high skill (NSE <inline-formula><mml:math id="M1" display="inline"><mml:mo>≥</mml:mo></mml:math></inline-formula> 0.75 in <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">92</mml:mn></mml:mrow></mml:math></inline-formula> % of cases), underscoring that local models can be highly competitive.</p>
      <p id="d2e232">Although this evidence comes from the surface-water domain, similar limitations have been reported for globally trained models in groundwater head prediction. In the presence of highly heterogeneous groundwater head time series, performance can decline <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx8" id="paren.22"/>. Global models tend to focus on dominant shared patterns, potentially at the expense of local variability. Learning nonlinear relationships between inputs and targets can become challenging when fundamentally different dynamical behaviors are combined during training <xref ref-type="bibr" rid="bib1.bibx39" id="paren.23"/>. Furthermore, studies have shown that static features such as geology, climate, or land use often fail to create proper entity awareness and instead act merely as identifiers <xref ref-type="bibr" rid="bib1.bibx13" id="paren.24"/>, which limits spatial generalization. Deficits have also been observed for extreme groundwater conditions, for example, due to saturation effects in the LSTM architecture or underestimation of extremes <xref ref-type="bibr" rid="bib1.bibx4 bib1.bibx33" id="paren.25"/>. More generally, sequence models such as LSTMs rely on bounded gating and activation functions, which can limit extrapolation beyond the training range and contribute to biased predictions under unprecedented extremes <xref ref-type="bibr" rid="bib1.bibx4" id="paren.26"/>. Finally, the black-box nature of deep neural networks remains a key challenge for decision support in groundwater management, especially for global models, as they capture complex, cross-site patterns that reduce the transparency of local relationships <xref ref-type="bibr" rid="bib1.bibx10 bib1.bibx13 bib1.bibx2" id="paren.27"/>.</p>
</sec>
<sec id="Ch1.S1.SS3">
  <label>1.3</label><title>Partitioned Models</title>
      <p id="d2e262">Partitioned models (also referred to as clustering-based or subgroup-specific models) are essentially global models trained on subgroups of monitoring wells that share similar temporal dynamics or static attributes. These models operate on more homogeneous subgroups of time series, which are typically formed through data-driven methods (e.g., clustering algorithms) or domain-specific groupings. The objective is to homogenize training data and to specifically align modeling capacity with similar time series types <xref ref-type="bibr" rid="bib1.bibx3" id="paren.28"/>. Clustering is usually based on (i) dynamic time series features such as trend, seasonality, autocorrelation, or entropy <xref ref-type="bibr" rid="bib1.bibx37 bib1.bibx10" id="paren.29"/>; (ii) spectral characteristics to separate typical frequency patterns <xref ref-type="bibr" rid="bib1.bibx5" id="paren.30"/>; (iii) shape-based similarity metrics such as Grey Relational Analysis (GRA) <xref ref-type="bibr" rid="bib1.bibx39" id="paren.31"/>; or (iv) static site attributes such as climate, geology, or topography <xref ref-type="bibr" rid="bib1.bibx19" id="paren.32"/>. In the surface-water domain, analogous groupings can also be based on static catchment attributes <xref ref-type="bibr" rid="bib1.bibx18" id="paren.33"/>. Subsequently, a dedicated model is trained for each group. Several studies have shown that partitioned models are often more robust to heterogeneity than fully global approaches, particularly when time series exhibit strongly divergent dynamics <xref ref-type="bibr" rid="bib1.bibx5" id="paren.34"/>. By focusing on homogeneous subgroups, partitioned models can enhance both predictive performance and interpretability <xref ref-type="bibr" rid="bib1.bibx39" id="paren.35"/>.</p>
</sec>
<sec id="Ch1.S1.SS4">
  <label>1.4</label><title>Research Questions and Objectives</title>
      <p id="d2e298">In light of these developments, this study aims to systematically compare the predictive performance of global and local deep learning (DL) models for groundwater level forecasting. The central question is whether the advantages of globally trained models, whose superior performance has been widely demonstrated in hydrological streamflow modeling, can be transferred to hydrogeological applications, particularly under the specific conditions of diverse system dynamics; ranging from highly dynamic behavior in karstic aquifers to inertial responses in low-permeability porous aquifers; as well as heterogeneous site conditions and limited data availability.</p>
      <p id="d2e301">In contrast to previous studies, the analysis is based on an extensive, Germany-wide groundwater level benchmark dataset comprising nearly 3000 monitoring wells <xref ref-type="bibr" rid="bib1.bibx30" id="paren.36"/>, spanning over three decades. The associated spatial and dynamic diversity enables a differentiated assessment of the generalizability of data-driven models across different geological and climatic settings.</p>
      <p id="d2e307">The core research questions are:</p>
      <p id="d2e310"><list list-type="order">
            <list-item>

      <p id="d2e315"><italic>Overall Model Performance</italic>: Are globally trained LSTM models generally superior to local (single-well) models in terms of overall predictive accuracy across a large and heterogeneous set of monitoring wells?</p>
            </list-item>
            <list-item>

      <p id="d2e323"><italic>Influence of the Training Data Basis</italic>: How does the predictive performance of global models depend on the characteristics of the training dataset, in particular the number of training wells and the degree of dynamic similarity among them, and the length of the available training record?</p>
            </list-item>
            <list-item>

      <p id="d2e331"><italic>Prediction of Extreme Events</italic>: Are globally trained models better than single-well models in predicting groundwater-level extremes (e.g., drought lows and high peaks) that were not observed during training? How does predictive performance under extrapolative conditions depend on the size of the training dataset and its degree of dynamic similarity?</p>
            </list-item>
            <list-item>

      <p id="d2e339"><italic>Out-of-Sample Spatial Prediction</italic>: How well can global models predict groundwater levels at monitoring wells that were entirely withheld from the training data (leave-well-out spatial out-of-sample sites)?</p>
            </list-item>
          </list></p>
      <p id="d2e347">To address these questions, we conducted a comprehensive experimental comparison. The experiments involve global LSTM models, trained either on the full dataset or on differently partitioned subsets, and locally trained CNN single-well models. All models are evaluated on the same standardized data basis, using test designs that systematically vary the size and dynamic composition of the training dataset, as well as extrapolative settings, including extreme groundwater levels outside the training range (below the well-specific 1st percentile or above the 99th percentile of the training distribution) and spatial out-of-sample prediction at monitoring wells withheld from training.</p>
</sec>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Groundwater Level Data</title>
      <p id="d2e366">The analysis is based on the GEMS-GER dataset <xref ref-type="bibr" rid="bib1.bibx30" id="paren.37"/>, which provides standardized groundwater-level observations and associated predictor variables for Germany. The dataset contains weekly time series from 3207 monitoring wells for 1991–2022, covering all major hydrogeological regions and a wide range of aquifer types and system dynamics. For this study, we used a filtered subset of 2951 wells, excluding sites that achieved an NSE <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:mo>≤</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> across all three benchmark baseline models provided with the GEMS-GER benchmark workflow (single-well CNN and global LSTM). Details on data sources, preprocessing, and quality control are provided in the dataset description. The full dataset is openly available via Zenodo (DOI: <ext-link xlink:href="https://doi.org/10.5281/zenodo.15530171" ext-link-type="DOI">10.5281/zenodo.15530171</ext-link>, <xref ref-type="bibr" rid="bib1.bibx28" id="altparen.38"/>). The spatial distribution of the monitoring wells is shown in Appendix A (Fig. <xref ref-type="fig" rid="FA1"/>), and representative time series are shown in Appendix B (Fig. <xref ref-type="fig" rid="FB1"/>).</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Dynamic Input Variables</title>
      <p id="d2e402">Each groundwater time series is complemented by dynamic input variables representing meteorological and hydrological conditions, including precipitation, temperature, relative humidity, evapotranspiration, soil moisture, soil temperature, snowmelt, snow water equivalent, and surface as well as subsurface runoff. These variables are taken from the GEMS-GER dataset and were used here exactly as defined therein <xref ref-type="bibr" rid="bib1.bibx30" id="paren.39"/>. All dynamic inputs are provided at weekly resolution; daily fields were aggregated to weekly values using variable-specific operators (weekly mean or weekly sum depending on the variable; see Table 1 in the GEMS-GER dataset paper). We did not compute additional multi-window indices or running aggregations (e.g., SPI or rolling net-precipitation sums) beyond the weekly aggregation already defined in GEMS-GER. HYRAS-based variables were selected whenever available (higher spatial resolution), and ERA5-Land was only used to complement variables not provided by HYRAS.</p>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Static Site Attributes</title>
      <p id="d2e416">In addition to dynamic inputs, each monitoring well is characterized by a set of more than 50 static attributes, including hydrogeological, topographic, soil, and land-use properties. Static attributes are time-invariant site descriptors and do not include statistics derived from the groundwater-level time series (e.g., mean head or standard deviation). From the full set of static features provided in the dataset <xref ref-type="bibr" rid="bib1.bibx30" id="paren.40"/>, variables related to well depth, screen characteristics, pumping, and pressure state were excluded, as these were sparsely available for the majority of monitoring wells. All categorical static features were label-encoded for use in the machine-learning models.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methods</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Modeling Strategies</title>
      <p id="d2e438">We implemented and compared two main types of modeling strategies: local (single-well) models and global models. The latter were trained either on the full dataset or on differently partitioned subsets (referred to as partitioned models). Throughout this study, we refer to local (single-well) models as S, global models as G, and partitioned variants as S-P<inline-formula><mml:math id="M4" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and G-P<inline-formula><mml:math id="M5" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>, where <inline-formula><mml:math id="M6" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> indicates the partition stage. To indicate the partitioning strategy, the subscript COR denotes correlation-based removal (increasing dynamic similarity), and the subscript RND denotes random removal. <list list-type="bullet"><list-item>
      <p id="d2e464"><italic>(i) Local (Single-Well) Models (S-P0):</italic> Independent CNN models were trained for each monitoring well using only local dynamic input variables. These models serve as a site-specific baseline without transferring cross-site information. All single-well models were trained for each of the 2951 wells (Stage P0).</p></list-item><list-item>
      <p id="d2e470"><italic>(ii) Global Model (G-P0):</italic> A single LSTM model was trained on all 2951 wells of Stage P0 jointly, using both dynamic and static input features to learn generalizable spatio-temporal patterns.</p></list-item><list-item>
      <p id="d2e476"><italic>(iii) Partitioned Models (S-Px, G-Px):</italic> To assess the influence of training set composition, we implemented a series of partitioned models derived from the P0 dataset. For both partitioning strategies, stages are cumulative: P<inline-formula><mml:math id="M7" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> denotes the subset obtained after removing <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">500</mml:mn></mml:mrow></mml:math></inline-formula> wells from P0 (i.e., P1: 500 removed, P2: 1000 removed, …, P5: 2500 removed), resulting in <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">P</mml:mi><mml:mn mathvariant="normal">0</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2951</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">P</mml:mi><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2451</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">P</mml:mi><mml:mn mathvariant="normal">2</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1951</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">P</mml:mi><mml:mn mathvariant="normal">3</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1451</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">P</mml:mi><mml:mn mathvariant="normal">4</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">951</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">P</mml:mi><mml:mn mathvariant="normal">5</mml:mn><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">451</mml:mn></mml:mrow></mml:math></inline-formula> wells.</p></list-item></list></p>
      <p id="d2e621">The partitioning procedure is defined as follows:</p>
      <p id="d2e624"><list list-type="bullet">
            <list-item>

      <p id="d2e629"><italic>Stages P1–P5</italic><sub><italic>COR</italic></sub>: Starting from P0, an additional 500 wells were successively removed in each stage based on their dynamic dissimilarity to other wells. To quantify this, we computed the pairwise <italic>absolute</italic> Pearson correlations between the standardized groundwater level time series. Each well’s dynamic <italic>representativeness</italic> was then defined as the mean absolute correlation with all others. Wells with the lowest representativeness were considered least typical in terms of dynamics and removed first, resulting in subsets with increasing internal similarity. We deliberately did not impose a spatial constraint on the similarity criterion, as dynamically similar groundwater responses are not necessarily local and preserving such non-local analogs is part of the motivation for global learning. While spatio-dynamic clustering is a plausible alternative, it introduces additional design choices (e.g., spatial weighting and cluster definition) and would make the controlled, stage-wise comparability of the progressive reduction (P0–P5) less straightforward.</p>
            </list-item>
            <list-item>

      <p id="d2e651"><italic>Stages P1–P5</italic><sub><italic>RND</italic></sub><italic>:</italic> In parallel, random removal of 500 wells per stage was applied to generate baseline subsets with the same size progression and to serve as a control for the correlation-based strategy.</p>
            </list-item>
          </list></p>
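      <p>The correlation-based ranking can be summarized in a few lines of Python. This is a minimal sketch, assuming the standardized weekly head series are arranged as columns of a pandas DataFrame; it illustrates the procedure rather than reproducing the exact implementation used here.</p>
      <preformat preformat-type="code"><![CDATA[
import numpy as np
import pandas as pd

def representativeness(heads: pd.DataFrame) -> pd.Series:
    """heads: standardized weekly groundwater levels, one column per well."""
    corr = heads.corr(method="pearson").abs()  # pairwise |Pearson r|
    np.fill_diagonal(corr.values, np.nan)      # exclude self-correlation
    return corr.mean(axis=1)                   # mean |r| with all other wells

# One COR stage: drop the 500 least representative wells.
# scores = representativeness(heads)
# next_stage = heads.drop(columns=scores.nsmallest(500).index)
]]></preformat>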
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Model Architectures</title>
      <p id="d2e676">All models in this study follow the benchmark architectures introduced in <xref ref-type="bibr" rid="bib1.bibx30" id="text.41"/>, using a standard sequence-to-value forecasting setup. Input sequences of 52 time steps (i.e., weeks) were used to predict the groundwater level at the following time step. Models were trained and validated on the periods 1991–2007 and 2008–2012, respectively, and evaluated on the final 10 years (2013–2022). All metrics were computed from the median prediction of an ensemble of ten independently initialized models.</p>
      <p id="d2e682">In this study, single-well models are implemented using a 1D convolutional neural network (CNN), whereas global models are implemented using a Long Short-Term Memory network (LSTM). Architecture selection was guided by preliminary baseline experiments and computational practicality, as the study aims to isolate the effects of training strategy rather than to benchmark architectures. For single-well training across thousands of wells, the CNN yielded comparable skill while being markedly faster and more stable (LSTM runs were more prone to optimization instabilities), whereas the LSTM provided a strong and widely used baseline for global multi-well sequence learning. To enable a controlled comparison across training strategies and partitioning stages, we adopted the benchmark architectures and hyperparameter settings of <xref ref-type="bibr" rid="bib1.bibx30" id="text.42"/> and kept them fixed throughout; robustness to stochasticity is addressed via an ensemble of ten initializations (median performance). All dynamic inputs and the target groundwater head series were standardized per well (<inline-formula><mml:math id="M17" display="inline"><mml:mi>z</mml:mi></mml:math></inline-formula>-score) using pre-test statistics (1991–2012) and back-transformed accordingly. In the spatial out-of-sample experiments, wells were withheld from weight training, but their scaling parameters were derived from their own pre-test head observations.</p>
      <p id="d2e695">The <italic>single-well models</italic> are based on a 1D convolutional neural network (CNN) architecture. Each model consists of a convolutional layer with 256 filters and kernel size 3, followed by max pooling, flattening, a dense layer with 32 units, and a final output layer. The models were trained using the Adam optimizer (learning rate 0.001), early stopping (patience 5), a batch size of 16, and a maximum of 30 epochs. Only dynamic input features were used.</p>
      <p id="d2e701">The <italic>global models</italic> are based on a Long Short-Term Memory (LSTM) architecture. The dynamic input branch consists of a single LSTM layer with 128 units, followed by a dropout layer with a dropout rate of 0.3. Models were trained for up to 20 epochs using a batch size of 512, early stopping (patience: 5), and a learning rate scheduler targeting a value of 0.001. Static features were incorporated using a second model branch that processes static inputs via a dense layer with 128 units. The outputs of both branches are concatenated and passed through a dense layer with 256 units before the final output layer. Categorical static features were label-encoded. For further architectural and implementation details, we refer to <xref ref-type="bibr" rid="bib1.bibx30" id="text.43"/>.</p>
      <p id="d2e711">All global models (G and G-P<inline-formula><mml:math id="M18" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>) were retrained independently using the same LSTM architecture and hyperparameters, ensuring architectural consistency across all partitioning stages. In contrast, the single-well models (S) were trained once per well on the P0 dataset and remained unchanged; for each partition, only the models corresponding to the retained wells were considered in the evaluation.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Experimental Design</title>
      <p id="d2e729">The experimental design addresses the four research questions outlined in Sect. <xref ref-type="sec" rid="Ch1.S1.SS4"/>, each examining a distinct aspect of model performance and generalization. To systematically evaluate these aspects, we conducted four targeted experiments focusing on: <list list-type="bullet"><list-item>
      <p id="d2e736"><italic>(i) Overall Performance Comparison</italic>: We compared the predictive accuracy of global, local, and partitioned models across all monitoring wells (P0). This experiment serves as a baseline to assess overall model performance and consistency across dynamic groundwater regimes.</p></list-item><list-item>
      <p id="d2e742"><italic>(ii) Influence of the Training Data Basis</italic>: To evaluate how characteristics of the training dataset affect model performance, we conducted three complementary experiments. First, we assessed the sensitivity to <italic>training-record length</italic> by progressively shortening the available training period in 1-year steps while keeping the validation and test periods fixed (2008–2012 and 2013–2022). Second, we compared models trained on subsets with varying degrees of dynamic similarity, created by correlation-based or random well removal (see Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>). Third, we analyzed the effect of progressive random training set size reduction, ranging from 2951 to 451 wells. This allows us to disentangle the effects of training set size and dynamic similarity on prediction accuracy and robustness.</p></list-item><list-item>
      <p id="d2e753"><italic>(iii) Prediction of Extreme Events</italic>: To assess performance under extrapolative conditions, we evaluated predictions for groundwater levels outside the typical range observed during training. For each well, low extremes were defined as test-period values below the 1st percentile of its training distribution, and high extremes as values above the 99th percentile. We refer to these as low/high extremes (extrapolation) rather than “groundwater drought” to avoid ambiguity in drought definitions.</p></list-item><list-item>
      <p id="d2e759"><italic>(iv) Out-of-Sample Spatial Prediction</italic>: To evaluate the spatial generalization capability of global models, we used the correlation- and random-based partitioning described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>. For each stage (P1–P5<sub>COR</sub>  and P1–P5<sub>RND</sub>), the excluded wells, i.e., those removed from the training data, served as a spatial out-of-sample test set. This design allows direct assessment of predictive performance at previously unseen locations and isolates the impact of training data composition on generalization. Withheld wells were excluded from model training, but their pre-test head observations were used for well-specific scaling (i.e., transfer to unseen wells with historical records).</p></list-item></list></p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>
      <p id="d2e794">The following subsections present the results of the four experiments outlined in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>, each addressing one of the research questions (RQ i–iv). We evaluate model accuracy and generalization behavior under varying training conditions and hydrological contexts.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Overall Performance Comparison (RQ i)</title>
      <p id="d2e806">To assess whether globally trained LSTM models outperform local single-well (S-P0) models, we compare their predictive performance across 2951 monitoring wells. Both models were trained on the full dataset (P0).</p>
      <p id="d2e809">Figure <xref ref-type="fig" rid="F3"/>a compares global (G-P0) and single-well (S-P0) performance. Overall, differences are moderate: the median NSE is slightly higher for S-P0 (0.49 vs. 0.47; Table <xref ref-type="table" rid="T1"/>), and S-P0 attains more high-performing wells (upper tail). G-P0 shows a slightly narrower interquartile range, but also more very low NSE values, indicating an elevated risk of underperformance at some locations.</p>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e819">Overview of performance metrics for global (G) and corresponding single-well (S) models, including Nash–Sutcliffe efficiency (NSE), root mean square error (RMSE), coefficient of determination (<inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), and bias. Values are reported as minimum, median, mean, and maximum across all wells for each model configuration.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="17">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right" colsep="1"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right" colsep="1"/>
     <oasis:colspec colnum="10" colname="col10" align="right"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="right" colsep="1"/>
     <oasis:colspec colnum="14" colname="col14" align="right"/>
     <oasis:colspec colnum="15" colname="col15" align="right"/>
     <oasis:colspec colnum="16" colname="col16" align="right"/>
     <oasis:colspec colnum="17" colname="col17" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry rowsep="1" namest="col2" nameend="col5" align="center" colsep="1">NSE </oasis:entry>
         <oasis:entry rowsep="1" namest="col6" nameend="col9" align="center" colsep="1">RMSE </oasis:entry>
         <oasis:entry rowsep="1" namest="col10" nameend="col13" align="center" colsep="1"><inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry rowsep="1" namest="col14" nameend="col17" align="center">Bias </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">min</oasis:entry>
         <oasis:entry colname="col3">med</oasis:entry>
         <oasis:entry colname="col4">mean</oasis:entry>
         <oasis:entry colname="col5">max</oasis:entry>
         <oasis:entry colname="col6">min</oasis:entry>
         <oasis:entry colname="col7">med</oasis:entry>
         <oasis:entry colname="col8">mean</oasis:entry>
         <oasis:entry colname="col9">max</oasis:entry>
         <oasis:entry colname="col10">min</oasis:entry>
         <oasis:entry colname="col11">med</oasis:entry>
         <oasis:entry colname="col12">mean</oasis:entry>
         <oasis:entry colname="col13">max</oasis:entry>
         <oasis:entry colname="col14">min</oasis:entry>
         <oasis:entry colname="col15">med</oasis:entry>
         <oasis:entry colname="col16">mean</oasis:entry>
         <oasis:entry colname="col17">max</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">G-P0</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M23" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.16</oasis:entry>
         <oasis:entry colname="col3">0.53</oasis:entry>
         <oasis:entry colname="col4">0.47</oasis:entry>
         <oasis:entry colname="col5">0.91</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.37</oasis:entry>
         <oasis:entry colname="col9">8.90</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.62</oasis:entry>
         <oasis:entry colname="col12">0.57</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M24" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7.89</oasis:entry>
         <oasis:entry colname="col15">0.02</oasis:entry>
         <oasis:entry colname="col16">0.04</oasis:entry>
         <oasis:entry colname="col17">6.33</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P1<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M26" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.22</oasis:entry>
         <oasis:entry colname="col3">0.58</oasis:entry>
         <oasis:entry colname="col4">0.54</oasis:entry>
         <oasis:entry colname="col5">0.91</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.24</oasis:entry>
         <oasis:entry colname="col8">0.33</oasis:entry>
         <oasis:entry colname="col9">7.02</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.65</oasis:entry>
         <oasis:entry colname="col12">0.62</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M27" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>4.86</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.04</oasis:entry>
         <oasis:entry colname="col17">4.59</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P2<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M29" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.71</oasis:entry>
         <oasis:entry colname="col3">0.62</oasis:entry>
         <oasis:entry colname="col4">0.59</oasis:entry>
         <oasis:entry colname="col5">0.92</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.23</oasis:entry>
         <oasis:entry colname="col8">0.32</oasis:entry>
         <oasis:entry colname="col9">7.39</oasis:entry>
         <oasis:entry colname="col10">0.08</oasis:entry>
         <oasis:entry colname="col11">0.69</oasis:entry>
         <oasis:entry colname="col12">0.66</oasis:entry>
         <oasis:entry colname="col13">0.94</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M30" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.59</oasis:entry>
         <oasis:entry colname="col15">0.02</oasis:entry>
         <oasis:entry colname="col16">0.04</oasis:entry>
         <oasis:entry colname="col17">4.36</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P3<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M32" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.51</oasis:entry>
         <oasis:entry colname="col3">0.66</oasis:entry>
         <oasis:entry colname="col4">0.63</oasis:entry>
         <oasis:entry colname="col5">0.93</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.22</oasis:entry>
         <oasis:entry colname="col8">0.30</oasis:entry>
         <oasis:entry colname="col9">7.35</oasis:entry>
         <oasis:entry colname="col10">0.08</oasis:entry>
         <oasis:entry colname="col11">0.71</oasis:entry>
         <oasis:entry colname="col12">0.69</oasis:entry>
         <oasis:entry colname="col13">0.94</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M33" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.25</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.05</oasis:entry>
         <oasis:entry colname="col17">3.54</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P4<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M35" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.51</oasis:entry>
         <oasis:entry colname="col3">0.68</oasis:entry>
         <oasis:entry colname="col4">0.65</oasis:entry>
         <oasis:entry colname="col5">0.92</oasis:entry>
         <oasis:entry colname="col6">0.05</oasis:entry>
         <oasis:entry colname="col7">0.22</oasis:entry>
         <oasis:entry colname="col8">0.29</oasis:entry>
         <oasis:entry colname="col9">7.23</oasis:entry>
         <oasis:entry colname="col10">0.13</oasis:entry>
         <oasis:entry colname="col11">0.73</oasis:entry>
         <oasis:entry colname="col12">0.71</oasis:entry>
         <oasis:entry colname="col13">0.94</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M36" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.64</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.04</oasis:entry>
         <oasis:entry colname="col17">3.52</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G-P5<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2">0.20</oasis:entry>
         <oasis:entry colname="col3">0.72</oasis:entry>
         <oasis:entry colname="col4">0.70</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.05</oasis:entry>
         <oasis:entry colname="col7">0.20</oasis:entry>
         <oasis:entry colname="col8">0.26</oasis:entry>
         <oasis:entry colname="col9">5.05</oasis:entry>
         <oasis:entry colname="col10">0.36</oasis:entry>
         <oasis:entry colname="col11">0.77</oasis:entry>
         <oasis:entry colname="col12">0.75</oasis:entry>
         <oasis:entry colname="col13">0.96</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M38" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.50</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.05</oasis:entry>
         <oasis:entry colname="col17">3.58</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P1<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M40" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.27</oasis:entry>
         <oasis:entry colname="col3">0.54</oasis:entry>
         <oasis:entry colname="col4">0.47</oasis:entry>
         <oasis:entry colname="col5">0.90</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.38</oasis:entry>
         <oasis:entry colname="col9">9.01</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.62</oasis:entry>
         <oasis:entry colname="col12">0.57</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M41" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7.92</oasis:entry>
         <oasis:entry colname="col15">0.02</oasis:entry>
         <oasis:entry colname="col16">0.05</oasis:entry>
         <oasis:entry colname="col17">6.53</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P2<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M43" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.08</oasis:entry>
         <oasis:entry colname="col3">0.54</oasis:entry>
         <oasis:entry colname="col4">0.48</oasis:entry>
         <oasis:entry colname="col5">0.91</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.38</oasis:entry>
         <oasis:entry colname="col9">7.85</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.63</oasis:entry>
         <oasis:entry colname="col12">0.58</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M44" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>7.80</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">6.43</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P3<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M46" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.96</oasis:entry>
         <oasis:entry colname="col3">0.55</oasis:entry>
         <oasis:entry colname="col4">0.48</oasis:entry>
         <oasis:entry colname="col5">0.89</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.37</oasis:entry>
         <oasis:entry colname="col9">7.69</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.63</oasis:entry>
         <oasis:entry colname="col12">0.58</oasis:entry>
         <oasis:entry colname="col13">0.94</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M47" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>5.62</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">6.25</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P4<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M49" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.07</oasis:entry>
         <oasis:entry colname="col3">0.55</oasis:entry>
         <oasis:entry colname="col4">0.49</oasis:entry>
         <oasis:entry colname="col5">0.88</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.24</oasis:entry>
         <oasis:entry colname="col8">0.36</oasis:entry>
         <oasis:entry colname="col9">8.00</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.63</oasis:entry>
         <oasis:entry colname="col12">0.58</oasis:entry>
         <oasis:entry colname="col13">0.92</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M50" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>5.26</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">6.59</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G-P5<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M52" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.10</oasis:entry>
         <oasis:entry colname="col3">0.57</oasis:entry>
         <oasis:entry colname="col4">0.50</oasis:entry>
         <oasis:entry colname="col5">0.90</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.24</oasis:entry>
         <oasis:entry colname="col8">0.34</oasis:entry>
         <oasis:entry colname="col9">5.83</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.62</oasis:entry>
         <oasis:entry colname="col12">0.58</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M53" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>5.30</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.05</oasis:entry>
         <oasis:entry colname="col17">5.40</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P0</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M54" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.21</oasis:entry>
         <oasis:entry colname="col3">0.55</oasis:entry>
         <oasis:entry colname="col4">0.49</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.36</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.59</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M55" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>5.78</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">6.34</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P1<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M57" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.53</oasis:entry>
         <oasis:entry colname="col3">0.59</oasis:entry>
         <oasis:entry colname="col4">0.54</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.24</oasis:entry>
         <oasis:entry colname="col8">0.33</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.63</oasis:entry>
         <oasis:entry colname="col12">0.59</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M58" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>2.25</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">5.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P2<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M60" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.53</oasis:entry>
         <oasis:entry colname="col3">0.62</oasis:entry>
         <oasis:entry colname="col4">0.57</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.23</oasis:entry>
         <oasis:entry colname="col8">0.33</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.66</oasis:entry>
         <oasis:entry colname="col12">0.62</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M61" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>2.25</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">5.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P3<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M63" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.41</oasis:entry>
         <oasis:entry colname="col3">0.64</oasis:entry>
         <oasis:entry colname="col4">0.60</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.23</oasis:entry>
         <oasis:entry colname="col8">0.31</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.01</oasis:entry>
         <oasis:entry colname="col11">0.67</oasis:entry>
         <oasis:entry colname="col12">0.64</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M64" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.13</oasis:entry>
         <oasis:entry colname="col15">0.04</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">5.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P4<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M66" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.41</oasis:entry>
         <oasis:entry colname="col3">0.65</oasis:entry>
         <oasis:entry colname="col4">0.62</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.06</oasis:entry>
         <oasis:entry colname="col7">0.22</oasis:entry>
         <oasis:entry colname="col8">0.31</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.13</oasis:entry>
         <oasis:entry colname="col11">0.68</oasis:entry>
         <oasis:entry colname="col12">0.66</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M67" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.46</oasis:entry>
         <oasis:entry colname="col15">0.04</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">5.63</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">S-P5<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M69" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.27</oasis:entry>
         <oasis:entry colname="col3">0.67</oasis:entry>
         <oasis:entry colname="col4">0.64</oasis:entry>
         <oasis:entry colname="col5">0.89</oasis:entry>
         <oasis:entry colname="col6">0.07</oasis:entry>
         <oasis:entry colname="col7">0.21</oasis:entry>
         <oasis:entry colname="col8">0.29</oasis:entry>
         <oasis:entry colname="col9">7.16</oasis:entry>
         <oasis:entry colname="col10">0.24</oasis:entry>
         <oasis:entry colname="col11">0.69</oasis:entry>
         <oasis:entry colname="col12">0.67</oasis:entry>
         <oasis:entry colname="col13">0.92</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M70" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.46</oasis:entry>
         <oasis:entry colname="col15">0.05</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">5.63</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P1<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M72" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.21</oasis:entry>
         <oasis:entry colname="col3">0.55</oasis:entry>
         <oasis:entry colname="col4">0.49</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.37</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.59</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.93</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M73" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>5.78</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">6.34</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P2<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M75" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>1.21</oasis:entry>
         <oasis:entry colname="col3">0.55</oasis:entry>
         <oasis:entry colname="col4">0.48</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.37</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.59</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.92</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M76" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>5.78</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">6.34</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P3<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M78" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.83</oasis:entry>
         <oasis:entry colname="col3">0.55</oasis:entry>
         <oasis:entry colname="col4">0.48</oasis:entry>
         <oasis:entry colname="col5">0.94</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.36</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.59</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.92</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M79" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>4.09</oasis:entry>
         <oasis:entry colname="col15">0.03</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">6.34</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P4<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M81" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.83</oasis:entry>
         <oasis:entry colname="col3">0.54</oasis:entry>
         <oasis:entry colname="col4">0.48</oasis:entry>
         <oasis:entry colname="col5">0.91</oasis:entry>
         <oasis:entry colname="col6">0.02</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.36</oasis:entry>
         <oasis:entry colname="col9">8.06</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.58</oasis:entry>
         <oasis:entry colname="col12">0.54</oasis:entry>
         <oasis:entry colname="col13">0.92</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M82" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>3.68</oasis:entry>
         <oasis:entry colname="col15">0.04</oasis:entry>
         <oasis:entry colname="col16">0.07</oasis:entry>
         <oasis:entry colname="col17">6.34</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">S-P5<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M84" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.83</oasis:entry>
         <oasis:entry colname="col3">0.53</oasis:entry>
         <oasis:entry colname="col4">0.48</oasis:entry>
         <oasis:entry colname="col5">0.90</oasis:entry>
         <oasis:entry colname="col6">0.04</oasis:entry>
         <oasis:entry colname="col7">0.25</oasis:entry>
         <oasis:entry colname="col8">0.34</oasis:entry>
         <oasis:entry colname="col9">5.99</oasis:entry>
         <oasis:entry colname="col10">0.00</oasis:entry>
         <oasis:entry colname="col11">0.57</oasis:entry>
         <oasis:entry colname="col12">0.53</oasis:entry>
         <oasis:entry colname="col13">0.92</oasis:entry>
         <oasis:entry colname="col14"><inline-formula><mml:math id="M85" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>3.68</oasis:entry>
         <oasis:entry colname="col15">0.04</oasis:entry>
         <oasis:entry colname="col16">0.06</oasis:entry>
         <oasis:entry colname="col17">4.97</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2617">The pairwise comparison in Table <xref ref-type="table" rid="T2"/> confirms these trade-offs: G-P0 outperforms S-P0 for 45.4 % of wells, underperforms for 48.9 %, and performs equally (within <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn></mml:mrow></mml:math></inline-formula> NSE) for the remaining 5.7 %. Thus, both model types perform broadly comparably, reflecting the balance between generalization and local adaptation.</p>
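      <p>For illustration, the underlying per-well classification can be reproduced with a short script. The following sketch is not the study's evaluation code; it uses synthetic stand-ins for the per-well NSE scores and only mirrors the ±0.01 tolerance rule described above.</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(42)
# Synthetic stand-ins for per-well test NSE of the global (nse_g) and
# single-well (nse_s) models; in practice these come from the evaluation.
nse_s = rng.normal(0.53, 0.20, size=2951)
nse_g = nse_s + rng.normal(0.00, 0.10, size=2951)

tol = 0.01  # wells within +/-0.01 NSE count as "equal"
diff = nse_g - nse_s
g_better = 100 * np.mean(diff &gt; tol)
s_better = 100 * np.mean(diff &lt; -tol)
equal = 100 * np.mean(np.abs(diff) &lt;= tol)
print(f"G better: {g_better:.1f} %, S better: {s_better:.1f} %, equal: {equal:.1f} %")
</preformat>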

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e2635">Summary of model performance across correlation- and randomly reduced training subsets. Left columns show NSE-based performance groups for global models (G), middle columns the corresponding results for local single-well models (S) trained on the same well subsets. Right columns report the share of wells for which global models perform better, worse, or equally (<inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn></mml:mrow></mml:math></inline-formula> NSE) compared to their local counterparts.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="13">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right" colsep="1"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="right" colsep="1"/>
     <oasis:colspec colnum="11" colname="col11" align="right"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G-Model</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.65</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.75</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6">S-Model</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.65</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.75</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col10"><inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.85</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col11">G <inline-formula><mml:math id="M96" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col12">S <inline-formula><mml:math id="M97" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col13">Equal</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">G-P0</oasis:entry>
         <oasis:entry colname="col2">5.5</oasis:entry>
         <oasis:entry colname="col3">27.4</oasis:entry>
         <oasis:entry colname="col4">9.8</oasis:entry>
         <oasis:entry colname="col5">1.3</oasis:entry>
         <oasis:entry colname="col6">S-P0</oasis:entry>
         <oasis:entry colname="col7">5.0</oasis:entry>
         <oasis:entry colname="col8">32.2</oasis:entry>
         <oasis:entry colname="col9">13.1</oasis:entry>
         <oasis:entry colname="col10">2.5</oasis:entry>
         <oasis:entry colname="col11">45.4</oasis:entry>
         <oasis:entry colname="col12">48.9</oasis:entry>
         <oasis:entry colname="col13">5.7</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P1<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2">2.1</oasis:entry>
         <oasis:entry colname="col3">34.4</oasis:entry>
         <oasis:entry colname="col4">11.4</oasis:entry>
         <oasis:entry colname="col5">1.3</oasis:entry>
         <oasis:entry colname="col6">S-P1<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col7">2.8</oasis:entry>
         <oasis:entry colname="col8">37.0</oasis:entry>
         <oasis:entry colname="col9">15.3</oasis:entry>
         <oasis:entry colname="col10">2.9</oasis:entry>
         <oasis:entry colname="col11">47.2</oasis:entry>
         <oasis:entry colname="col12">45.9</oasis:entry>
         <oasis:entry colname="col13">6.9</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P2<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2">0.9</oasis:entry>
         <oasis:entry colname="col3">44.2</oasis:entry>
         <oasis:entry colname="col4">17.9</oasis:entry>
         <oasis:entry colname="col5">2.1</oasis:entry>
         <oasis:entry colname="col6">S-P2<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col7">2.0</oasis:entry>
         <oasis:entry colname="col8">42.1</oasis:entry>
         <oasis:entry colname="col9">17.6</oasis:entry>
         <oasis:entry colname="col10">3.4</oasis:entry>
         <oasis:entry colname="col11">50.0</oasis:entry>
         <oasis:entry colname="col12">43.4</oasis:entry>
         <oasis:entry colname="col13">6.6</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P3<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2">0.3</oasis:entry>
         <oasis:entry colname="col3">52.2</oasis:entry>
         <oasis:entry colname="col4">24.2</oasis:entry>
         <oasis:entry colname="col5">3.0</oasis:entry>
         <oasis:entry colname="col6">S-P3<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col7">1.3</oasis:entry>
         <oasis:entry colname="col8">46.8</oasis:entry>
         <oasis:entry colname="col9">19.7</oasis:entry>
         <oasis:entry colname="col10">3.9</oasis:entry>
         <oasis:entry colname="col11">53.5</oasis:entry>
         <oasis:entry colname="col12">38.9</oasis:entry>
         <oasis:entry colname="col13">7.6</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P4<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2">0.1</oasis:entry>
         <oasis:entry colname="col3">58.0</oasis:entry>
         <oasis:entry colname="col4">27.5</oasis:entry>
         <oasis:entry colname="col5">2.7</oasis:entry>
         <oasis:entry colname="col6">S-P4<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col7">0.9</oasis:entry>
         <oasis:entry colname="col8">51.5</oasis:entry>
         <oasis:entry colname="col9">20.5</oasis:entry>
         <oasis:entry colname="col10">3.4</oasis:entry>
         <oasis:entry colname="col11">56.5</oasis:entry>
         <oasis:entry colname="col12">36.9</oasis:entry>
         <oasis:entry colname="col13">6.6</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">G-P5<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col2">0.0</oasis:entry>
         <oasis:entry colname="col3">71.8</oasis:entry>
         <oasis:entry colname="col4">36.6</oasis:entry>
         <oasis:entry colname="col5">6.9</oasis:entry>
         <oasis:entry colname="col6">S-P5<sub>COR</sub></oasis:entry>
         <oasis:entry colname="col7">0.7</oasis:entry>
         <oasis:entry colname="col8">57.6</oasis:entry>
         <oasis:entry colname="col9">21.1</oasis:entry>
         <oasis:entry colname="col10">3.8</oasis:entry>
         <oasis:entry colname="col11">67.0</oasis:entry>
         <oasis:entry colname="col12">27.3</oasis:entry>
         <oasis:entry colname="col13">5.8</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P1<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2">5.3</oasis:entry>
         <oasis:entry colname="col3">26.5</oasis:entry>
         <oasis:entry colname="col4">8.9</oasis:entry>
         <oasis:entry colname="col5">1.0</oasis:entry>
         <oasis:entry colname="col6">S-P1<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col7">5.0</oasis:entry>
         <oasis:entry colname="col8">32.1</oasis:entry>
         <oasis:entry colname="col9">12.9</oasis:entry>
         <oasis:entry colname="col10">2.4</oasis:entry>
         <oasis:entry colname="col11">44.4</oasis:entry>
         <oasis:entry colname="col12">50.1</oasis:entry>
         <oasis:entry colname="col13">5.4</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P2<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2">5.1</oasis:entry>
         <oasis:entry colname="col3">29.0</oasis:entry>
         <oasis:entry colname="col4">11.2</oasis:entry>
         <oasis:entry colname="col5">1.6</oasis:entry>
         <oasis:entry colname="col6">S-P2<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col7">4.9</oasis:entry>
         <oasis:entry colname="col8">31.7</oasis:entry>
         <oasis:entry colname="col9">12.5</oasis:entry>
         <oasis:entry colname="col10">2.4</oasis:entry>
         <oasis:entry colname="col11">48.8</oasis:entry>
         <oasis:entry colname="col12">45.4</oasis:entry>
         <oasis:entry colname="col13">5.8</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P3<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2">5.1</oasis:entry>
         <oasis:entry colname="col3">30.2</oasis:entry>
         <oasis:entry colname="col4">11.2</oasis:entry>
         <oasis:entry colname="col5">1.4</oasis:entry>
         <oasis:entry colname="col6">S-P3<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col7">4.9</oasis:entry>
         <oasis:entry colname="col8">31.2</oasis:entry>
         <oasis:entry colname="col9">12.9</oasis:entry>
         <oasis:entry colname="col10">2.5</oasis:entry>
         <oasis:entry colname="col11">46.9</oasis:entry>
         <oasis:entry colname="col12">46.3</oasis:entry>
         <oasis:entry colname="col13">6.8</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P4<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2">4.5</oasis:entry>
         <oasis:entry colname="col3">31.3</oasis:entry>
         <oasis:entry colname="col4">11.5</oasis:entry>
         <oasis:entry colname="col5">1.2</oasis:entry>
         <oasis:entry colname="col6">S-P4<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col7">5.0</oasis:entry>
         <oasis:entry colname="col8">30.6</oasis:entry>
         <oasis:entry colname="col9">12.5</oasis:entry>
         <oasis:entry colname="col10">2.2</oasis:entry>
         <oasis:entry colname="col11">48.9</oasis:entry>
         <oasis:entry colname="col12">44.2</oasis:entry>
         <oasis:entry colname="col13">6.9</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">G-P5<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col2">5.1</oasis:entry>
         <oasis:entry colname="col3">32.8</oasis:entry>
         <oasis:entry colname="col4">14.9</oasis:entry>
         <oasis:entry colname="col5">1.6</oasis:entry>
         <oasis:entry colname="col6">S-P5<sub>RND</sub></oasis:entry>
         <oasis:entry colname="col7">4.2</oasis:entry>
         <oasis:entry colname="col8">29.3</oasis:entry>
         <oasis:entry colname="col9">12.2</oasis:entry>
         <oasis:entry colname="col10">2.2</oasis:entry>
         <oasis:entry colname="col11">53.2</oasis:entry>
         <oasis:entry colname="col12">39.5</oasis:entry>
         <oasis:entry colname="col13">7.3</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e3452">The cumulative NSE curves (Fig. <xref ref-type="fig" rid="F3"/>a, lower) show that performance differences vary across the NSE spectrum: S-P0 is slightly better at the very low end (NSE <inline-formula><mml:math id="M118" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 0.05), G-P0 performs better from 0.05 to 0.325, both are similar between 0.325 and 0.425, and S-P0 is more favorable in the mid-to-high NSE range.</p>
      <p id="d2e3464">To test whether these local differences are spatially structured, Fig. <xref ref-type="fig" rid="F1"/> maps per-well <inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi mathvariant="normal">NSE</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="normal">NSE</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">NSE</mml:mi><mml:mi mathvariant="normal">S</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and summarizes local spatial association using a LISA analysis. Although <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi mathvariant="normal">NSE</mml:mi></mml:mrow></mml:math></inline-formula> is close to zero on average (P0: mean <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.012</mml:mn></mml:mrow></mml:math></inline-formula>, median <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.006</mml:mn></mml:mrow></mml:math></inline-formula>, share of wells with <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> is 48.7 %), it exhibits significant positive spatial autocorrelation (Global Moran's <inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:mi>I</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.322</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>). The LISA results indicate localized clusters of consistently positive or negative <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi mathvariant="normal">NSE</mml:mi></mml:mrow></mml:math></inline-formula> as well as spatial contrasts (19.7 % significant at <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula>), with the most pronounced clustering in hydrogeological region (3) (Upper Rhine Graben including the Mainz Basin and the North Hessian Tertiary). There, dense monitoring facilitates the detection of significant local patterns, and the patchwork of river-influenced and more regionally coherent dynamics likely contributes to spatially organized model advantages. Overall, this suggests that performance differences are spatially organized in parts of the domain, while remaining non-significant for the majority of wells.</p>
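      <p>Such a Moran/LISA analysis can be sketched with the Python packages libpysal and esda. The snippet below is illustrative only: coordinates and per-well ΔNSE values are random placeholders, and only the neighborhood definition (k = 8, row-standardized weights, permutation-based significance) mirrors the setup described here.</p>
      <preformat>
import numpy as np
from libpysal.weights import KNN
from esda.moran import Moran, Moran_Local

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(2951, 2))     # placeholder well coordinates
delta_nse = rng.normal(-0.012, 0.10, size=2951)  # placeholder per-well dNSE

w = KNN.from_array(coords, k=8)  # k-nearest-neighbor graph (k = 8)
w.transform = "r"                # row-standardized weights

mi = Moran(delta_nse, w, permutations=999)
print(f"Global Moran's I = {mi.I:.3f}, p = {mi.p_sim:.3f}")

lisa = Moran_Local(delta_nse, w, permutations=999)
sig = lisa.p_sim &lt; 0.05
# lisa.q encodes the cluster type: 1 = HH, 2 = LH, 3 = LL, 4 = HL
print(f"significant at alpha = 0.05: {100 * sig.mean():.1f} %")
</preformat>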

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e3590"><italic>Spatial patterns of per-well performance differences</italic>. Per-well differences are expressed as <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi mathvariant="normal">NSE</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="normal">NSE</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="normal">NSE</mml:mi><mml:mi mathvariant="normal">S</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. <bold>(a)</bold> Pointwise map of <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi mathvariant="normal">NSE</mml:mi></mml:mrow></mml:math></inline-formula> across all wells (positive: global model performs better; negative: single-well model performs better). <bold>(b)</bold> Local Indicators of Spatial Association (LISA) map of <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:mi mathvariant="normal">Δ</mml:mi><mml:mi mathvariant="normal">NSE</mml:mi></mml:mrow></mml:math></inline-formula> based on a <inline-formula><mml:math id="M131" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-nearest-neighbor graph (<inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:math></inline-formula>, row-standardized weights; <inline-formula><mml:math id="M133" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula>), highlighting significant High–High (HH), Low–Low (LL), High–Low (HL), and Low–High (LH) clusters; non-significant locations are marked as ns.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f01.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Influence of the Training Data Basis (RQ ii)</title>
      <p id="d2e3691">To assess how training data characteristics affect global model performance, we analyze three complementary experiments: (i) varying the <italic>training-record length</italic> by progressively shortening the available training period in annual steps while keeping the validation and test periods fixed (2008–2012 and 2013–2022); (ii) increasing dynamic similarity through correlation-based well removal; and (iii) reducing training data volume, while maintaining diverse dynamics, through random subsampling.</p>
<sec id="Ch1.S4.SS2.SSS1">
  <label>4.2.1</label><title>Training-record length</title>
      <p id="d2e3704">We additionally tested how sensitive the model comparison is to the length of the available training record. Starting from 1991–2007, we progressively shortened the training period in 1-year steps (removing the earliest years first), while keeping validation (2008–2012) and test (2013–2022) fixed.</p>
      <p id="d2e3707">Figure <xref ref-type="fig" rid="F2"/> summarizes NSE as a function of training-record length for single-well (S) and global (G) models. Panel (a) shows the median NSE and the 5th–95th percentile band. For long records (approximately 10–17 years), both model types exhibit similar medians with strongly overlapping bands. With decreasing training length, S shows a clear drop in median performance and a pronounced widening of the distribution, including negative NSE in the lower tail at short records. In contrast, G remains comparatively stable in median NSE and shows only a modest increase in spread. Panel (b) confirms these patterns in the corresponding density distributions: S progressively broadens and shifts toward lower NSE with shorter records, whereas G stays more concentrated and retains more mass at higher NSE values.</p>
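      <p>The truncation scheme itself amounts to simple date slicing. A minimal sketch follows; the weekly frequency, the single placeholder well, and the training routine are assumptions for illustration.</p>
      <preformat>
import pandas as pd

# Placeholder GWL table: datetime index, one column per well.
idx = pd.date_range("1991-01-01", "2022-12-31", freq="W")
data = pd.DataFrame(1.0, index=idx, columns=["well_1"])

val = data.loc["2008":"2012"]   # validation period, kept fixed
test = data.loc["2013":"2022"]  # test period, kept fixed

# Remove the earliest training years in 1-year steps.
for start_year in range(1991, 2008):
    train = data.loc[f"{start_year}":"2007"]
    # fit_and_evaluate(train, val, test)  # hypothetical training routine
    print(start_year, len(train), "training time steps")
</preformat>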

      <fig id="F2"><label>Figure 2</label><caption><p id="d2e3714"><italic>Sensitivity to training-record length</italic>. Test-period NSE distributions for single-well (S) and global (G) models as a function of the available training-record length, obtained by progressively truncating the earliest training years in 1-year steps while keeping validation (2008–2012) and test (2013–2022) fixed. <bold>(a)</bold> Median NSE (solid line) and the 5th–95th percentile range across wells (shaded; dashed bounds); the vertical line marks NSE <inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>. <bold>(b)</bold> Corresponding NSE distributions for each training length shown as overlapping ridgeline density estimates (per-length normalized for visual comparability).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f02.png"/>

          </fig>

</sec>
<sec id="Ch1.S4.SS2.SSS2">
  <label>4.2.2</label><title>Dynamic Similarity</title>
      <p id="d2e3750">To investigate how increasing the internal consistency of the training data affects global model performance, we compare the baseline model G-P0 to a series of partitioned models (G-P1<sub>COR</sub> to G-P5<sub>COR</sub>) trained on increasingly homogeneous subsets. In each step, 500 wells with the lowest average correlation to all other time series were removed to create dynamically more similar training sets.</p>
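      <p>A minimal sketch of this filtering on placeholder data is given below; the same mean absolute correlation also serves as the representativeness measure used later in this section (Fig. <xref ref-type="fig" rid="F5"/>). Whether the ranking is recomputed at each stage, as done here, is an implementation detail assumed for illustration.</p>
      <preformat>
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Placeholder GWL matrix: rows = time steps, columns = wells.
gwl = pd.DataFrame(rng.normal(size=(500, 2951)),
                   columns=[f"w{i}" for i in range(2951)])

def representativeness(df):
    """Mean absolute correlation of each well with all other wells."""
    corr = df.corr().abs()
    np.fill_diagonal(corr.values, np.nan)  # exclude self-correlation
    return corr.mean(axis=1, skipna=True)

subset = gwl
for stage in range(1, 6):  # P1_COR ... P5_COR
    rep = representativeness(subset)
    keep = rep.sort_values(ascending=False).index[:-500]  # drop the 500 lowest
    subset = subset[keep]
    print(f"P{stage}_COR: {subset.shape[1]} wells remain")
</preformat>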
      <p id="d2e3771">Figure <xref ref-type="fig" rid="F3"/>a and Table <xref ref-type="table" rid="T1"/> show that model skill improves consistently with increasing similarity. The share of poorly performing wells (NSE <inline-formula><mml:math id="M137" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 0) decreases from 5.5 % (G-P0) to 0.0 % (G-P5<sub>COR</sub>), while the proportion of highly accurate wells (NSE <inline-formula><mml:math id="M139" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.75) increases from 9.8 % to 36.6 %. The mean NSE rises progressively from 0.47 (G-P0) to 0.54 (G-P1<sub>COR</sub>), 0.59 (G-P2<sub>COR</sub>), 0.63 (G-P3<sub>COR</sub>), 0.65 (G-P4<sub>COR</sub>), and 0.70 (G-P5<sub>COR</sub>). The median NSE shows a similar trend, increasing from 0.53 to 0.58, 0.62, 0.66, 0.68, and finally 0.72.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e3849"><italic>Comparison of single-well and global model performance across generalization stages</italic>. <bold>(a, b)</bold> Distributions of NSE scores (<italic>top</italic>) and cumulative distribution functions (<italic>bottom</italic>) for single-well (S) and global models (G-P0–G-P5), based on either correlation-based (<bold>a</bold>, G-P<sub>COR</sub>) or random (<bold>b</bold>, G-P<sub>RND</sub>) well selection. Stages are cumulative subsets of P0 obtained by removing <inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">500</mml:mn></mml:mrow></mml:math></inline-formula> wells (<inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">2951</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2451</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1951</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1451</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">951</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">451</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> for P0–P5). Each global model is compared to S models evaluated on the same subset, illustrating shifts in performance distributions with increasing training data homogeneity <bold>(a)</bold> or quantity <bold>(b)</bold>.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f03.png"/>

          </fig>

      <p id="d2e3950">Compared to the corresponding single-well models, global models benefit more strongly from this increased similarity. At stage P5<sub>COR</sub>, the median NSE of the global model (0.72) exceeds that of the local model (0.67), and the proportion of wells with NSE <inline-formula><mml:math id="M150" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.75 is nearly twice as high (36.6 % vs. 21.1 %) (Table <xref ref-type="table" rid="T1"/>). Moreover, the share of wells where the global model outperforms its single-well counterpart increases from 45.4 % (G-P0) to 67.0 % (G-P5<sub>COR</sub>) (Table <xref ref-type="table" rid="T2"/>).</p>
      <p id="d2e3982">This trend is further illustrated in Fig. <xref ref-type="fig" rid="F4"/>a, which plots global versus local NSE scores across wells for each partition stage. While G-P0 shows many points below the <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line, later stages exhibit a progressive upward shift toward and beyond the diagonal. This indicates that, as training sets become more homogeneous, global models increasingly match or exceed local model performance at individual wells. The point cloud also narrows at higher stages (e.g., P4, P5), reflecting more stable and consistent predictions across sites.</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e4001"><italic>Comparison of global and single-well model performance at the well level</italic>. Panels <bold>(a)</bold> and <bold>(b)</bold> show NSE values of global models (G-P<inline-formula><mml:math id="M153" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>) plotted against their corresponding single-well models (S-P<inline-formula><mml:math id="M154" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>) for each monitoring well. Panel <bold>(a)</bold> includes models trained on dynamically similar subsets (G-P<sub>COR</sub>), and panel <bold>(b)</bold> shows models trained on randomly selected subsets (G-P<sub>RND</sub>). Colored points indicate generalization stages (P0–P5; cumulative removal of <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">500</mml:mn></mml:mrow></mml:math></inline-formula> wells from P0); therefore, the set of wells varies across stages. Right-hand subplots display the same data disaggregated by stage. Points above the <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line mark wells where the global model outperforms its single-well counterpart.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f04.png"/>

          </fig>

      <p id="d2e4081">Figure <xref ref-type="fig" rid="F5"/>a highlights the strong relationship between time series representativeness, quantified as the mean absolute correlation to all other training wells, and model performance. Wells with low representativeness tend to exhibit higher error variance and more frequent underperformance, especially at early stages. From stage P3 onward, a clear threshold emerges around a representativeness value of 0.45, above which consistently high NSE values are achieved. This underscores the central role of dynamic similarity in improving global model skill and reliability.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e4088"><italic>Relationship between time series representativeness and model performance</italic>. NSE scores of global models are plotted against the representativeness of each well, defined as the mean absolute correlation with all other training wells. Panel <bold>(a)</bold> shows results for correlation-based removal (P1–P5<sub>COR</sub>), and panel <bold>(b)</bold> for random removal (P1–P5<sub>RND</sub>). Stages are cumulative subsets of P0 obtained by removing <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">500</mml:mn></mml:mrow></mml:math></inline-formula> wells. Densities along the top axis indicate the distribution of representativeness across generalization stages (P0–P5). Model performance increases with higher representativeness, particularly under the COR setting, where wells with atypical dynamics are systematically excluded.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f05.png"/>

          </fig>

      <p id="d2e4136">A qualitative view of these relationships is provided in Fig. <xref ref-type="fig" rid="FB1"/>, which displays min–max normalized groundwater level time series for every 20th well, sorted by representativeness, along with the difference in predictive performance (ΔNSE) between single-well and best-performing global models.</p>
      <p id="d2e4141">Finally, Fig. <xref ref-type="fig" rid="F6"/> complements the distributional NSE analysis by quantifying how model updates translate into performance changes across stages. Panel (a) compares successive-stage deltas (<inline-formula><mml:math id="M162" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE between two consecutively retrained global models) evaluated on the wells that remain in both stages (i.e., a stage-dependent test set). Across both strategies, median <inline-formula><mml:math id="M163" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE values stay close to zero (often slightly positive), indicating that retraining on moderately reduced datasets does not systematically deteriorate predictive skill for the wells that remain. A clear contrast, however, is visible in the dispersion of <inline-formula><mml:math id="M164" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE: under correlation-based removal (P<inline-formula><mml:math id="M165" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">COR</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), the distributions tend to become narrower with progressing stages, suggesting more consistent performance changes across wells as the training data are progressively homogenized in terms of dynamics. In other words, removing dynamically atypical wells reduces conflicting learning signals and stabilizes how retraining translates into changes in predictive skill on the remaining wells. Under random removal (P<inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">RND</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), the spread of <inline-formula><mml:math id="M167" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE increases as the training set shrinks, reflecting higher sensitivity to which wells are retained and a higher variance in retraining outcomes, including more pronounced negative deltas for a subset of sites.</p>

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e4199"><italic>Change in model performance across generalization stages</italic>. Distributions of <inline-formula><mml:math id="M168" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE for global models across reduction stages under correlation-based (left; COR) and random (right; RND) well removal. <bold>(a)</bold> shows successive-stage deltas (e.g., P2 <inline-formula><mml:math id="M169" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> P3), computed on wells present in both consecutive stages (stage-dependent test set). <bold>(b)</bold> shows fixed-test-set deltas evaluated on the same P5 wells for all comparisons, quantifying <inline-formula><mml:math id="M170" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE (P<inline-formula><mml:math id="M171" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> <inline-formula><mml:math id="M172" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> P5) <inline-formula><mml:math id="M173" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> NSE (P5) <inline-formula><mml:math id="M174" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> NSE (P<inline-formula><mml:math id="M175" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula>) (e.g., P2 <inline-formula><mml:math id="M176" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> P5). Boxes are labeled with the corresponding sample size (<inline-formula><mml:math id="M177" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>). Across both strategies, median <inline-formula><mml:math id="M178" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE values remain close to zero, indicating no systematic loss of predictive skill with increasing data reduction.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f06.png"/>

          </fig>

      <p id="d2e4295">Panel (b) evaluates deltas on a fixed and identical test set, i.e., the wells that constitute the final stage P5 (constant <inline-formula><mml:math id="M179" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>), thereby isolating training-basis effects from changes in the evaluated well population. For P<inline-formula><mml:math id="M180" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">COR</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, deltas are predominantly positive for early stages and progressively approach zero towards P4 <inline-formula><mml:math id="M181" display="inline"><mml:mo>→</mml:mo></mml:math></inline-formula> P5, indicating that the final P5 model achieves systematically higher skill on the representative P5 wells than models trained on less filtered datasets, but that most of this improvement is already realized by intermediate stages (diminishing additional gains thereafter). In contrast, P<inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">RND</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> yields broader, near-zero-centered distributions with substantial positive and negative mass, implying that the final-stage model is not uniformly superior to randomly reduced counterparts on the same P5 wells; rather, gains and losses are site-dependent. Together with Fig. <xref ref-type="fig" rid="F5"/>, this indicates that the improvements under COR are primarily driven by the targeted exclusion of dynamically atypical wells (i.e., increasing representativeness), rather than by data-volume reduction alone.</p>
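      <p>Both delta variants reduce to simple bookkeeping over a per-stage NSE table. The sketch below uses a synthetic placeholder table in which removed wells are marked as missing; only the cumulative removal pattern follows the stage definition above.</p>
      <preformat>
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
stages = ["P0", "P1", "P2", "P3", "P4", "P5"]
wells = [f"w{i}" for i in range(2951)]
# Placeholder per-stage NSE, indexed by well; NaN marks wells removed
# from that stage's training set (cumulative removal of 500 per stage).
nse = pd.DataFrame(rng.normal(0.6, 0.15, (2951, 6)), index=wells, columns=stages)
for j in range(1, 6):
    nse.loc[wells[-(500 * j):], stages[j]] = np.nan

# (a) successive-stage deltas on wells present in both stages
for prev, nxt in zip(stages[:-1], stages[1:]):
    shared = nse[[prev, nxt]].dropna()
    delta = shared[nxt] - shared[prev]
    print(f"{prev}-&gt;{nxt}: median dNSE = {delta.median():+.3f}, n = {len(delta)}")

# (b) fixed-test-set deltas, evaluated on the final P5 wells only
p5_wells = nse["P5"].dropna().index
for s in stages[:-1]:
    delta = nse.loc[p5_wells, "P5"] - nse.loc[p5_wells, s]
    print(f"{s}-&gt;P5: median dNSE = {delta.median():+.3f}")
</preformat>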
</sec>
<sec id="Ch1.S4.SS2.SSS3">
  <label>4.2.3</label><title>Training Set Size</title>
      <p id="d2e4343">To isolate the effect of training data quantity on global model performance, we conducted a second experiment in which wells were randomly removed from the original training set in steps of 500, resulting in five increasingly reduced datasets (G-P1<sub>RND</sub> to G-P5<sub>RND</sub>). In contrast to the correlation-based approach, dynamic similarity was not considered here, allowing us to assess whether model skill improves simply with more training data and whether a critical threshold exists.</p>
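      <p>In contrast to the correlation-based filtering sketched in Sect. <xref ref-type="sec" rid="Ch1.S4.SS2.SSS2"/>, the random reduction is a plain cumulative subsampling, as in the following sketch (the seed is hypothetical):</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed for reproducibility
subset = np.array([f"w{i}" for i in range(2951)])  # P0 well set

for stage in range(1, 6):  # P1_RND ... P5_RND (cumulative subsets)
    subset = rng.choice(subset, size=len(subset) - 500, replace=False)
    print(f"P{stage}_RND: {len(subset)} wells")
</preformat>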
      <p id="d2e4364">Despite the substantial reduction in training data, down to just 451 wells in G-P5<sub>RND</sub>, global model performance remains remarkably stable. Median NSE values vary only marginally between 0.53 and 0.55, and mean values hover around 0.48 across all stages (Table <xref ref-type="table" rid="T1"/>). Similarly, the interquartile range and the overall shape of the NSE distributions (upper part of Fig. <xref ref-type="fig" rid="F3"/>b) show little variation, and the cumulative distribution curves (lower part of Fig. <xref ref-type="fig" rid="F3"/>b) remain largely overlapping. These results suggest that increasing training set size alone does not necessarily lead to better model skill. Interestingly, the global model slightly outperforms the corresponding single-well models in the final stages (P4–P5), reflecting a shift toward more dynamically coherent wells. Thus, while random data reduction does not degrade performance, it also does not yield the benefits commonly associated with larger datasets.</p>
      <p id="d2e4382">This interpretation is further supported by the summary in Table <xref ref-type="table" rid="T2"/>, where the share of poorly performing wells remains around 5 %, and the proportion of high-performing wells (NSE <inline-formula><mml:math id="M186" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.75) increases slightly, from 9.8 % to 14.9 %, despite the lower number of wells. The global model consistently performs as well as or slightly better than the corresponding local models in the final stages (G <inline-formula><mml:math id="M187" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> S: 53.2 % at P5<sub>RND</sub>). While dynamic similarity would be expected to remain constant under random removal, this is not entirely the case for our real-world dataset, as representativeness does not increase continuously but in discrete jumps. At smaller training set sizes, the probability of retaining a more homogeneous subset therefore increases, which can lead to a modest performance gain in later stages. Nevertheless, this improvement is far less pronounced than with correlation-based filtering.</p>
      <p id="d2e4411">The scatter plots in Fig. <xref ref-type="fig" rid="F4"/>b further support these findings: in contrast to the correlation-based experiment, there is no clear upward shift of the global model scores across stages. Points remain evenly scattered along the <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> line, and the share of wells where the global model outperforms the local one increases only marginally. This visual stability reinforces the notion that model skill is mainly independent of training set size, unless accompanied by improved dynamic similarity.</p>
      <p id="d2e4429">Figure <xref ref-type="fig" rid="F5"/>b illustrates that, even under random well removal, wells with high representativeness (mean absolute correlation <inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:mo>|</mml:mo><mml:mover accent="true"><mml:mi>r</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>|</mml:mo><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.45</mml:mn></mml:mrow></mml:math></inline-formula>) consistently yield high NSE scores across all stages. However, unlike the correlation-based approach, the representativeness distribution remains broad, and wells with atypical dynamics persist throughout all subsets. As a result, no clear performance threshold emerges and overall skill remains largely stable. There is, however, a slight performance increase of G-P<sub>RND</sub> across stages, although this effect is modest compared to the gains observed with correlation-based filtering; as noted above, it reflects the stepwise increase of representativeness in our dataset, which makes smaller, randomly selected subsets more likely to contain dynamically homogeneous wells. These findings highlight the importance of dynamic similarity, rather than dataset size, for achieving high predictive skill.</p>
      <p id="d2e4462">The <inline-formula><mml:math id="M192" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE diagnostics in Fig. <xref ref-type="fig" rid="F6"/> further support this interpretation for random removal. In the stage-dependent comparison (panel a), median deltas remain close to zero while the spread increases towards later stages, indicating that retraining outcomes become more variable as the training set shrinks, without a systematic net improvement. Importantly, this conclusion also holds when controlling for the evaluated wells (panel b; fixed P5 test set): the <inline-formula><mml:math id="M193" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE distributions remain broad and centered near zero, with substantial positive and negative mass. Thus, random reduction does not consistently steer the training basis towards higher representativeness; instead, performance changes on the same wells are largely site-dependent, reinforcing that dynamic similarity – not training set size per se – is the dominant driver of systematic performance gains.</p>
</sec>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Prediction of Extreme Events (RQ iii)</title>
      <p id="d2e4490">To assess model performance under extrapolative conditions, we evaluated predictions for groundwater levels (GWLs) beyond the typical range observed during training. For each well, low extremes were defined as values in the test period below the 1st percentile of its training distribution, and high extremes as values above the 99th percentile. This site-specific percentile approach ensures that extremes are identified relative to each well’s training history, while avoiding dependence on absolute thresholds.</p>
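      <p>Operationally, this amounts to thresholding each well's test series at the empirical percentiles of its own training series, as in the following sketch (placeholder data):</p>
      <preformat>
import numpy as np

rng = np.random.default_rng(3)
train_gwl = rng.normal(size=800)  # placeholder training-period GWLs (one well)
test_gwl = rng.normal(size=500)   # placeholder test-period GWLs

lo = np.percentile(train_gwl, 1)   # 1st percentile of the training distribution
hi = np.percentile(train_gwl, 99)  # 99th percentile

low_extreme = test_gwl &lt; lo   # extrapolation below the training range
high_extreme = test_gwl &gt; hi  # extrapolation above the training range
print(f"low extremes: {low_extreme.sum()}, high extremes: {high_extreme.sum()}")
</preformat>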
      <p id="d2e4493">Figure <xref ref-type="fig" rid="F7"/> summarizes RMSE distributions for all extrapolated values (top), low extremes (middle), and high extremes (bottom), using both correlation-based and random partitioning. Across all stages, global models do not show improved predictive skill over single-well models. For low extremes, errors are slightly higher for global models at every stage, suggesting that dynamics associated with exceptionally low GWLs are underrepresented in the training sets. For high extremes, both model types perform similarly, with neither showing a consistent advantage.</p>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e4500"><italic>Model performance under extrapolated conditions</italic>. Boxplots of RMSE for single-well (S) and global (G) models across generalization stages (P0–P5). Panel <bold>(a)</bold> shows correlation-based stages (G-P<sub>COR</sub>) and panel <bold>(b)</bold> random stages (G-P<sub>RND</sub>). Results are computed for extrapolated time steps only (<italic>top</italic>) and split into low (<italic>middle</italic>) and high (<italic>bottom</italic>) groundwater-level extrapolations.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f07.png"/>

        </fig>

      <p id="d2e4546">The stability of error distributions across increasing training set homogeneity or size indicates that, in groundwater systems, larger or more homogeneous datasets do not automatically enhance the prediction of extremes. A plausible explanation is that extreme events often depend on site-specific factors such as fine-scale geology, localized abstraction, or land use, which are not fully captured by the available static site descriptors. Without sufficiently informative descriptors, the transfer of extreme-event knowledge between sites is limited, and events not directly inferable from the dynamic meteorological inputs remain difficult to predict. This reflects a general constraint of current large-scale groundwater datasets rather than a shortcoming of the modeling approach itself.</p>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Out-of-Sample Spatial Prediction (RQ iv)</title>
      <p id="d2e4558">To evaluate the spatial transferability of global models, we assessed their performance on monitoring wells deliberately excluded from model calibration. This simulates predictions at sites without prior training data, while observations remain available for evaluation. Out-of-sample (OOS) subsets were defined using the partitioning strategies introduced in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>, i.e., correlation-based removal of dynamically dissimilar wells and random exclusion. Model performance at these OOS sites was compared to that of single-well models trained individually on each target well (in-sample reference). Figure <xref ref-type="fig" rid="F8"/> summarizes the resulting differences in predictive skill.</p>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e4567"><italic>Comparison of single-well and global model performance across generalization stages (out-of-sample wells)</italic>. Boxplots (<italic>top</italic>) and cumulative distribution functions (CDFs, <italic>bottom</italic>) of NSE for global (G) and single-well (S) models evaluated on the held-out wells of stages P1–P5. Panel <bold>(a)</bold> shows correlation-based exclusion (G-P<sub>COR</sub>) and panel <bold>(b)</bold> random exclusion (G-P<sub>RND</sub>). Global models are trained without the held-out wells (spatial transfer), whereas S models provide a site-specific in-sample baseline for the same wells.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f08.png"/>

        </fig>

      <p id="d2e4609">At stage P<sub><italic>x</italic></sub>, the global model is trained on the remaining wells of that stage and evaluated on the wells excluded up to that stage (cumulative OOS target set). Hence, the OOS test set varies across stages (increasing from 500 to 2500 wells), and cross-stage comparisons should be interpreted as transfer to different target populations rather than a like-for-like evaluation on a fixed test set.</p>
<sec id="Ch1.S4.SS4.SSS1">
  <label>4.4.1</label><title>OOS Based on Dynamic Similarity</title>
      <p id="d2e4629">The upper panel of Fig. <xref ref-type="fig" rid="F8"/>a summarizes performance for wells excluded due to dynamic dissimilarity. Global models consistently underperform single-well models across all stages, reflecting the difficulty of transferring learned dynamics to sites with low similarity to the training data. In early stages, where the excluded wells are most atypical, the performance deficits are largest. As stages progress, the cumulative OOS target set contains an increasing share of wells that are less dissimilar to the remaining training subset, and global model performance improves accordingly. At the same time, the training base shrinks with progressing stages, which limits the extent of these gains.</p>
      <p id="d2e4634">Despite the rightward shift of the global distributions, the performance gap to single-well models remains substantial across stages. This suggests that global models, even when trained on dynamically more consistent subsets, often lack the site-specificity needed to match locally trained models at excluded wells. Moreover, the global predictions show more extreme low-NSE outcomes in early stages, indicating that highly dissimilar targets increase the risk of severe model failure.</p>
      <p id="d2e4637">The lower panel of Fig. <xref ref-type="fig" rid="F8"/>a shows the cumulative distribution of NSE values for the OOS wells. Across all stages, the curves for the global models lie below those of the single-well models, confirming their overall weaker performance. While the global curves shift slightly rightward with increasing stage, the separation persists across much of the NSE range, indicating that the deficit is not confined to a small subset of wells but affects a broad set of targets.</p>
</sec>
<sec id="Ch1.S4.SS4.SSS2">
  <label>4.4.2</label><title>OOS Based on Random Exclusion</title>
      <p id="d2e4650">Under random partitioning, wells are excluded from training regardless of dynamic similarity. Consequently, the OOS target set grows from 500 to 2500 wells across stages, while the number of training wells decreases from 2451 to 451. Figure <xref ref-type="fig" rid="F8"/>b summarizes the results.</p>
      <p id="d2e4655">The upper part of Fig. <xref ref-type="fig" rid="F8"/>b shows the NSE distributions for global and single-well models across stages. Global models are, on average, slightly less accurate, but the differences remain small and the distributions are largely stable across stages. Extreme low-NSE outliers are rare, suggesting that random data removal does not induce systematic prediction failures.</p>
      <p id="d2e4660">The lower part of Fig. <xref ref-type="fig" rid="F8"/>b displays the cumulative NSE distributions for all OOS wells. Global and single-well curves are closely aligned across stages, confirming broadly similar predictive skill under random exclusion, with a small tendency toward underperformance of the global models. These results emphasize that global models can generalize reasonably well when the excluded targets are not systematically dissimilar, but they do not gain a clear advantage over locally specialized models in this setting.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Discussion</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>Comparison with previous studies</title>
      <p id="d2e4682">Our findings provide mixed support for earlier results from deep learning applications in hydrology and hydrogeology. In line with studies in streamflow modeling <xref ref-type="bibr" rid="bib1.bibx16 bib1.bibx18 bib1.bibx22" id="paren.44"/>, we find that global models can achieve predictive skill comparable to or exceeding that of local models when trained on large dynamically homogeneous datasets. This confirms the general advantage of cross-site learning in environments where system dynamics are similar, as also observed in other partitioning approaches <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx39 bib1.bibx8" id="paren.45"/>. More generally, multi-site training can be attractive because it consolidates model development into a single network-level model (instead of thousands of per-well calibrations) and, in principle, enlarges the training envelope by exposing the model to a broader range of hydro-climatic situations and response regimes. This is often discussed as a pathway to improved robustness and information sharing, particularly for sites with limited local data.</p>
      <p id="d2e4691">When the available training history is progressively shortened while validation and test remain fixed, single-well model performance deteriorates more strongly and becomes more variable, whereas the global model remains comparatively stable across record lengths. This pattern is consistent with the notion that local deep-learning models require sufficiently long site records to reach their full potential, while global training can partially compensate limited local information via cross-site learning <xref ref-type="bibr" rid="bib1.bibx36 bib1.bibx21" id="paren.46"/>. In other words, even if global models do not provide a systematic advantage under heterogeneous conditions for long records, their relative robustness under shorter records supports the relevance of multi-site learning for applications where monitoring histories are limited.</p>
      <p id="d2e4697">However, while global models have often shown clear advantages for streamflow applications <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx20" id="paren.47"/>, though not without recent dissenting findings <xref ref-type="bibr" rid="bib1.bibx32" id="paren.48"/>, our results for groundwater level prediction reveal no overall advantage under heterogeneous conditions. This echoes the broader debate that “global-model superiority” is not guaranteed: even in hydrology, global models can underperform local ones when local dynamics are highly site-specific and when local data quality or representativeness exceeds what the shared feature space can explain.</p>
      <p id="d2e4706">This difference is likely related to the broader diversity and high small-scale variability of groundwater system dynamics, ranging from highly responsive karst aquifers to inertial systems in low-permeability sediments. Unlike many surface water catchments, groundwater dynamics can differ markedly even over short distances. Nearby wells may share similar static feature values (e.g., geology, land use) yet exhibit distinct responses due to fine-scale geological differences, flow paths, or localized abstraction.</p>
      <p id="d2e4710">Under such conditions, a global model may still learn a broad range of dynamics present in the training set but (because the available static site descriptors are not sufficiently informative to uniquely identify the controlling local conditions) cannot reliably assign the correct dynamic regime to a specific well. This is the case even though our benchmark includes a comparatively rich set of <inline-formula><mml:math id="M199" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> time-invariant static attributes (hydrogeology, topography, soils, land use) and intentionally excludes groundwater time-series statistics. As a result, the model tends to average across similar but not identical behaviors, which reduces site-specific accuracy under heterogeneous conditions. This highlights a key trade-off between global and local strategies: global models emphasize breadth and generalization, whereas single-well models emphasize local precision and are less prone to “smoothing” unique site behavior into a dominant average regime. In settings with strong local controls that are not fully observable from static descriptors, local models can remain highly competitive – especially when long, high-quality site records are available. This mechanism is consistent with the strong empirical link between time-series representativeness and model skill observed in our experiments: wells whose dynamics are well represented by the remaining training pool (high mean absolute correlation) are systematically easier to predict, whereas atypical wells show substantially higher error variance. Targeted correlation-based filtering shifts the training data towards this representative regime and thereby reduces conflicting learning signals. Consistent with the concerns raised by <xref ref-type="bibr" rid="bib1.bibx13" id="text.49"/>, this reflects a broader limitation of large-scale groundwater applications, where even rich static descriptors may not fully capture fine-scale controls such as local flow paths or abstraction. While dynamic similarity may correlate with some static attributes, our filtering is purely time-series based and not spatially constrained; therefore, increasing dynamic similarity does not necessarily imply a strong homogenization of static features.</p>
      <p id="d2e4726">Finally, our partitioning experiments confirm the robustness benefits reported in other hydrological contexts <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx39" id="paren.50"/>: grouping wells with similar dynamics before training significantly improved the performance of global models, even with fewer training wells. Notably, the <inline-formula><mml:math id="M200" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE diagnostics further indicate that these gains are not merely a consequence of evaluating on an increasingly “easier” subset: when assessed on a fixed well set (the final P5 wells), correlation-based reduction yields predominantly positive performance differences relative to earlier stages and converges towards zero in later stages, whereas random reduction remains broadly centered around zero with substantial site-dependent gains and losses. Together, these results reinforce the broader conclusion from the literature that data homogeneity, whether achieved via targeted filtering or clustering, can be more important for generalization skill than sheer dataset size. In groundwater modeling, dynamic similarity therefore often outweighs data quantity as a determinant of global model skill. In this sense, partitioned (or clustered) global models can be viewed as a pragmatic middle ground: they retain some benefits of information sharing within dynamically coherent subgroups while reducing the risk that strongly dissimilar wells impose conflicting learning signals that hamper site-specific accuracy.</p>
<sec id="Ch1.S5.SS1.SSS1">
  <label>5.1.1</label><title>Sensitivity to model choice and scope</title>
      <p id="d2e4746">Our comparison is based on the benchmark architectures and training protocol introduced in <xref ref-type="bibr" rid="bib1.bibx30" id="text.51"/> and was designed to isolate the effects of training strategy and training-data composition under fixed model settings; architecture optimization was not the primary objective of this study. Different model classes (e.g., attention-based sequence models or graph-enhanced architectures) may shift absolute performance levels of both local and global approaches, especially if they improve entity awareness or exploit cross-site relations more explicitly. However, we expect the central qualitative patterns reported here to be comparatively robust because all partitioning stages were evaluated under identical protocols and the observed differences are primarily driven by training-data heterogeneity and the informativeness of available site descriptors. In particular, sequence models such as LSTMs are often expected to benefit from larger training sets, whereas local models can remain competitive when cross-site transfer is impeded by heterogeneous dynamics. Accordingly, architecture choice is likely to influence the magnitude of performance gaps, but is unlikely to overturn the main conclusion that dynamic similarity and informative descriptors are key determinants of successful global groundwater modeling in heterogeneous settings. Beyond architecture choice, we also observe that under correlation-based reduction the dispersion of successive-stage performance changes narrows across stages, suggesting that retraining becomes more stable once conflicting learning signals from dynamically atypical wells are removed; this stabilization is not observed under random subsampling. An additional, practically important axis not addressed in this study is sensitivity to record length, i.e., how model skill changes when only shorter groundwater time series are available. This question is highly relevant for transferability to regions with limited historical coverage but represents a separate experimental dimension from the training-set composition and spatial generalization analyses considered here.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions</title>
      <p id="d2e4762">This study provides a comprehensive evaluation of globally and locally trained deep learning models for groundwater level forecasting. Using a dataset comprising nearly 3000 monitoring wells across Germany, we systematically assessed model performance across diverse hydrogeological settings and under varying conditions of data availability, dynamic similarity, and extrapolation demands. The analysis was guided by four research questions, each addressing a key aspect of model generalization and applicability. Below, we summarize the main findings in response to each question, placing them in the context of previous hydrological research.</p>
<sec id="Ch1.S6.SS1">
  <label>6.1</label><title>Are globally trained models generally superior to local (single-well) models?</title>
      <p id="d2e4772">Not necessarily. Despite being trained on a large and diverse dataset, globally trained models did not show a notable overall advantage over locally optimized single-well models. Local models tend to achieve slightly higher median accuracy and perform better at individual sites, while global models produce more predictions clustered around the central range of performance, suggesting a more stable but less specialized behavior. This pattern is consistent with limited entity awareness under heterogeneous conditions, where even a rich set of time-invariant static attributes (<inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula> hydrogeological, topographic, soil, and land-use descriptors; no groundwater time-series statistics) may be insufficient to uniquely identify the controlling local processes at each well. Conceptually, this reflects the core trade-off: global models emphasize breadth and potential robustness via information sharing, whereas single-well models emphasize local precision and can better capture unique site-specific behavior when long, high-quality local records are available. These findings align with earlier work in groundwater modeling but contrast with the consistent superiority reported for streamflow, likely reflecting the greater diversity and small-scale variability of groundwater system dynamics. However, when training is restricted to dynamically homogeneous subsets, global models can match or even exceed single-well performance at many sites, highlighting that cross-site learning becomes beneficial once heterogeneity is reduced.</p>
</sec>
<sec id="Ch1.S6.SS2">
  <label>6.2</label><title>Does training data quality or quantity matter more for global models?</title>
      <p id="d2e4793">Training data quality in terms of dynamic similarity is more important than quantity. When the training set is filtered to include only wells with similar temporal dynamics, global model accuracy improves markedly, and performance changes become more consistent across sites. In contrast, random removal of wells (reducing size without regard to similarity) does not yield comparable gains, even when roughly 85 % of wells are removed; any changes remain modest and site-dependent. Importantly, evaluating all stages on a fixed test set (the final P5 wells) confirms that the improvements under correlation-based filtering are not merely a consequence of progressively excluding difficult sites, but reflect a genuine benefit of training on a more dynamically consistent data basis. Thus, for in-sample prediction, structure (dynamic similarity/representativeness) matters more than sheer sample size; simply enlarging the dataset without improving consistency provides little benefit. For in-sample predictions, dynamic similarity is clearly the dominant factor; for out-of-sample predictions, a broader and more diverse training base may, in principle, offer robustness benefits, although this was not a dominant effect in our experiments. Consistent with this, global models were also less sensitive to shorter training histories than single-well models under our benchmark setup.</p>
</sec>
<sec id="Ch1.S6.SS3">
  <label>6.3</label><title>Can global models reliably predict extreme groundwater events?</title>
      <p id="d2e4804">No. In our experiments, neither global nor local models could reliably predict groundwater levels outside the training range. Across all partitioning stages and extrapolation regimes, global models showed slightly higher errors and a tendency to overestimate low values while underestimating high ones, consistent with a structural averaging effect. Neither increasing the amount of training data nor improving dynamic similarity mitigated this issue. A likely reason is that the available static site descriptors (e.g., geology, land use, geomorphology) are not sufficiently informative to provide strong entity awareness and to capture the local controls governing extremes, thereby limiting the transfer of extreme-event knowledge between sites. These descriptors represent the best practical dataset currently available for large-scale groundwater modeling, so this limitation reflects a general constraint of current data availability rather than a shortcoming of the modeling approach itself.</p>
</sec>
<sec id="Ch1.S6.SS4">
  <label>6.4</label><title>How well do global models generalize to unseen locations?</title>
      <p id="d2e4816">Under correlation-based exclusion (i.e., deliberately held-out wells with low dynamic similarity), the global model shows limited spatial generalization and performance drops sharply compared to single-well models, reflecting the difficulty of transferring learned dynamics to dissimilar sites. Across successive stages, the OOS target set expands from the most atypical wells to include progressively less dissimilar sites, which leads to a modest rightward shift in global performance; however, the gap to single-well models remains substantial. In contrast, random exclusion yields broadly similar performance across stages, indicating that generalization is feasible when target wells share representative temporal patterns with the training data, consistent with previous findings that groundwater dynamics are less transferable than streamflow. Importantly, our OOS setup evaluates transfer to wells with available historical observations (needed for well-specific scaling), i.e., a realistic “new well with records but not used for training” setting rather than a fully no-head-data scenario.</p>
      <p id="d2e4819">In summary, the choice between global and local modeling depends strongly on the intended application. For in-sample prediction at known sites, training on dynamically similar wells (e.g., by partitioning or filtering the dataset based on dynamic similarity) can yield accurate results even with relatively few data, as the remaining wells benefit from increased dynamic similarity and the reduced influence of poorly representative sites. For spatial generalization, a broad and diverse training base may improve robustness, although this effect was modest in our experiments and can come at the cost of predictive precision. From an operational perspective, global models can still be attractive because they provide a single maintainable model for an entire monitoring network and enable weight transfer across sites; however, where the goal is maximal site-specific accuracy (especially for atypical wells), single-well models remain a strong and often preferable option. For groundwater systems, characterized by slow and often indirect responses, sparse measurements, high small-scale variability, and limited entity awareness due to coarse static descriptors, model transferability appears inherently more limited than in surface water applications. This highlights that single-well models remain a strong option, especially when site-specific accuracy is required and local dynamics need to be captured in detail.</p>
</sec>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Spatial distribution of monitoring wells</title>
      <p id="d2e4835">Figure <xref ref-type="fig" rid="FA1"/> shows the geographical distribution of all monitoring wells used in the modeling experiments, as well as their progressive removal across different partitioning stages.</p>

      <fig id="FA1"><label>Figure A1</label><caption><p id="d2e4842"><italic>Spatial distribution of groundwater monitoring wells used in this study</italic>. The panels distinguish between correlation-based (P<sub>COR</sub>, <bold>a</bold>) and random (G-P<sub>RND</sub>, <bold>b</bold>) data removal scenarios across six stages (P0–P5). Each stage represents a progressive reduction of the training data set, either by removing wells with low dynamic similarity (P<sub>COR</sub>) or through random subsampling (G-P<sub>RND</sub>). The map highlights how spatial coverage changes with increasing data reduction</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f09.png"/>

      </fig>


</app>

<app id="App1.Ch1.S2">
  <label>Appendix B</label><title>Stacked groundwater level time series with representativeness and performance difference</title>
      <p id="d2e4908">Figure <xref ref-type="fig" rid="FB1"/> shows min–max normalized groundwater level time series for every 20th monitoring well (from the second-highest representativeness rank), ordered by representativeness.</p>

      <fig id="FB1"><label>Figure B1</label><caption><p id="d2e4915"><italic>Stacked groundwater level (GWL) time series (min–max normalized) by representativeness</italic>. Left: time series color-coded by representativeness (<inline-formula><mml:math id="M206" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>); middle: bars of <inline-formula><mml:math id="M207" display="inline"><mml:mi>R</mml:mi></mml:math></inline-formula>; right: <inline-formula><mml:math id="M208" display="inline"><mml:mi mathvariant="normal">Δ</mml:mi></mml:math></inline-formula>NSE <inline-formula><mml:math id="M209" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> NSE<sub>S</sub> <inline-formula><mml:math id="M211" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> NSE<sub>G</sub>, clipped to <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> (dark gray <inline-formula><mml:math id="M214" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> S better, blue <inline-formula><mml:math id="M215" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> G better).</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/2373/2026/hess-30-2373-2026-f10.png"/>

      </fig>

</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e5018">The code used in this study is publicly available on GitHub (<uri>https://github.com/KITHydrogeology/singlewell-vs-global-gwl</uri>, last access: 16 April 2026) and archived on Zenodo <xref ref-type="bibr" rid="bib1.bibx29" id="paren.52"/>. The dataset is available via Zenodo <xref ref-type="bibr" rid="bib1.bibx30" id="paren.53"/>.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e5033">MO carried out the data analysis, prepared the figures and plots, and drafted the main part of the manuscript. Conceptualisation and methodology were developed jointly by MO and TL. TL conducted most of the modelling experiments and contributed to the interpretation of results. Both authors revised and approved the final manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e5042">The contact author has declared that neither of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e5048">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e5054">All programming was done in Python version 3.12 <xref ref-type="bibr" rid="bib1.bibx34" id="paren.54"/> and the associated libraries, including NumPy <xref ref-type="bibr" rid="bib1.bibx11" id="paren.55"/>, Pandas <xref ref-type="bibr" rid="bib1.bibx24" id="paren.56"/>, TensorFlow <xref ref-type="bibr" rid="bib1.bibx1" id="paren.57"/>, Keras <xref ref-type="bibr" rid="bib1.bibx6" id="paren.58"/>, SciPy <xref ref-type="bibr" rid="bib1.bibx35" id="paren.59"/>, Scikit-learn <xref ref-type="bibr" rid="bib1.bibx31" id="paren.60"/> and Matplotlib <xref ref-type="bibr" rid="bib1.bibx14" id="paren.61"/>. The authors further acknowledge support by the state of Baden-Württemberg through bwHPC.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e5084">The article processing charges for this open-access publication were covered by the Karlsruhe Institute of Technology (KIT).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e5094">This paper was edited by Daniel Klotz and reviewed by three anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Abadi et al.(2016)Abadi, Agarwal, Barham, Brevdo, Chen, Citro, Corrado, Davis, Dean, Devin, Ghemawat, Goodfellow, Harp, Irving, Isard, Jia, Jozefowicz, Kaiser, Kudlur, Levenberg, Mané, Monga, Moore, Murray, Olah, Schuster, Shlens, Steiner, Sutskever, Talwar, Tucker, Vanhoucke, Vasudevan, Viégas, Vinyals, Warden, Wattenberg, Wicke, Yu, and Zheng</label><mixed-citation>Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M., Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.: TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1603.04467" ext-link-type="DOI">10.48550/arXiv.1603.04467</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Acuña Espinoza et al.(2025)Acuna Espinoza, Loritz, Kratzert, Klotz, Gauch, Álvarez Chaves, Bäuerle, and Ehret</label><mixed-citation>Acuña Espinoza, E., Loritz, R., Kratzert, F., Klotz, D., Gauch, M., Álvarez Chaves, M., and Ehret, U.: Analyzing the generalization capabilities of a hybrid hydrological model for extrapolation to extreme events, Hydrol. Earth Syst. Sci., 29, 1277–1294, <ext-link xlink:href="https://doi.org/10.5194/hess-29-1277-2025" ext-link-type="DOI">10.5194/hess-29-1277-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Bandara et al.(2020)Bandara, Bergmeir, and Smyl</label><mixed-citation>Bandara, K., Bergmeir, C., and Smyl, S.: Forecasting across time series databases using recurrent neural networks on groups of similar series: A clustering approach, Expert Syst. Appl., 140, 112896, <ext-link xlink:href="https://doi.org/10.1016/j.eswa.2019.112896" ext-link-type="DOI">10.1016/j.eswa.2019.112896</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Baste et al.(2025)Baste, Klotz, Espinoza, Bardossy, and Loritz</label><mixed-citation>Baste, S., Klotz, D., Acuña Espinoza, E., Bardossy, A., and Loritz, R.: Unveiling the limits of deep learning models in hydrological extrapolation tasks, Hydrol. Earth Syst. Sci., 29, 5871–5891, <ext-link xlink:href="https://doi.org/10.5194/hess-29-5871-2025" ext-link-type="DOI">10.5194/hess-29-5871-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Chidepudi et al.(2025)Chidepudi, Massei, Jardani, Dieppois, Henriot, and Fournier</label><mixed-citation>Chidepudi, S. K. R., Massei, N., Jardani, A., Dieppois, B., Henriot, A., and Fournier, M.: Training deep learning models with a multi-station approach and static aquifer attributes for groundwater level simulation: what is the best way to leverage regionalised information?, Hydrol. Earth Syst. Sci., 29, 841–861, <ext-link xlink:href="https://doi.org/10.5194/hess-29-841-2025" ext-link-type="DOI">10.5194/hess-29-841-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Chollet(2015)</label><mixed-citation>Chollet, F.: Keras, GitHub [code], <uri>https://github.com/fchollet/keras</uri> (last access: 14 April 2026), 2015.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Chu et al.(2022)Chu, Bian, Lang, Sun, and Wang</label><mixed-citation>Chu, H., Bian, J., Lang, Q., Sun, X., and Wang, Z.: Daily Groundwater Level Prediction and Uncertainty Using LSTM Coupled with PMI and Bootstrap Incorporating Teleconnection Patterns Information, Sustainability, 14, 11598, <ext-link xlink:href="https://doi.org/10.3390/su141811598" ext-link-type="DOI">10.3390/su141811598</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Clark et al.(2022)Clark, Pagendam, and Ryan</label><mixed-citation>Clark, S. R., Pagendam, D., and Ryan, L.: Forecasting Multiple Groundwater Time Series with Local and Global Deep Learning Networks, Int. J. Env. Res. Pub. He., 19, 5091, <ext-link xlink:href="https://doi.org/10.3390/ijerph19095091" ext-link-type="DOI">10.3390/ijerph19095091</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Collenteur et al.(2024)Collenteur, Haaf, Bakker, Liesch, Wunsch, Soonthornrangsan, White, Martin, Hugman, de Sousa, Vanden Berghe, Fan, Peterson, Bikše, Di Ciacca, Wang, Zheng, Nölscher, Koch, Schneider, Benavides Höglund, Krishna Reddy Chidepudi, Henriot, Massei, Jardani, Rudolph, Rouhani, Gómez-Hernández, Jomaa, Pölz, Franken, Behbooei, Lin, and Meysami</label><mixed-citation>Collenteur, R. A., Haaf, E., Bakker, M., Liesch, T., Wunsch, A., Soonthornrangsan, J., White, J., Martin, N., Hugman, R., de Sousa, E., Vanden Berghe, D., Fan, X., Peterson, T. J., Bikše, J., Di Ciacca, A., Wang, X., Zheng, Y., Nölscher, M., Koch, J., Schneider, R., Benavides Höglund, N., Krishna Reddy Chidepudi, S., Henriot, A., Massei, N., Jardani, A., Rudolph, M. G., Rouhani, A., Gómez-Hernández, J. J., Jomaa, S., Pölz, A., Franken, T., Behbooei, M., Lin, J., and Meysami, R.: Data-driven modelling of hydraulic-head time series: results and lessons learned from the 2022 Groundwater Time Series Modelling Challenge, Hydrol. Earth Syst. Sci., 28, 5193–5208, <ext-link xlink:href="https://doi.org/10.5194/hess-28-5193-2024" ext-link-type="DOI">10.5194/hess-28-5193-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Gomez et al.(2024)Gomez, Nölscher, Hartmann, and Broda</label><mixed-citation>Gomez, M., Nölscher, M., Hartmann, A., and Broda, S.: Assessing groundwater level modelling using a 1-D convolutional neural network (CNN): linking model performances to geospatial and time series features, Hydrol. Earth Syst. Sci., 28, 4407–4425, <ext-link xlink:href="https://doi.org/10.5194/hess-28-4407-2024" ext-link-type="DOI">10.5194/hess-28-4407-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Harris et al.(2020)Harris, Millman, van der Walt, Gommers, Virtanen, Cournapeau, Wieser, Taylor, Berg, Smith, Kern, Picus, Hoyer, van Kerkwijk, Brett, Haldane, del Río, Wiebe, Peterson, Gérard-Marchant, Sheppard, Reddy, Weckesser, Abbasi, Gohlke, and Oliphant</label><mixed-citation>Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R., Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard, K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E.: Array programming with NumPy, Nature, 585, 357–362, <ext-link xlink:href="https://doi.org/10.1038/s41586-020-2649-2" ext-link-type="DOI">10.1038/s41586-020-2649-2</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Hauswirth et al.(2021)Hauswirth, Bierkens, Beijk, and Wanders</label><mixed-citation>Hauswirth, S. M., Bierkens, M. F., Beijk, V., and Wanders, N.: The potential of data driven approaches for quantifying hydrological extremes, Adv. Water Resour., 155, 104017, <ext-link xlink:href="https://doi.org/10.1016/j.advwatres.2021.104017" ext-link-type="DOI">10.1016/j.advwatres.2021.104017</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Heudorfer et al.(2024)Heudorfer, Liesch, and Broda</label><mixed-citation>Heudorfer, B., Liesch, T., and Broda, S.: On the challenges of global entity-aware deep learning models for groundwater level prediction, Hydrol. Earth Syst. Sci., 28, 525–543, <ext-link xlink:href="https://doi.org/10.5194/hess-28-525-2024" ext-link-type="DOI">10.5194/hess-28-525-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Hunter(2007)</label><mixed-citation>Hunter, J. D.: Matplotlib: A 2D Graphics Environment, Comput. Sci. Eng., 9, 90–95, <ext-link xlink:href="https://doi.org/10.1109/MCSE.2007.55" ext-link-type="DOI">10.1109/MCSE.2007.55</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Kratzert et al.(2019a)Kratzert, Klotz, Herrnegger, Sampson, Hochreiter, and Nearing</label><mixed-citation>Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A. K., Hochreiter, S., and Nearing, G. S.: Toward Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning, Water Resour. Res., 55, 11344–11354, <ext-link xlink:href="https://doi.org/10.1029/2019wr026065" ext-link-type="DOI">10.1029/2019wr026065</ext-link>, 2019a.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Kratzert et al.(2019b)Kratzert, Klotz, Shalev, Klambauer, Hochreiter, and Nearing</label><mixed-citation>Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <ext-link xlink:href="https://doi.org/10.5194/hess-23-5089-2019" ext-link-type="DOI">10.5194/hess-23-5089-2019</ext-link>, 2019b.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Kratzert et al.(2021)Kratzert, Gauch, Nearing, Hochreiter, and Klotz</label><mixed-citation>Kratzert, F., Gauch, M., Nearing, G., Hochreiter, S., and Klotz, D.: Niederschlags-Abfluss-Modellierung mit Long Short-Term Memory (LSTM), Österreichische Wasser- und Abfallwirtschaft, 73, 270–280, <ext-link xlink:href="https://doi.org/10.1007/s00506-021-00767-z" ext-link-type="DOI">10.1007/s00506-021-00767-z</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Kratzert et al.(2024)Kratzert, Gauch, Klotz, and Nearing</label><mixed-citation>Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.: HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrol. Earth Syst. Sci., 28, 4187–4201, <ext-link xlink:href="https://doi.org/10.5194/hess-28-4187-2024" ext-link-type="DOI">10.5194/hess-28-4187-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Kunz et al.(2025)Kunz, Schulz, Wetzel, Nölscher, Chiaburu, Biessmann, and Broda</label><mixed-citation>Kunz, S., Schulz, A., Wetzel, M., Nölscher, M., Chiaburu, T., Biessmann, F., and Broda, S.: Towards a global spatial machine learning model for seasonal groundwater level predictions in Germany, Hydrol. Earth Syst. Sci., 29, 3405–3433, <ext-link xlink:href="https://doi.org/10.5194/hess-29-3405-2025" ext-link-type="DOI">10.5194/hess-29-3405-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Lees et al.(2022)Lees, Reece, Kratzert, Klotz, Gauch, De Bruijn, Kumar Sahu, Greve, Slater, and Dadson</label><mixed-citation>Lees, T., Reece, S., Kratzert, F., Klotz, D., Gauch, M., De Bruijn, J., Kumar Sahu, R., Greve, P., Slater, L., and Dadson, S. J.: Hydrological concept formation inside long short-term memory (LSTM) networks, Hydrol. Earth Syst. Sci., 26, 3079–3101, <ext-link xlink:href="https://doi.org/10.5194/hess-26-3079-2022" ext-link-type="DOI">10.5194/hess-26-3079-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Ma et al.(2021)Ma, Feng, Lawson, Tsai, Liang, Huang, Sharma, and Shen</label><mixed-citation>Ma, K., Feng, D., Lawson, K., Tsai, W., Liang, C., Huang, X., Sharma, A., and Shen, C.: Transferring Hydrologic Data Across Continents – Leveraging Data‐Rich Regions to Improve Hydrologic Prediction in Data‐Sparse Regions, Water Resour. Res., 57, <ext-link xlink:href="https://doi.org/10.1029/2020wr028600" ext-link-type="DOI">10.1029/2020wr028600</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Martel et al.(2025)Martel, Arsenault, Turcotte, Castañeda-Gonzalez, Brissette, Armstrong, Mailhot, Pelletier-Dumont, Lachance-Cloutier, Rondeau-Genesse, and Caron</label><mixed-citation>Martel, J.-L., Arsenault, R., Turcotte, R., Castañeda-Gonzalez, M., Brissette, F., Armstrong, W., Mailhot, E., Pelletier-Dumont, J., Lachance-Cloutier, S., Rondeau-Genesse, G., and Caron, L.-P.: Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis, Hydrol. Earth Syst. Sci., 29, 4951–4968, <ext-link xlink:href="https://doi.org/10.5194/hess-29-4951-2025" ext-link-type="DOI">10.5194/hess-29-4951-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Mbouopda et al.(2022)Mbouopda, Guyet, Labroche, and Henriot</label><mixed-citation>Mbouopda, M. F., Guyet, T., Labroche, N., and Henriot, A.: Experimental study of time series forecasting methods for groundwater level prediction, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2209.13927" ext-link-type="DOI">10.48550/arXiv.2209.13927</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>McKinney(2010)</label><mixed-citation>McKinney, W.: Data Structures for Statistical Computing in Python, in: Proceedings of the 9th Python in Science Conference, edited by: van der Walt, S. and Millman, J., SciPy, Austin, Texas, 56–61, <ext-link xlink:href="https://doi.org/10.25080/Majora-92bf1922-00a" ext-link-type="DOI">10.25080/Majora-92bf1922-00a</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Nayak et al.(2006)Nayak, Rao, and Sudheer</label><mixed-citation>Nayak, P. C., Rao, Y. R. S., and Sudheer, K. P.: Groundwater Level Forecasting in a Shallow Aquifer Using Artificial Neural Network Approach, Water Resour. Manag., 20, 77–90, <ext-link xlink:href="https://doi.org/10.1007/s11269-006-4007-z" ext-link-type="DOI">10.1007/s11269-006-4007-z</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Nearing et al.(2024)Nearing, Cohen, Dube, Gauch, Gilon, Harrigan, Hassidim, Klotz, Kratzert, Metzger, Nevo, Pappenberger, Prudhomme, Shalev, Shenzis, Tekalign, Weitzner, and Matias</label><mixed-citation>Nearing, G., Cohen, D., Dube, V., Gauch, M., Gilon, O., Harrigan, S., Hassidim, A., Klotz, D., Kratzert, F., Metzger, A., Nevo, S., Pappenberger, F., Prudhomme, C., Shalev, G., Shenzis, S., Tekalign, T. Y., Weitzner, D., and Matias, Y.: Global prediction of extreme floods in ungauged watersheds, Nature, 627, 559–563, <ext-link xlink:href="https://doi.org/10.1038/s41586-024-07145-1" ext-link-type="DOI">10.1038/s41586-024-07145-1</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Nearing et al.(2021)Nearing, Kratzert, Sampson, Pelissier, Klotz, Frame, Prieto, and Gupta</label><mixed-citation>Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C., and Gupta, H. V.: What Role Does Hydrological Science Play in the Age of Machine Learning?, Water Resour. Res., 57, <ext-link xlink:href="https://doi.org/10.1029/2020wr028091" ext-link-type="DOI">10.1029/2020wr028091</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Ohmer(2025)</label><mixed-citation>Ohmer, M.: GEMS-GER: A Machine Learning Benchmark Dataset of Long-Term Groundwater Levels in Germany with Meteorological Forcings and Site-Specific Environmental Features, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.16736908" ext-link-type="DOI">10.5281/zenodo.16736908</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Ohmer and Liesch(2025)</label><mixed-citation>Ohmer, M. and Liesch, T.: singlewell-vs-global-gwl, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.19453511" ext-link-type="DOI">10.5281/zenodo.19453511</ext-link>, 2025. </mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Ohmer et al.(2026)Ohmer, Liesch, Habbel, Heudorfer, Gomez, Clos, Nölscher, and Broda</label><mixed-citation>Ohmer, M., Liesch, T., Habbel, B., Heudorfer, B., Gomez, M., Clos, P., Nölscher, M., and Broda, S.: GEMS-GER: a machine learning benchmark dataset of long-term groundwater levels in Germany with meteorological forcings and site-specific environmental features, Earth Syst. Sci. Data, 18, 77–95, <ext-link xlink:href="https://doi.org/10.5194/essd-18-77-2026" ext-link-type="DOI">10.5194/essd-18-77-2026</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Pedregosa et al.(2011)Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos, Cournapeau, Brucher, Perrot, and Édouard Duchesnay</label><mixed-citation> Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Édouard Duchesnay: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., 12, 2825–2830, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Tran et al.(2025)Tran, Nguyen, Kim, and Ivanov</label><mixed-citation>Tran, V. N., Nguyen, T. V., Kim, J., and Ivanov, V. Y.: Technical note: Does Multiple Basin Training Strategy Guarantee Superior Machine Learning Performance for Streamflow Predictions in Gaged Basins?, EGUsphere [preprint], <ext-link xlink:href="https://doi.org/10.5194/egusphere-2025-769" ext-link-type="DOI">10.5194/egusphere-2025-769</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Usman et al.(2023)Usman, Waqar, and Ng</label><mixed-citation> Usman, M., Waqar, M., and Ng, C. W. W.: Groundwater level prediction using MIMO-LSTM, Proceedings of the 4th IAHR Young Professionals Congress, Online, 22–24 November 2023, ISBN 978-90-833476-5-3, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>van Rossum(1995)</label><mixed-citation>van Rossum, G.: Python Programming Language, <uri>https://www.python.org/</uri> (last access: 14 April 2026), 1995.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Virtanen et al.(2020)Virtanen, Gommers, Oliphant, Haberland, Reddy, Cournapeau, Burovski, Peterson, Weckesser, Bright, van der Walt, Brett, Wilson, Millman, Mayorov, Nelson, Jones, Kern, Larson, Carey, Polat, Feng, Moore, VanderPlas, Laxalde, Perktold, Cimrman, Henriksen, Quintero, Harris, Archibald, Ribeiro, Pedregosa, and van Mulbregt</label><mixed-citation>Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T., Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J., van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N., Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat, İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J., Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M., Ribeiro, A. H., Pedregosa, F., and van Mulbregt, P.: SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python, Nat. Methods, 17, 261–272, <ext-link xlink:href="https://doi.org/10.1038/s41592-019-0686-2" ext-link-type="DOI">10.1038/s41592-019-0686-2</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Wunsch et al.(2021)Wunsch, Liesch, and Broda</label><mixed-citation>Wunsch, A., Liesch, T., and Broda, S.: Groundwater level forecasting with artificial neural networks: a comparison of long short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with exogenous input (NARX), Hydrol. Earth Syst. Sci., 25, 1671–1687, <ext-link xlink:href="https://doi.org/10.5194/hess-25-1671-2021" ext-link-type="DOI">10.5194/hess-25-1671-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Wunsch et al.(2022)Wunsch, Liesch, and Broda</label><mixed-citation>Wunsch, A., Liesch, T., and Broda, S.: Feature-based Groundwater Hydrograph Clustering Using Unsupervised Self-Organizing Map-Ensembles, Water Resour. Manag., 36, 39–54, <ext-link xlink:href="https://doi.org/10.1007/s11269-021-03006-y" ext-link-type="DOI">10.1007/s11269-021-03006-y</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Yu et al.(2024)Yu, Tolson, Shen, Han, Mai, and Lin</label><mixed-citation>Yu, Q., Tolson, B. A., Shen, H., Han, M., Mai, J., and Lin, J.: Enhancing long short-term memory (LSTM)-based streamflow prediction with a spatially distributed approach, Hydrol. Earth Syst. Sci., 28, 2107–2122, <ext-link xlink:href="https://doi.org/10.5194/hess-28-2107-2024" ext-link-type="DOI">10.5194/hess-28-2107-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Zhou et al.(2024)Zhou, Zhang, Bai, Zhao, Shuai, Cui, and Shao</label><mixed-citation>Zhou, Y., Zhang, Q., Bai, G., Zhao, H., Shuai, G., Cui, Y., and Shao, J.: Groundwater dynamics clustering and prediction based on grey relational analysis and LSTM model: A case study in Beijing Plain, China, Journal of Hydrology: Regional Studies, 56, 102011, <ext-link xlink:href="https://doi.org/10.1016/j.ejrh.2024.102011" ext-link-type="DOI">10.1016/j.ejrh.2024.102011</ext-link>, 2024.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Never Train a Deep Learning Model on a Single Well? Revisiting Training Strategies for Groundwater Level Prediction</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Abadi et al.(2016)Abadi, Agarwal, Barham, Brevdo, Chen, Citro,
Corrado, Davis, Dean, Devin, Ghemawat, Goodfellow, Harp, Irving, Isard, Jia,
Jozefowicz, Kaiser, Kudlur, Levenberg, Mané, Monga, Moore, Murray, Olah,
Schuster, Shlens, Steiner, Sutskever, Talwar, Tucker, Vanhoucke, Vasudevan,
Viégas, Vinyals, Warden, Wattenberg, Wicke, Yu, and
Zheng</label><mixed-citation>
      
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado,
G. S., Davis, A., Dean, J., Devin, M., Ghemawat, S., Goodfellow, I., Harp,
A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R., Kaiser, L., Kudlur, M.,
Levenberg, J., Mané, D., Monga, R., Moore, S., Murray, D., Olah, C.,
Schuster, M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P.,
Vanhoucke, V., Vasudevan, V., Viégas, F., Vinyals, O., Warden, P.,
Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X.: TensorFlow: Large-Scale
Machine Learning on Heterogeneous Distributed Systems, in: 12th USENIX
Symposium on Operating Systems Design and Implementation (OSDI 16), arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.1603.04467" target="_blank">https://doi.org/10.48550/arXiv.1603.04467</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Acuña Espinoza et al.(2025)Acuna Espinoza, Loritz, Kratzert, Klotz,
Gauch, Álvarez Chaves, Bäuerle, and Ehret</label><mixed-citation>
      
Acuña Espinoza, E., Loritz, R., Kratzert, F., Klotz, D., Gauch, M., Álvarez Chaves, M., and Ehret, U.: Analyzing the generalization capabilities of a hybrid hydrological model for extrapolation to extreme events, Hydrol. Earth Syst. Sci., 29, 1277–1294, <a href="https://doi.org/10.5194/hess-29-1277-2025" target="_blank">https://doi.org/10.5194/hess-29-1277-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Bandara et al.(2020)Bandara, Bergmeir, and
Smyl</label><mixed-citation>
      
Bandara, K., Bergmeir, C., and Smyl, S.: Forecasting across time series
databases using recurrent neural networks on groups of similar series: A
clustering approach, Expert Syst. Appl., 140, 112896,
<a href="https://doi.org/10.1016/j.eswa.2019.112896" target="_blank">https://doi.org/10.1016/j.eswa.2019.112896</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Baste et al.(2025)Baste, Klotz, Espinoza, Bardossy, and
Loritz</label><mixed-citation>
      
Baste, S., Klotz, D., Acuña Espinoza, E., Bardossy, A., and Loritz, R.: Unveiling the limits of deep learning models in hydrological extrapolation tasks, Hydrol. Earth Syst. Sci., 29, 5871–5891, <a href="https://doi.org/10.5194/hess-29-5871-2025" target="_blank">https://doi.org/10.5194/hess-29-5871-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Chidepudi et al.(2025)Chidepudi, Massei, Jardani, Dieppois, Henriot,
and Fournier</label><mixed-citation>
      
Chidepudi, S. K. R., Massei, N., Jardani, A., Dieppois, B., Henriot, A., and Fournier, M.: Training deep learning models with a multi-station approach and static aquifer attributes for groundwater level simulation: what is the best way to leverage regionalised information?, Hydrol. Earth Syst. Sci., 29, 841–861, <a href="https://doi.org/10.5194/hess-29-841-2025" target="_blank">https://doi.org/10.5194/hess-29-841-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Chollet(2015)</label><mixed-citation>
      
Chollet, F.: Keras, GitHub [code], <a href="https://github.com/fchollet/keras" target="_blank"/> (last access: 14 April 2026), 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Chu et al.(2022)Chu, Bian, Lang, Sun, and Wang</label><mixed-citation>
      
Chu, H., Bian, J., Lang, Q., Sun, X., and Wang, Z.: Daily Groundwater Level
Prediction and Uncertainty Using LSTM Coupled with PMI and
Bootstrap Incorporating Teleconnection Patterns Information,
Sustainability, 14, 11598, <a href="https://doi.org/10.3390/su141811598" target="_blank">https://doi.org/10.3390/su141811598</a>,
2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Clark et al.(2022)Clark, Pagendam, and Ryan</label><mixed-citation>
      
Clark, S. R., Pagendam, D., and Ryan, L.: Forecasting Multiple Groundwater
Time Series with Local and Global Deep Learning Networks,
Int. J. Env. Res. Pub. He., 19, 5091,
<a href="https://doi.org/10.3390/ijerph19095091" target="_blank">https://doi.org/10.3390/ijerph19095091</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Collenteur et al.(2024)Collenteur, Haaf, Bakker, Liesch, Wunsch,
Soonthornrangsan, White, Martin, Hugman, de Sousa, Vanden Berghe, Fan,
Peterson, Bikše, Di Ciacca, Wang, Zheng, Nölscher, Koch, Schneider,
Benavides Höglund, Krishna Reddy Chidepudi, Henriot, Massei, Jardani,
Rudolph, Rouhani, Gómez-Hernández, Jomaa, Pölz, Franken, Behbooei, Lin,
and Meysami</label><mixed-citation>
      
Collenteur, R. A., Haaf, E., Bakker, M., Liesch, T., Wunsch, A., Soonthornrangsan, J., White, J., Martin, N., Hugman, R., de Sousa, E., Vanden Berghe, D., Fan, X., Peterson, T. J., Bikše, J., Di Ciacca, A., Wang, X., Zheng, Y., Nölscher, M., Koch, J., Schneider, R., Benavides Höglund, N., Krishna Reddy Chidepudi, S., Henriot, A., Massei, N., Jardani, A., Rudolph, M. G., Rouhani, A., Gómez-Hernández, J. J., Jomaa, S., Pölz, A., Franken, T., Behbooei, M., Lin, J., and Meysami, R.: Data-driven modelling of hydraulic-head time series: results and lessons learned from the 2022 Groundwater Time Series Modelling Challenge, Hydrol. Earth Syst. Sci., 28, 5193–5208, <a href="https://doi.org/10.5194/hess-28-5193-2024" target="_blank">https://doi.org/10.5194/hess-28-5193-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Gomez et al.(2024)Gomez, Nölscher, Hartmann, and
Broda</label><mixed-citation>
      
Gomez, M., Nölscher, M., Hartmann, A., and Broda, S.: Assessing groundwater level modelling using a 1-D convolutional neural network (CNN): linking model performances to geospatial and time series features, Hydrol. Earth Syst. Sci., 28, 4407–4425, <a href="https://doi.org/10.5194/hess-28-4407-2024" target="_blank">https://doi.org/10.5194/hess-28-4407-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Harris et al.(2020)Harris, Millman, van der Walt, Gommers, Virtanen,
Cournapeau, Wieser, Taylor, Berg, Smith, Kern, Picus, Hoyer, van Kerkwijk,
Brett, Haldane, del Río, Wiebe, Peterson, Gérard-Marchant, Sheppard,
Reddy, Weckesser, Abbasi, Gohlke, and Oliphant</label><mixed-citation>
      
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P.,
Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., Kern, R.,
Picus, M., Hoyer, S., van Kerkwijk, M. H., Brett, M., Haldane, A., del
Río, J. F., Wiebe, M., Peterson, P., Gérard-Marchant, P., Sheppard,
K., Reddy, T., Weckesser, W., Abbasi, H., Gohlke, C., and Oliphant, T. E.:
Array programming with NumPy, Nature, 585, 357–362,
<a href="https://doi.org/10.1038/s41586-020-2649-2" target="_blank">https://doi.org/10.1038/s41586-020-2649-2</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Hauswirth et al.(2021)Hauswirth, Bierkens, Beijk, and
Wanders</label><mixed-citation>
      
Hauswirth, S. M., Bierkens, M. F., Beijk, V., and Wanders, N.: The potential of
data driven approaches for quantifying hydrological extremes, Adv.
Water Resour., 155, 104017, <a href="https://doi.org/10.1016/j.advwatres.2021.104017" target="_blank">https://doi.org/10.1016/j.advwatres.2021.104017</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Heudorfer et al.(2024)Heudorfer, Liesch, and
Broda</label><mixed-citation>
      
Heudorfer, B., Liesch, T., and Broda, S.: On the challenges of global entity-aware deep learning models for groundwater level prediction, Hydrol. Earth Syst. Sci., 28, 525–543, <a href="https://doi.org/10.5194/hess-28-525-2024" target="_blank">https://doi.org/10.5194/hess-28-525-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Hunter(2007)</label><mixed-citation>
      
Hunter, J. D.: Matplotlib: A 2D Graphics Environment, Comput. Sci.
Eng., 9, 90–95, <a href="https://doi.org/10.1109/MCSE.2007.55" target="_blank">https://doi.org/10.1109/MCSE.2007.55</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Kratzert et al.(2019a)Kratzert, Klotz, Herrnegger,
Sampson, Hochreiter, and Nearing</label><mixed-citation>
      
Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A. K., Hochreiter, S., and
Nearing, G. S.: Toward Improved Predictions in Ungauged Basins:
Exploiting the Power of Machine Learning, Water Resour. Res.,
55, 11344–11354, <a href="https://doi.org/10.1029/2019wr026065" target="_blank">https://doi.org/10.1029/2019wr026065</a>, 2019a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Kratzert et al.(2019b)Kratzert, Klotz, Shalev,
Klambauer, Hochreiter, and Nearing</label><mixed-citation>
      
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <a href="https://doi.org/10.5194/hess-23-5089-2019" target="_blank">https://doi.org/10.5194/hess-23-5089-2019</a>, 2019b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Kratzert et al.(2021)Kratzert, Gauch, Nearing, Hochreiter, and
Klotz</label><mixed-citation>
      
Kratzert, F., Gauch, M., Nearing, G., Hochreiter, S., and Klotz, D.:
Niederschlags-Abfluss-Modellierung mit Long Short-Term Memory
(LSTM), Österreichische Wasser- und Abfallwirtschaft, 73, 270–280,
<a href="https://doi.org/10.1007/s00506-021-00767-z" target="_blank">https://doi.org/10.1007/s00506-021-00767-z</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Kratzert et al.(2024)Kratzert, Gauch, Klotz, and
Nearing</label><mixed-citation>
      
Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.: HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrol. Earth Syst. Sci., 28, 4187–4201, <a href="https://doi.org/10.5194/hess-28-4187-2024" target="_blank">https://doi.org/10.5194/hess-28-4187-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Kunz et al.(2025)Kunz, Schulz, Wetzel, Nölscher, Chiaburu,
Biessmann, and Broda</label><mixed-citation>
      
Kunz, S., Schulz, A., Wetzel, M., Nölscher, M., Chiaburu, T., Biessmann, F., and Broda, S.: Towards a global spatial machine learning model for seasonal groundwater level predictions in Germany, Hydrol. Earth Syst. Sci., 29, 3405–3433, <a href="https://doi.org/10.5194/hess-29-3405-2025" target="_blank">https://doi.org/10.5194/hess-29-3405-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Lees et al.(2022)Lees, Reece, Kratzert, Klotz, Gauch, De Bruijn,
Kumar Sahu, Greve, Slater, and Dadson</label><mixed-citation>
      
Lees, T., Reece, S., Kratzert, F., Klotz, D., Gauch, M., De Bruijn, J., Kumar Sahu, R., Greve, P., Slater, L., and Dadson, S. J.: Hydrological concept formation inside long short-term memory (LSTM) networks, Hydrol. Earth Syst. Sci., 26, 3079–3101, <a href="https://doi.org/10.5194/hess-26-3079-2022" target="_blank">https://doi.org/10.5194/hess-26-3079-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Ma et al.(2021)Ma, Feng, Lawson, Tsai, Liang, Huang, Sharma, and
Shen</label><mixed-citation>
      
Ma, K., Feng, D., Lawson, K., Tsai, W., Liang, C., Huang, X., Sharma, A., and
Shen, C.: Transferring Hydrologic Data Across Continents –
Leveraging Data-Rich Regions to Improve Hydrologic Prediction
in Data-Sparse Regions, Water Resour. Res., 57,
<a href="https://doi.org/10.1029/2020wr028600" target="_blank">https://doi.org/10.1029/2020wr028600</a>,
2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Martel et al.(2025)Martel, Arsenault, Turcotte, Castañeda-Gonzalez,
Brissette, Armstrong, Mailhot, Pelletier-Dumont, Lachance-Cloutier,
Rondeau-Genesse, and Caron</label><mixed-citation>
      
Martel, J.-L., Arsenault, R., Turcotte, R., Castañeda-Gonzalez, M., Brissette, F., Armstrong, W., Mailhot, E., Pelletier-Dumont, J., Lachance-Cloutier, S., Rondeau-Genesse, G., and Caron, L.-P.: Exploring the ability of LSTM-based hydrological models to simulate streamflow time series for flood frequency analysis, Hydrol. Earth Syst. Sci., 29, 4951–4968, <a href="https://doi.org/10.5194/hess-29-4951-2025" target="_blank">https://doi.org/10.5194/hess-29-4951-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Mbouopda et al.(2022)Mbouopda, Guyet, Labroche, and
Henriot</label><mixed-citation>
      
Mbouopda, M. F., Guyet, T., Labroche, N., and Henriot, A.: Experimental study
of time series forecasting methods for groundwater level prediction, arXiv [preprint],
<a href="https://doi.org/10.48550/arXiv.2209.13927" target="_blank">https://doi.org/10.48550/arXiv.2209.13927</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>McKinney(2010)</label><mixed-citation>
      
McKinney, W.: Data Structures for Statistical Computing in Python, in:
Proceedings of the 9th Python in Science Conference, edited by: van der Walt,
S. and Millman, J., SciPy, Austin, Texas, 56–61,
<a href="https://doi.org/10.25080/Majora-92bf1922-00a" target="_blank">https://doi.org/10.25080/Majora-92bf1922-00a</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Nayak et al.(2006)Nayak, Rao, and Sudheer</label><mixed-citation>
      
Nayak, P. C., Rao, Y. R. S., and Sudheer, K. P.: Groundwater Level
Forecasting in a Shallow Aquifer Using Artificial Neural
Network Approach, Water Resour. Manag., 20, 77–90,
<a href="https://doi.org/10.1007/s11269-006-4007-z" target="_blank">https://doi.org/10.1007/s11269-006-4007-z</a>, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Nearing et al.(2024)Nearing, Cohen, Dube, Gauch, Gilon, Harrigan,
Hassidim, Klotz, Kratzert, Metzger, Nevo, Pappenberger, Prudhomme, Shalev,
Shenzis, Tekalign, Weitzner, and Matias</label><mixed-citation>
      
Nearing, G., Cohen, D., Dube, V., Gauch, M., Gilon, O., Harrigan, S., Hassidim,
A., Klotz, D., Kratzert, F., Metzger, A., Nevo, S., Pappenberger, F.,
Prudhomme, C., Shalev, G., Shenzis, S., Tekalign, T. Y., Weitzner, D., and
Matias, Y.: Global prediction of extreme floods in ungauged watersheds,
Nature, 627, 559–563, <a href="https://doi.org/10.1038/s41586-024-07145-1" target="_blank">https://doi.org/10.1038/s41586-024-07145-1</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Nearing et al.(2021)Nearing, Kratzert, Sampson, Pelissier, Klotz,
Frame, Prieto, and Gupta</label><mixed-citation>
      
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D.,
Frame, J. M., Prieto, C., and Gupta, H. V.: What Role Does Hydrological
Science Play in the Age of Machine Learning?, Water Resour.
Res., 57, <a href="https://doi.org/10.1029/2020wr028091" target="_blank">https://doi.org/10.1029/2020wr028091</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Ohmer(2025)</label><mixed-citation>
      
Ohmer, M.: GEMS-GER: A Machine Learning Benchmark Dataset of Long-Term Groundwater Levels in Germany with Meteorological Forcings and Site-Specific Environmental Features, Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.16736908" target="_blank">https://doi.org/10.5281/zenodo.16736908</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Ohmer and Liesch(2025)</label><mixed-citation>
      
Ohmer, M. and Liesch, T.: singlewell-vs-global-gwl, Zenodo [data set],
<a href="https://doi.org/10.5281/zenodo.19453511" target="_blank">https://doi.org/10.5281/zenodo.19453511</a>, 2025.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Ohmer et al.(2026)Ohmer, Liesch, Habbel, Heudorfer, Gomez, Clos,
Nölscher, and Broda</label><mixed-citation>
      
Ohmer, M., Liesch, T., Habbel, B., Heudorfer, B., Gomez, M., Clos, P., Nölscher, M., and Broda, S.: GEMS-GER: a machine learning benchmark dataset of long-term groundwater levels in Germany with meteorological forcings and site-specific environmental features, Earth Syst. Sci. Data, 18, 77–95, <a href="https://doi.org/10.5194/essd-18-77-2026" target="_blank">https://doi.org/10.5194/essd-18-77-2026</a>, 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Pedregosa et al.(2011)Pedregosa, Varoquaux, Gramfort, Michel,
Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, Vanderplas, Passos,
Cournapeau, Brucher, Perrot, and Duchesnay</label><mixed-citation>
      
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J.,
Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, É.:
Scikit-learn: Machine Learning in Python, J. Mach. Learn.
Res., 12, 2825–2830, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Tran et al.(2025)Tran, Nguyen, Kim, and Ivanov</label><mixed-citation>
      
Tran, V. N., Nguyen, T. V., Kim, J., and Ivanov, V. Y.: Technical note: Does Multiple Basin Training Strategy Guarantee Superior Machine Learning Performance for Streamflow Predictions in Gaged Basins?, EGUsphere [preprint], <a href="https://doi.org/10.5194/egusphere-2025-769" target="_blank">https://doi.org/10.5194/egusphere-2025-769</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Usman et al.(2023)Usman, Waqar, and Ng</label><mixed-citation>
      
Usman, M., Waqar, M., and Ng, C. W. W.: Groundwater level prediction using MIMO-LSTM, in: Proceedings of the 4th IAHR Young Professionals Congress, Online, 22–24 November 2023, ISBN 978-90-833476-5-3, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>van Rossum(1995)</label><mixed-citation>
      
van Rossum, G.: Python Programming Language, <a href="https://www.python.org/" target="_blank">https://www.python.org/</a> (last access: 14 April 2026),
1995.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Virtanen et al.(2020)Virtanen, Gommers, Oliphant, Haberland, Reddy,
Cournapeau, Burovski, Peterson, Weckesser, Bright, van der Walt, Brett,
Wilson, Millman, Mayorov, Nelson, Jones, Kern, Larson, Carey, Polat, Feng,
Moore, VanderPlas, Laxalde, Perktold, Cimrman, Henriksen, Quintero, Harris,
Archibald, Ribeiro, Pedregosa, and van Mulbregt</label><mixed-citation>
      
Virtanen, P., Gommers, R., Oliphant, T. E., Haberland, M., Reddy, T.,
Cournapeau, D., Burovski, E., Peterson, P., Weckesser, W., Bright, J.,
van der Walt, S. J., Brett, M., Wilson, J., Millman, K. J., Mayorov, N.,
Nelson, A. R. J., Jones, E., Kern, R., Larson, E., Carey, C. J., Polat,
İ., Feng, Y., Moore, E. W., VanderPlas, J., Laxalde, D., Perktold, J.,
Cimrman, R., Henriksen, I., Quintero, E. A., Harris, C. R., Archibald, A. M.,
Ribeiro, A. H., Pedregosa, F., and van Mulbregt, P.: SciPy 1.0: Fundamental
Algorithms for Scientific Computing in Python, Nat. Methods, 17, 261–272,
<a href="https://doi.org/10.1038/s41592-019-0686-2" target="_blank">https://doi.org/10.1038/s41592-019-0686-2</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Wunsch et al.(2021)Wunsch, Liesch, and
Broda</label><mixed-citation>
      
Wunsch, A., Liesch, T., and Broda, S.: Groundwater level forecasting with artificial neural networks: a comparison of long short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with exogenous input (NARX), Hydrol. Earth Syst. Sci., 25, 1671–1687, <a href="https://doi.org/10.5194/hess-25-1671-2021" target="_blank">https://doi.org/10.5194/hess-25-1671-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Wunsch et al.(2022)Wunsch, Liesch, and
Broda</label><mixed-citation>
      
Wunsch, A., Liesch, T., and Broda, S.: Feature-based Groundwater Hydrograph
Clustering Using Unsupervised Self-Organizing Map-Ensembles,
Water Resour. Manag., 36, 39–54, <a href="https://doi.org/10.1007/s11269-021-03006-y" target="_blank">https://doi.org/10.1007/s11269-021-03006-y</a>,
2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Yu et al.(2024)Yu, Tolson, Shen, Han, Mai, and
Lin</label><mixed-citation>
      
Yu, Q., Tolson, B. A., Shen, H., Han, M., Mai, J., and Lin, J.: Enhancing long short-term memory (LSTM)-based streamflow prediction with a spatially distributed approach, Hydrol. Earth Syst. Sci., 28, 2107–2122, <a href="https://doi.org/10.5194/hess-28-2107-2024" target="_blank">https://doi.org/10.5194/hess-28-2107-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Zhou et al.(2024)Zhou, Zhang, Bai, Zhao, Shuai, Cui, and
Shao</label><mixed-citation>
      
Zhou, Y., Zhang, Q., Bai, G., Zhao, H., Shuai, G., Cui, Y., and Shao, J.:
Groundwater dynamics clustering and prediction based on grey relational
analysis and LSTM model: A case study in Beijing Plain, China,
J. Hydrol. Reg. Stud., 56, 102011,
<a href="https://doi.org/10.1016/j.ejrh.2024.102011" target="_blank">https://doi.org/10.1016/j.ejrh.2024.102011</a>, 2024.

    </mixed-citation></ref-html></article>
