<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-29-4251-2025</article-id><title-group><article-title>Enhancing inverse modeling in groundwater systems through machine learning: a comprehensive comparative study</article-title><alt-title>Enhancing inverse modeling in groundwater systems through machine learning</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Chen</surname><given-names>Junjun</given-names></name>
          
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-8930-6011">https://orcid.org/0000-0001-8930-6011</ext-link></contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff2 aff3">
          <name><surname>Dai</surname><given-names>Zhenxue</given-names></name>
          <email>dzx@jlu.edu.cn</email>
        </contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff4">
          <name><surname>Yin</surname><given-names>Shangxian</given-names></name>
          <email>yinshx03@126.com</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff5">
          <name><surname>Zhang</surname><given-names>Mingkun</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff6">
          <name><surname>Soltanian</surname><given-names>Mohamad Reza</given-names></name>
          
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0002-5126-0668">https://orcid.org/0000-0002-5126-0668</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>National and Local Joint Engineering Laboratory of Internet Application Technology on Mine, China University of Mining and Technology, Xuzhou, 221008, China</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>College of Construction Engineering, Jilin University, Changchun, 130026, China</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>School of Environmental and Municipal Engineering, Qingdao University of Technology, Qingdao, 273400, China</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>College of Safety Engineering, North China Institute of Science and Technology, Langfang, 065201, China</institution>
        </aff>
        <aff id="aff5"><label>5</label><institution>Shandong Ruyi Technology Group Co., Ltd., Jinan, 250000, China</institution>
        </aff>
        <aff id="aff6"><label>6</label><institution>Departments of Geosciences and Environmental Engineering, University of Cincinnati, Cincinnati, OH 45220, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Correspondence: Zhenxue Dai (dzx@jlu.edu.cn) and Shangxian Yin (yinshx03@126.com)</corresp></author-notes><pub-date><day>10</day><month>September</month><year>2025</year></pub-date>
      
      <volume>29</volume>
      <issue>17</issue>
      <fpage>4251</fpage><lpage>4279</lpage>
      <history>
        <date date-type="received"><day>12</day><month>October</month><year>2024</year></date>
        <date date-type="rev-request"><day>6</day><month>December</month><year>2024</year></date>
        <date date-type="rev-recd"><day>9</day><month>June</month><year>2025</year></date>
        <date date-type="accepted"><day>16</day><month>June</month><year>2025</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2025 Junjun Chen et al.</copyright-statement>
        <copyright-year>2025</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025.html">This article is available from https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e155">Tandem neural network architecture (TNNA) is a machine learning algorithm that has recently been proposed for estimating uncertain parameters with inverse mappings. However, its reliability has only been validated in limited research scenarios, and its advantages over conventional methods remain underexplored. This study systematically compares the performance of the TNNA algorithm to four traditional metaheuristic algorithms across three heterogeneity scenarios, each employing a specific inversion framework: (i) a surrogate model coupled with an optimization algorithm for cases with eight homogeneous parameter zones, (ii) Karhunen–Loève expansion (KLE)-based dimensionality reduction combined with a surrogate model and an optimization algorithm for a high-dimensional Gaussian random field, and (iii) generative machine-learning-based dimensionality reduction integrated with a surrogate model and an optimization algorithm for a high-dimensional non-Gaussian random field. Additionally, we evaluate algorithm performance under two different noise-level conditions (multiplicative Gaussian noise with standard deviations of 1 % and 10 %) for normalized hydraulic head and solute concentration data in the non-Gaussian random field scenario, which exhibits the most complex parameter characteristics. The results demonstrate that both the TNNA algorithm and the metaheuristic algorithms achieve inversion results that meet the convergence accuracy requirements within these machine-learning-based inversion frameworks. Moreover, under the 10 % high-noise condition in the non-Gaussian random field, the inversion results remain robust when sufficient constraints are imposed. Compared to metaheuristic approaches, the TNNA method yields more reliable inversion results with significantly higher computational efficiency, highlighting the considerable advantages of machine learning in advancing groundwater system inversions.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>42402241</award-id>
<award-id>U2267217</award-id>
<award-id>42141011</award-id>
<award-id>42002254</award-id>
</award-group>
<award-group id="gs2">
<funding-source>Fundamental Research Funds for the Central Universities</funding-source>
<award-id>2024QN11066</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e167">Numerical models are essential for quantifying flow and mass transport dynamics within aquifers, providing significant insights into hydrological and biogeochemical processes (Steefel et al., 2005; Sanchez-Vila et al., 2010; Sternagel et al., 2021; Xu et al., 2022). However, directly measuring aquifer parameters, such as permeability fields, remains challenging due to limitations in the current hydrogeological exploration techniques and budgetary constraints (Yeh, 1986; Kool et al., 1987; Beven and Binley, 1992; McLaughlin and Townley, 1996; Dai and Samper, 2004; Castaings et al., 2009; Chen et al., 2021). Inverse modeling has become a key approach for estimating these uncertain model parameters, improving the accuracy of numerical simulations (Ginn and Cushman, 1990; Carrera and Glorioso, 1991; Hopmans et al., 2002; Zheng and Samper, 2004; Zhou et al., 2014; Bandai and Ghezzehei, 2022; Abbas et al., 2024; Giudici, 2024).</p>
      <p id="d2e170">Inverse modeling within Bayesian theorem-based data assimilation frameworks has garnered significant attention from the hydrogeological community over the past few decades (Scharnagl et al., 2011; Chen et al., 2013; Zhang et al., 2018; Xia et al., 2021). Methods based on the minimization of objective functions or the maximization of posterior distributions require the application of optimization techniques (Tsai et al., 2003; Blasone et al., 2007; Sun, 2013; Vrugt, 2016). One type is local optimization algorithms, which update model parameters from initial guesses towards optimal solutions according to gradient directions, such as the Gauss–Newton method (Dragonetti et al., 2018; Qin et al., 2022) and the Levenberg–Marquardt method (Schneider-Zapp et al., 2010; Nhu, 2022). These methods are highly efficient but may converge to local optima when dealing with non-convex inversion problems. Another category achieves globally optimal solutions through metaheuristic searches, which typically incorporate processes of exploration (to search the entire parameter space for a diverse range of estimates) and exploitation (to leverage local information to refine estimates). Popular metaheuristic algorithms include the genetic algorithm (GA) (Ines and Droogers, 2002; Lindsay et al., 2016), simulated annealing (SA) (Kirkpatrick et al., 1983; Jaumann and Roth, 2018), differential evolution (DE) (Li, 2019; Yan et al., 2023), and particle swarm optimization (PSO) (Rafiei et al., 2022; Travaš et al., 2023). Nevertheless, their computational efficiency may be reduced by the extensive exploration and exploitation required to achieve globally optimal inversion results. The efficiency of optimization algorithms can be enhanced by integrating them with adjoint methods, particularly when extended to high-dimensional parameter spaces. 
Adjoint methods are capable of efficiently computing gradients for all parameters simultaneously through solving adjoint equations derived from the original forward model (Plessix, 2006). This gradient information can directly accelerate local optimization algorithms (Epp et al., 2023) and facilitate gradient-enhanced global optimization methods (Kapsoulis et al., 2018), significantly improving efficiency in complex inverse problems. However, the practical implementation of adjoint methods remains challenging due to the complexity associated with deriving adjoint equations, especially for highly non-linear system models (Xiao et al., 2021; Ghelichkhan et al., 2024). The accurate and efficient estimation of uncertain model parameters across various scenarios remains one of the most significant challenges for developing inversion frameworks.</p>
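<p>The exploration–exploitation loop described above can be sketched with differential evolution, one of the metaheuristics compared in this study; the shifted Rastrigin test function and all hyper-parameter values below are our illustrative choices, not the paper's experimental configuration.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(m):
    # shifted Rastrigin function: non-convex, global minimum 0 at m = (1, 1)
    d = m - 1.0
    return 20.0 + np.sum(d**2 - 10.0 * np.cos(2 * np.pi * d))

pop_size, dim, F, CR = 20, 2, 0.6, 0.9
pop = rng.uniform(-5.0, 5.0, (pop_size, dim))      # initial population
fit = np.array([objective(p) for p in pop])

for _ in range(200):
    for i in range(pop_size):
        idx = rng.choice([j for j in range(pop_size) if j != i], 3, replace=False)
        a, b, c = pop[idx]
        mutant = a + F * (b - c)                    # mutation drives exploration
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True             # keep at least one mutant gene
        trial = np.where(cross, mutant, pop[i])     # binomial crossover
        f_trial = objective(trial)
        if f_trial < fit[i]:                        # greedy selection (exploitation)
            pop[i], fit[i] = trial, f_trial

best = pop[np.argmin(fit)]                          # best estimate after the search
```

Note that each generation requires `pop_size` objective evaluations, which is why surrogate models are used later in the paper to keep such searches affordable.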
      <p id="d2e174">In recent years, machine learning has experienced rapid developments and demonstrated significant performance in addressing complex problems characterized by high dimensionality and non-linearity (Hinton and Salakhutdinov, 2006; LeCun et al., 2015; Bentivoglio et al., 2022; Shen et al., 2023). Integrating conventional inversion methods with cutting-edge machine learning techniques has become increasingly popular in addressing the challenges of inversion studies. One effective strategy is constructing surrogate models to accelerate forward simulations, ensuring that inversion algorithms perform comprehensive searches across the entire parameter space more efficiently (Razavi et al., 2012). For instance, Zhan et al. (2021) identified lithofacies structures by utilizing a deep octave convolution residual network to construct a surrogate model for predicting solute concentrations and hydraulic heads in heterogeneous aquifers. Wang et al. (2021) constructed a subsurface flow surrogate model under heterogeneous conditions through physically informed neural network methods, specifically for uncertainty quantification and parameter inversion. Liu et al. (2023) constructed a convolutional neural network (CNN) surrogate model to combine with a hierarchical homogenization method to estimate the effective permeability of digital rocks. More related studies can also be found in recent reviews (Yu and Ma, 2021; Luo et al., 2023; Zhan et al., 2023). Additionally, due to their inherent differentiability and continuity, deep neural network (DNN)-based surrogate models can be integrated with adjoint equations, enabling efficient gradient computations and significantly facilitating their practical implementation in high-dimensional and complex scenarios (Xiao et al., 2021).</p>
      <p id="d2e177">In addition to surrogate models, parameter optimization through machine-learning-based reverse mapping represents another significant advancement in inversion techniques. Previous studies have outlined at least two strategies to achieve reverse mapping models. The first strategy is the data-driven approach, where reverse regressions are trained using datasets that comprise pairs of model outputs and inputs. For example, Sun (2018) developed a regression model from hydraulic heads to heterogeneous conductivity fields using a CNN-based generative adversarial network (GAN) approach. Kuang et al. (2021) succeeded in the real-time identification of earthquake focal mechanisms by training a DNN regression on seismic waveform data. Yang et al. (2022) established the relationship between gravity data and <inline-formula><mml:math id="M1" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> plumes to perform real-time inversion for geologic carbon sequestration. Another strategy is to train a reverse network in tandem neural network architecture (TNNA), integrated with a pre-trained surrogate model (i.e., forward network). The TNNA method was introduced with the advent of deep learning and has been successfully applied in computed tomography reconstruction (Adler and Öktem, 2017), nanophotonic structure inverse design (Liu et al., 2018; Yeung et al., 2021), and photonic topological state inverse design (Long et al., 2019). Our previous research expanded the application of the TNNA algorithm in groundwater science, evaluating its performance in reactive transport inverse modeling and improving inversion results by incorporating an adaptive update strategy to reduce local predictive errors of surrogate models. 
The findings indicated that dependable TNNA inversion outcomes are obtained when the surrogate model predicts accurately near the actual parameter values (Chen et al., 2021).</p>
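<p>As a minimal one-dimensional illustration of the tandem idea, the sketch below trains a tiny "reverse network" against a frozen forward surrogate by propagating the data-misfit gradient through the frozen model; the toy forward function, the one-layer reverse map, and the learning rate are our assumptions and are far simpler than the DNNs used in the paper.</p>

```python
import numpy as np

def forward_surrogate(m):
    # stands in for the pre-trained forward network (kept frozen during tandem training)
    return np.tanh(2.0 * m) + 0.5 * m

y_obs = forward_surrogate(0.8)       # synthetic "observation" with known truth m = 0.8

w = np.array([0.0, 0.0])             # reverse "network": R(y) = w[0] * y + w[1]
lr, eps = 0.2, 1e-6
for _ in range(500):
    m_hat = w[0] * y_obs + w[1]
    # gradient of [F(R(y_obs)) - y_obs]^2 w.r.t. w, back-propagated through the
    # frozen forward surrogate (dF/dm obtained here by central finite difference)
    r = forward_surrogate(m_hat) - y_obs
    dF = (forward_surrogate(m_hat + eps) - forward_surrogate(m_hat - eps)) / (2 * eps)
    grad = 2.0 * r * dF * np.array([y_obs, 1.0])
    w -= lr * grad

m_recovered = w[0] * y_obs + w[1]    # parameter recovered by the reverse network
```

Each iteration uses a single forward evaluation of the frozen surrogate, which is the efficiency argument made for TNNA in the text.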
      <p id="d2e192">The TNNA algorithm demonstrates a fundamental advantage by requiring only a single forward simulation to update parameters in each iteration. In contrast, conventional metaheuristic algorithms typically necessitate multiple forward simulations. Despite the innovation of this approach, its applicability in more general groundwater numerical scenarios and its performance compared to conventional metaheuristic algorithms remain uncertain. This study considers three cases with different heterogeneity characteristics to compare the performance of the TNNA algorithm to four conventional metaheuristic algorithms. In case 1, the domain is divided into a finite number of homogeneous zones. The other two cases focus on high-dimensional parameter fields based on the spatial variability of the aquifer. These two cases are essential for revealing the dynamic behaviors of the groundwater system at the discrete grid scale. Depending on the spatial variability of the aquifer structure, the two high-dimensional numerical cases characterize the heterogeneity of aquifer parameters using a Gaussian random field (i.e., case 2) and a non-Gaussian random field (i.e., case 3). The Gaussian random field is suited to aquifers with a single lithofacies and relatively uniform physical structures, where the spatial variation of the parameter values is quite smooth. In contrast, the non-Gaussian random field accounts for the existence of a nugget effect in the aquifer structure, such as when it contains multiple lithofacies with varying hydraulic properties (Mariethoz and Caers, 2014). For a comparative study of the three cases, surrogate models will be used to accelerate the forward simulation. Additionally, dimensionality reduction techniques are necessary for the two high-dimensional cases to reduce the computational complexity associated with high-dimensional parameter spaces. 
Specifically, the Karhunen–Loève expansion (KLE) method is feasible for Gaussian random fields. It reconstructs the Gaussian random field through a linear combination of orthogonal basis functions, achieving dimensionality reduction by retaining the dominant modes corresponding to the largest eigenvalues (Loève, 1955; Zhang and Lu, 2004; Mariethoz and Caers, 2014). However, the second-order statistics relied upon by KLE are insufficient to fully represent complex characteristics for non-Gaussian random fields. In recent years, generative machine learning methods have demonstrated outstanding performance in parameter field reconstruction (Mo et al., 2020; Zhan et al., 2021; Guo et al., 2023). These methods can establish relationships between low-dimensional standard distributions (e.g., uniform distribution) and high-dimensional distributions, effectively representing non-Gaussian random fields as low-dimensional latent vectors (i.e., parameters after dimensionality reduction). Thus, extending the TNNA framework by integrating KLE and generative machine learning methods, respectively, is a potentially feasible approach for solving the high-dimensional heterogeneous aquifer parameter inversion problems presented in case 2 and case 3. In summary, the primary contributions of this study are as follows: <list list-type="order"><list-item>
      <p id="d2e197">A novel inversion framework is proposed that integrates the TNNA algorithm with dimensionality reduction techniques, including KLE for Gaussian stochastic processes and generative machine learning methods for non-Gaussian stochastic processes, thereby extending its applicability to high-dimensional heterogeneous fields.</p></list-item><list-item>
      <p id="d2e201">A comprehensive comparative analysis is conducted between the TNNA algorithm and four conventional metaheuristic algorithms across three case scenarios, highlighting the advantages of DNN-based reverse mapping over metaheuristic stochastic search strategies for inverse estimation under different heterogeneous conditions.</p></list-item></list> The sections of this paper are structured as follows: Section 2 introduces the fundamental principles of the methodology involved in this study. Section 3 provides detailed information on numerical models for the three cases. Section 4 presents the results and discussions. Finally, Sect. 5 presents a summary and conclusions derived from this research, along with recommendations for future studies.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Methodology</title>
      <p id="d2e213">The inversion framework, based on non-linear optimization theory, generally consists of two aspects: (1) constructing non-linear constraints for the optimization of uncertain model parameters and (2) establishing optimization algorithms to search for the model parameters that satisfy these constraints. The general form of the non-linear optimization model in this paper is as follows:

          <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M2" display="block"><mml:mtable class="aligned" rowspacing="0.2ex" columnspacing="1em" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mo movablelimits="false">min⁡</mml:mo><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:msubsup><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:msup><mml:mo>]</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mfenced open="{" close=""><mml:mtable class="cases" rowspacing="0.2ex" columnspacing="1em" columnalign="left" framespacing="0em"><mml:mtr><mml:mtd><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mi 
mathvariant="bold">F</mml:mi><mml:mtext>HF</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msup><mml:mo>≤</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>≤</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi mathvariant="normal">U</mml:mi></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

        where <inline-formula><mml:math id="M3" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="bold">R</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="bold">R</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> represent the observed data vector and the corresponding model simulation output vector, respectively. <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M6" display="inline"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> refer to the <inline-formula><mml:math id="M7" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th element of the observed and simulated vectors, respectively, and <inline-formula><mml:math id="M8" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the standard deviation of the <inline-formula><mml:math id="M9" 
display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th observed data. <inline-formula><mml:math id="M10" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> represents the vector of model parameters to be optimized; <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>∗</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> denotes the optimal parameter vector obtained through optimization; and <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi mathvariant="normal">U</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> are the vectors representing the lower and upper limit values of the model parameters, respectively. <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>HF</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> represents the high-fidelity numerical model.</p>
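<p>A minimal numerical sketch of the data-misfit objective in Eq. (1), with a toy linear forward model standing in for the high-fidelity simulator; the matrix, true parameters, and standard deviations below are illustrative only.</p>

```python
import numpy as np

def objective(m, y_obs, sigma, forward):
    # weighted misfit with 1/sigma_i weights, as written in Eq. (1)
    y_hat = forward(m)
    return np.sum((y_obs - y_hat) ** 2 / sigma)

# toy linear forward model F_HF(m) = A m and synthetic noise-free observations
A = np.array([[1.0, 2.0], [3.0, 1.0], [0.5, 0.5]])
forward = lambda m: A @ m
m_true = np.array([0.4, 1.2])
y_obs = forward(m_true)
sigma = np.full(3, 0.1)              # one standard deviation per observation

J_true = objective(m_true, y_obs, sigma, forward)   # zero at the true parameters
```

In practice the minimization is carried out by the optimization algorithms of Sect. 2.3 subject to the box constraints on the parameter vector.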
      <p id="d2e515">In this study, three different inversion frameworks are developed to compare the TNNA algorithms to four metaheuristic algorithms. In a low-dimensional parameter scenario, a surrogate model <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is constructed to approximate high-fidelity numerical prediction outputs. Therefore, the objective function of the inversion framework integrated with a surrogate model is as follows:

          <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M16" display="block"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mo movablelimits="false">min⁡</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:munderover><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>)</mml:mo><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:msup><mml:mo>]</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

        In high-dimensional parameter scenarios, directly optimizing the model parameter <inline-formula><mml:math id="M17" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> can lead to computational difficulties due to its high dimensionality. To mitigate this issue, in addition to constructing a surrogate model <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> to improve the computational efficiency of forward simulations, dimensionality reduction algorithms are also integrated into the inversion frameworks. Let <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>=</mml:mo><mml:mi mathvariant="bold">G</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> represent an operator for parameter dimensionality reduction, where <inline-formula><mml:math id="M20" display="inline"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula> is a low-dimensional vector whose parameter space is commonly defined as an easily sampled probability distribution (e.g., standard Gaussian or uniform distribution). Specifically, the Karhunen–Loève expansion (KLE) and the octave convolution adversarial autoencoder (OCAAE) are used for representing Gaussian random fields and non-Gaussian random fields, respectively. 
Once the low-dimensional vector representation of the high-dimensional parameter is obtained, the high-dimensional parameter <inline-formula><mml:math id="M21" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> can be indirectly optimized by estimating the low-dimensional vector <inline-formula><mml:math id="M22" display="inline"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula>:

          <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M23" display="block"><mml:mtable class="aligned" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msup><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mtext>argmin</mml:mtext><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:msubsup><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold">G</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:msup><mml:mo>]</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mi mathvariant="bold">G</mml:mi><mml:mo>(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>)</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

        The basic mathematical theories of surrogate models, dimensionality reduction techniques, and optimization algorithms are introduced in Sect. 2.1–2.3, respectively.</p>
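To make Eq. (3) concrete, the following is a minimal sketch of latent-space inversion with a toy linear generator and forward operator (all matrices, names, and the plain gradient-descent optimizer are hypothetical stand-ins; in the study itself G is a trained dimensionality-reduction decoder, F_Forward a surrogate model, and the minimization is carried out by metaheuristic optimizers):

```python
import numpy as np

# Toy stand-ins (all hypothetical): G maps a low-dimensional latent
# vector z to the high-dimensional parameter field m, and F_forward
# maps m to the observable responses.
rng = np.random.default_rng(0)
G_mat = rng.standard_normal((100, 4))          # decoder: z (4) -> m (100)
F_mat = rng.standard_normal((10, 100)) / 10.0  # forward: m (100) -> y (10)

def G(z):
    return G_mat @ z

def F_forward(m):
    return F_mat @ m

z_true = np.array([1.0, -0.5, 0.3, 2.0])
y_obs = F_forward(G(z_true))   # synthetic observations
sigma = np.ones(10)            # observation-error scales

def misfit(z):
    """Weighted squared misfit of Eq. (3)."""
    r = (y_obs - F_forward(G(z))) / sigma
    return float(np.sum(r ** 2))

# Minimize the misfit over z; the gradient is analytic here because the
# toy G and F_forward are linear, so d y / d z is a constant matrix J.
J = F_mat @ G_mat
z = np.zeros(4)
for _ in range(5000):
    z = z + 0.02 * J.T @ ((y_obs - J @ z) / sigma ** 2)

m_star = G(z)                  # Eq. (3): m* = G(z*)
```

Because the toy problem is overdetermined and noise-free, the recovered latent vector converges to `z_true`.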
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Surrogate modeling methods</title>
      <p id="d2e806">In this study, surrogate models <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are developed using a data-driven strategy, as shown in Fig. 1. The process begins by sampling model parameters from prior distributions. The corresponding system responses for these parameter samples are simulated using a high-fidelity numerical model. Then, a training dataset consisting of paired model parameters and responses is obtained, which is subsequently used to construct surrogate models via supervised machine learning. Specifically, four popular machine learning models with distinct architectural differences are evaluated for surrogate modeling. These are multi-output support vector regression (MSVR), a kernel-based architecture for data mapping; a fully connected deep neural network (FC-DNN), composed of stacked fully connected layers; LeNet, a classical CNN architecture proposed by Yann LeCun; and a deep residual convolutional neural network (ResNet), which incorporates residual connections into the CNN structure.</p>
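The workflow just described can be sketched as follows, with a toy forward model and an ordinary-least-squares regressor standing in for the high-fidelity simulator and the four machine learning surrogates (all names, shapes, and the polynomial feature map are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

def forward_model(m):
    """Toy stand-in for the high-fidelity numerical simulator."""
    return np.array([m.sum(), (m ** 2).sum()])

# 1) Sample model parameters from a (uniform) prior distribution.
n_samples, n_params = 200, 5
params = rng.uniform(0.0, 1.0, size=(n_samples, n_params))

# 2) Simulate the corresponding system responses.
responses = np.array([forward_model(m) for m in params])

# 3) Fit a supervised regressor to the paired data; ordinary least
#    squares on simple polynomial features stands in for MSVR or a DNN.
features = np.hstack([params, params ** 2, np.ones((n_samples, 1))])
coef, *_ = np.linalg.lstsq(features, responses, rcond=None)

def surrogate(m):
    f = np.concatenate([m, m ** 2, [1.0]])
    return f @ coef

m_test = rng.uniform(0.0, 1.0, size=n_params)
```

Here the toy response is exactly linear in the chosen features, so the fitted surrogate reproduces the forward model; real surrogates only approximate the simulator.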

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e828">The framework for data-driven surrogate model construction and the machine learning models employed. Note that for CNN-based surrogate models, the initial processing module is activated only for low-dimensional scenarios, whereas in high-dimensional scenarios, the parameter matrix (<inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mtext>Cell</mml:mtext></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mtext>Cell</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) is directly input into the CNN architecture.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f01.png"/>

        </fig>

      <p id="d2e859">The detailed principles of MSVR and the three deep-learning-based methods are described in the following two subsections. The predictive accuracy of the four surrogate modeling approaches is compared in this study, and the best-performing approach is subsequently selected for the inversion computations. Before the surrogate models are constructed, the training datasets are normalized separately for each simulation component using min–max normalization: each component is scaled independently by its own minimum and maximum values, so that all normalized values fall within the range <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
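The component-wise min–max normalization described above can be sketched as (function names are hypothetical; the stored minima and maxima are needed later to map surrogate predictions back to physical units):

```python
import numpy as np

def minmax_normalize(data):
    """Scale each column (simulation component) of `data` to [0, 1]
    using its own minimum and maximum."""
    lo = data.min(axis=0)
    hi = data.max(axis=0)
    return (data - lo) / (hi - lo), lo, hi

def minmax_denormalize(scaled, lo, hi):
    """Invert the scaling, e.g. for surrogate-model predictions."""
    return scaled * (hi - lo) + lo

raw = np.array([[1.0, 10.0],
                [2.0, 30.0],
                [3.0, 20.0]])
scaled, lo, hi = minmax_normalize(raw)
```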
<sec id="Ch1.S2.SS1.SSS1">
  <label>2.1.1</label><title>Multi-output support vector regression</title>
      <p id="d2e886">MSVR extends the original support vector machine (SVM) to multivariate regression (Pérez-Cruz et al., 2002; Tuia et al., 2011). Its mathematical expression is given as follows:

              <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M27" display="block"><mml:mrow><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mtext>MSVR</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">φ</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:mi mathvariant="bold">W</mml:mi><mml:mo>+</mml:mo><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mtext>MSVR</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the regression operator constructed with MSVR and <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:mi mathvariant="italic">φ</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a non-linear feature mapping that implicitly maps the input vector <inline-formula><mml:math id="M30" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> into a high-dimensional feature space. Its inner product defines the kernel function <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. 
(Here, we use the Gaussian radial basis function (RBF) kernel with a bandwidth parameter <inline-formula><mml:math id="M32" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>: <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>K</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">φ</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:mi mathvariant="italic">φ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>exp⁡</mml:mi><mml:mo>(</mml:mo><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mo>∥</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo>∥</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>/</mml:mo><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the <inline-formula><mml:math id="M35" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th model parameter vector from the surrogate model training dataset.) 
Assuming <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>samples</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> denotes the number of surrogate model training samples, the regression coefficients <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:mi mathvariant="bold">W</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>×</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mtext>samples</mml:mtext></mml:msub></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mo>=</mml:mo><mml:mo>[</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:msup><mml:msup><mml:mo>]</mml:mo><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are determined by minimizing the structural risk, as outlined in Eqs. (5) and (6):

              <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M39" display="block"><mml:mtable class="aligned" rowspacing="0.2ex" columnspacing="1em" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="bold">W</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext> argmin</mml:mtext><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mi>L</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold">W</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:msubsup><mml:mo>‖</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>j</mml:mi></mml:msup><mml:msup><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mi>C</mml:mi><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>samples</mml:mtext></mml:msub></mml:mrow></mml:msubsup><mml:mi>L</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            where <inline-formula><mml:math id="M40" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula> is a penalty parameter and <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a quadratic <inline-formula><mml:math id="M42" display="inline"><mml:mi mathvariant="italic">ε</mml:mi></mml:math></inline-formula>-insensitive loss function, expressed as

              <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M43" display="block"><mml:mrow><mml:mi>L</mml:mi><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mfenced open="{" close=""><mml:mtable rowspacing="0.2ex" columnspacing="1em" class="cases" columnalign="left left" framespacing="0em"><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mo>,</mml:mo><mml:mspace width="1em" linebreak="nobreak"/></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>u</mml:mi><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ε</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ε</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mo>,</mml:mo><mml:mspace width="1em" linebreak="nobreak"/></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>u</mml:mi><mml:mo>≥</mml:mo><mml:mi mathvariant="italic">ε</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mfenced></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>‖</mml:mo><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msqrt></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="italic">φ</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msup><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mi mathvariant="bold">W</mml:mi><mml:mo>-</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mi mathvariant="normal">T</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. For <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mi mathvariant="italic">ε</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, this problem is equivalent to an independent regularized kernel least squares regression for each component. 
For <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:mi mathvariant="italic">ε</mml:mi><mml:mo>≠</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, the regression functions for all output dimensions are constructed jointly, so that every output contributes to a shared set of support vectors. Solving this optimization problem directly is challenging, so the desired solutions for <inline-formula><mml:math id="M48" display="inline"><mml:mi mathvariant="bold">W</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M49" display="inline"><mml:mi mathvariant="bold-italic">B</mml:mi></mml:math></inline-formula> are determined using an iterative reweighted least squares (IRWLS) procedure based on a quasi-Newton approach. During the IRWLS process, the term <inline-formula><mml:math id="M50" display="inline"><mml:mrow><mml:mi>L</mml:mi><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> in Eq. (5) is first approximated by its first-order Taylor expansion, from which the corresponding quadratic programming approximation is constructed. Meanwhile, a linear system is derived from the condition that the first-order derivatives of the objective function with respect to <inline-formula><mml:math id="M51" display="inline"><mml:mi mathvariant="bold">W</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="bold-italic">B</mml:mi></mml:math></inline-formula> are zero. Finally, the optimal values of <inline-formula><mml:math id="M53" display="inline"><mml:mi mathvariant="bold">W</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M54" display="inline"><mml:mi mathvariant="bold-italic">B</mml:mi></mml:math></inline-formula> are obtained through a line search. Further details on the IRWLS procedure can be found in Sanchez-Fernandez et al. (2004).</p>
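The two building blocks named above, the Gaussian RBF kernel and the quadratic ε-insensitive loss of Eq. (6), can be sketched as follows (a minimal illustration of the definitions, not the full IRWLS solver):

```python
import numpy as np

def rbf_kernel(m, m_i, sigma):
    """Gaussian RBF kernel: K(m, m_i) = exp(-0.5 ||m - m_i||^2 / sigma^2)."""
    d = np.asarray(m, dtype=float) - np.asarray(m_i, dtype=float)
    return float(np.exp(-0.5 * np.dot(d, d) / sigma ** 2))

def eps_insensitive_loss(u, eps):
    """Quadratic eps-insensitive loss of Eq. (6): zero inside the
    insensitive zone (u < eps), (u - eps)^2 outside (u >= eps)."""
    return 0.0 if u < eps else (u - eps) ** 2
```

The kernel equals 1 when the two parameter vectors coincide and decays with their distance; the loss ignores residual norms smaller than ε, which is what makes the regression sparse in support vectors.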
      <p id="d2e1593">The performance of the MSVR model is influenced by three hyperparameters: <inline-formula><mml:math id="M55" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M56" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M57" display="inline"><mml:mi mathvariant="italic">ε</mml:mi></mml:math></inline-formula> (Ma et al., 2022). These hyperparameters are optimized by minimizing the root mean square error (RMSE) with the four metaheuristic algorithms considered in this study.</p>
</sec>
<sec id="Ch1.S2.SS1.SSS2">
  <label>2.1.2</label><title>Deep-learning-based surrogate models</title>
</sec>
<sec id="Ch1.S2.SS1.SSSx1" specific-use="unnumbered">
  <title>(1) DNN architectures</title>
      <p id="d2e1631">The three DNN models are all feedforward neural networks, which are generally constructed by stacking multiple hidden layers. The structure can be expressed as <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mtext>NN</mml:mtext></mml:msub></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:msub><mml:mi>f</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. Specifically, <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> represent the DNN-based surrogate model operator and the corresponding trainable parameters, respectively. 
<inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the non-linear transformation function of the <inline-formula><mml:math id="M62" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>th layer, and <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mtext>NN</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> indicates the total number of neural network layers. In DNN construction, different types of layers yield different architectures and, consequently, different predictive performances (LeCun et al., 2015). For the DNN models adopted in this study, the layer types involved are the fully connected layer, the convolutional layer, and the residual block.</p>
      <p id="d2e1762">In fully connected layers, both the input and the output are in vector form. Assume <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the input vector and <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>output</mml:mtext></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the output vector of the <inline-formula><mml:math id="M66" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>th fully connected layer <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The transformation in this fully connected layer is expressed as

              <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M68" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>output</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>l</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>-</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>-</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a non-linear activation function, <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mo>×</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the weight matrix, and <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mo>×</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is the bias vector.</p>
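Equation (7) amounts to an affine map followed by a non-linearity; a minimal sketch with hypothetical shapes (n = 3 inputs, m = 2 outputs) and a ReLU activation:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def fc_layer(x_input, W, B, activation=relu):
    """Fully connected layer of Eq. (7): X_output = f(W X_input + B)."""
    return activation(W @ x_input + B)

# Hypothetical weights and bias for a 3 -> 2 layer.
W = np.array([[1.0, 0.0, -1.0],
              [0.5, 0.5, 0.5]])
B = np.array([0.0, -1.0])
x = np.array([1.0, 2.0, 3.0])
y = fc_layer(x, W, B)  # W @ x = [-2, 3]; + B = [-2, 2]; ReLU -> [0, 2]
```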
      <p id="d2e1974">In a convolutional layer, both the input and the output are in matrix form. A convolutional layer propagates information through sparse connections defined by several convolution kernels, which are small weight matrices. The mathematical formula of a convolutional layer is as follows (Wang et al., 2019; Jardani et al., 2022):

              <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M72" display="block"><mml:mrow><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mi>q</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>-</mml:mo><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mfenced open="(" close=")"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msubsup><mml:mi>k</mml:mi><mml:mi>i</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:munderover><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msubsup><mml:mi>k</mml:mi><mml:mi>j</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:munderover><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mi>q</mml:mi></mml:msubsup><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>+</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo>+</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the pixel value at position (<inline-formula><mml:math id="M74" display="inline"><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow></mml:math></inline-formula>) of the input matrix and <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mi>q</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the output feature calculated by employing the <inline-formula><mml:math id="M76" display="inline"><mml:mi>q</mml:mi></mml:math></inline-formula>th (<inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mtext>out</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) convolutional kernel filter <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">w</mml:mi><mml:mi>q</mml:mi></mml:msup><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:msubsup><mml:mi>k</mml:mi><mml:mi>i</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>×</mml:mo><mml:msubsup><mml:mi>k</mml:mi><mml:mi>j</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. 
In a convolutional layer with <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>out</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> filters, the output contains <inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>out</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> feature maps. The output size (<inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mtext>out</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) of each convolutional layer is determined by the input size (<inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mtext>in</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) and the hyperparameters (i.e., zero padding <inline-formula><mml:math id="M83" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>, kernel size <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msup><mml:mi>k</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>, and stride <inline-formula><mml:math id="M85" display="inline"><mml:mi>s</mml:mi></mml:math></inline-formula>). A pooling layer is often placed after a convolutional layer to remove redundant information from the extracted features and to improve the efficiency of model training (Chen et al., 2021).</p>
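A minimal sketch of Eq. (8) for a single kernel with stride 1 and no padding, together with the output-size relation implied by the hyperparameters listed above (the formula S_out = (S_in + 2p − k′)/s + 1 is the usual convention and is stated here as an assumption, since the text does not spell it out; like most deep-learning frameworks, the code implements the cross-correlation form of Eq. 8):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def conv2d_single(x, w, b=0.0, activation=relu):
    """Valid 2-D convolution of Eq. (8) for one kernel, stride 1,
    no padding (cross-correlation form)."""
    ki, kj = w.shape
    H, W = x.shape
    out = np.empty((H - ki + 1, W - kj + 1))
    for u in range(out.shape[0]):
        for v in range(out.shape[1]):
            # weighted sum over the k'i x k'j receptive field, plus bias
            out[u, v] = np.sum(w * x[u:u + ki, v:v + kj]) + b
    return activation(out)

def conv_output_size(s_in, k, p, s):
    """Assumed output-size relation: S_out = (S_in + 2p - k') // s + 1."""
    return (s_in + 2 * p - k) // s + 1

x = np.arange(16, dtype=float).reshape(4, 4)
w = np.ones((2, 2)) / 4.0  # a 2x2 averaging kernel
y = conv2d_single(x, w)    # shape (3, 3)
```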
      <p id="d2e2289">The residual block is a fundamental component of residual networks (ResNets), designed primarily to mitigate the vanishing and exploding gradient problems commonly encountered during DNN training. A residual block learns a residual mapping, defined as

              <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M86" display="block"><mml:mrow><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>R</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi>T</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>R</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents the trainable parameters of a residual block, <inline-formula><mml:math id="M88" display="inline"><mml:mrow><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the residual function, <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the target mapping that the residual block aims to approximate, and <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is chosen as an identity transformation (i.e., <inline-formula><mml:math id="M91" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) or another suitable transformation depending on the network architecture. The output of the residual block is computed as

              <disp-formula id="Ch1.E10" content-type="numbered"><label>10</label><mml:math id="M92" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>output</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>-</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mi>R</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mtext>input</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>-</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the rectified linear unit (ReLU) activation function. This design ensures that the residual block can, at a minimum, reproduce its input through the shortcut connection, which effectively mitigates the vanishing gradient problem. When multiple residual blocks are stacked, the relationship between the deeper <inline-formula><mml:math id="M94" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula>th residual block and the shallower <inline-formula><mml:math id="M95" display="inline"><mml:mi>l</mml:mi></mml:math></inline-formula>th residual block is expressed as follows (He et al., 2016):

              <disp-formula id="Ch1.E11" content-type="numbered"><label>11</label><mml:math id="M96" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>L</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>input</mml:mtext><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:munderover><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>input</mml:mtext><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denote the input data and trainable parameters of the <inline-formula><mml:math id="M99" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th residual block, respectively, and <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>L</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the output from the <inline-formula><mml:math id="M101" display="inline"><mml:mi>L</mml:mi></mml:math></inline-formula>th residual block. According to the chain rule in derivatives, the gradient of the loss function <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">J</mml:mi><mml:mtext>Res</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> with respect to <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>input</mml:mtext><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> can be given by

              <disp-formula id="Ch1.E12" content-type="numbered"><label>12</label><mml:math id="M104" display="block"><mml:mtable rowspacing="0.2ex" columnspacing="1em" class="aligned" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">J</mml:mi><mml:mtext>Res</mml:mtext></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>input</mml:mtext><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">J</mml:mi><mml:mtext>Res</mml:mtext></mml:msub></mml:mrow><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>L</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mspace width="1em" linebreak="nobreak"/><mml:mo>×</mml:mo><mml:mfenced open="(" close=")"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mo>∂</mml:mo><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>input</mml:mtext><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn></mml:mrow></mml:msubsup><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mfenced><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            This formulation highlights two key properties of the residual network. First, the gradient does not vanish during training because the term

              <disp-formula id="Ch1.Ex1"><mml:math id="M105" display="block"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mo>∂</mml:mo><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>input</mml:mtext><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:munderover><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>

            is unlikely to be exactly equal to <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> for all training samples in a mini-batch. Second, the gradient at the deepest residual block <inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">J</mml:mi><mml:mtext>Res</mml:mtext></mml:msub><mml:mo>/</mml:mo><mml:mo>∂</mml:mo><mml:msub><mml:mi mathvariant="bold">X</mml:mi><mml:mrow><mml:mtext>output</mml:mtext><mml:mo>(</mml:mo><mml:mi>L</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> propagates directly to all preceding layers, ensuring the effective transmission of gradients throughout the network (Chang et al., 2022).</p>
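The telescoping relation in Eq. (11), in which the output of the deepest block equals the shallow input plus a sum of residual functions, can be verified numerically. The sketch below (NumPy in place of a deep learning framework; the residual function, layer sizes, and block count are arbitrary illustrative choices) stacks identity-skip blocks without a post-addition activation, as in the He et al. (2016) formulation of Eq. (11):

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_fn(x, w):
    # R(x, theta): an arbitrary nonlinear residual branch for illustration
    return np.tanh(x @ w)

x = rng.standard_normal((4, 8))                     # X_input(l), batch of 4
weights = [0.1 * rng.standard_normal((8, 8)) for _ in range(5)]

out = x
residual_sum = np.zeros_like(x)
for w in weights:                                   # stack 5 identity-skip blocks
    r = residual_fn(out, w)
    residual_sum += r
    out = out + r                                   # X_(i+1) = X_i + R(X_i, theta_i)

# Eq. (11): X_output(L) = X_input(l) + sum_i R(X_output(i), theta_R(i))
assert np.allclose(out, x + residual_sum)
```

Because the identity shortcut contributes the constant term 1 in Eq. (12), gradients reach the shallow block undiminished even when the residual branches are close to zero.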
      <p id="d2e2967">Based on the three network layer structures described above, the FC-DNN, LeNet, and ResNet models are constructed. The FC-DNN in this study is built from fully connected layers, with 512 neurons in each hidden layer. The activation function for the output layer is the Sigmoid function, which constrains outputs within the range of 0 to 1. Note that other activation functions whose output ranges include <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> as a subset, such as the hyperbolic tangent (<inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to 1) and ReLU (0 to <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>), could also be adopted. However, we selected the Sigmoid function specifically to constrain initial model outputs strictly within the target range (0 to 1), thereby reducing the risk of occasional extreme or anomalous predictions, particularly in the early stages of training. For the hidden layers, the Swish activation function is adopted because it is smooth, non-monotonic, and continuously differentiable, properties that facilitate DNN training (Elfwing et al., 2018). The performance of the FC-DNN is sensitive to the number of hidden layers, whose optimal value is determined for the specific case studies presented in the application section. For the LeNet and ResNet models, when dealing with low-dimensional scenarios, an initial processing module consisting of a fully connected layer followed by a reshaping operation is added to convert the input vector into a fixed-size matrix.
In contrast, for high-dimensional parameter scenarios, the discrete grid matrix of the parameter field is fed directly into the CNN architecture (see Fig. 1b). Specifically, LeNet consists of two convolutional blocks and two fully connected layers; each convolutional block comprises a convolutional layer followed by a max-pooling layer, and the fully connected layers have 1024 and 512 neurons, respectively. ResNet consists of four stages and adopts two types of residual blocks: the first stage contains two residual units without down-sampling, while each of the remaining three stages contains one residual unit with down-sampling followed by one without. All layers use ReLU activations, except the output layer, which uses the Sigmoid function. Detailed architectures of LeNet and ResNet are provided in Figs. S1 and S2 in the Supplement, respectively.</p>
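For concreteness, the forward pass of an FC-DNN with Swish hidden layers and a Sigmoid output layer can be sketched as follows (a minimal NumPy sketch; the input and output dimensions and the two-hidden-layer depth are hypothetical, since the study tunes the depth per case):

```python
import numpy as np

rng = np.random.default_rng(1)

def swish(x):
    # Swish activation: x * sigmoid(x), smooth and non-monotonic
    return x / (1.0 + np.exp(-x))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical sizes: 20 inputs, two hidden layers of 512 neurons, 10 outputs
sizes = [20, 512, 512, 10]
params = [(rng.standard_normal((m, n)) * np.sqrt(1.0 / m), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

def fc_dnn(x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        # Swish on hidden layers, Sigmoid on the output layer
        x = sigmoid(x) if i == len(params) - 1 else swish(x)
    return x

y = fc_dnn(rng.standard_normal((3, 20)))
# The Sigmoid output layer keeps every prediction strictly inside (0, 1)
assert y.shape == (3, 10) and (y > 0).all() and (y < 1).all()
```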
</sec>
<sec id="Ch1.S2.SS1.SSSx2" specific-use="unnumbered">
  <title>(2) DNN model training</title>
      <p id="d2e3012">The surrogate models are trained by minimizing the difference between the predicted outputs <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and the corresponding numerical model outputs <inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> in the training datasets (<inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mtext>samples</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>). Following prior studies (Mo et al., 2019, 2020; Chen et al., 2021), the L1 norm-based loss function is adopted and formulated as

              <disp-formula id="Ch1.E13" content-type="numbered"><label>13</label><mml:math id="M114" display="block"><mml:mrow><mml:mtable class="aligned" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msubsup><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext><mml:mo>∗</mml:mo></mml:msubsup><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext> argmin</mml:mtext><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>samples</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>samples</mml:mtext></mml:msub></mml:mrow></mml:msubsup><mml:mo>|</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi mathvariant="normal">d</mml:mi></mml:msub></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:msubsup><mml:mi 
mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext><mml:mi mathvariant="normal">T</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>DNN</mml:mtext></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi mathvariant="normal">d</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the weight-decay (regularization) coefficient used to prevent overfitting. The L2 norm can also be employed as a loss function for surrogate model construction. Owing to its squared-error formulation, the L2 norm provides smoother gradients and more stable parameter updates near convergence than the L1 norm; however, the same formulation makes it more sensitive to extreme outliers. When the sampled parameters cover the parameter space only sparsely, the L1 norm loss function improves the robustness of surrogate model predictions. This study implemented the DNN models using PyTorch (<uri>https://pytorch.org/</uri>, last access: 10 September 2024). The neural network weights were initialized using the default initialization method of PyTorch and optimized by stochastic gradient descent with the Adam algorithm. In practice, the weight-decay hyperparameter can be set directly in the Adam optimizer rather than included explicitly in the loss function.</p>
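The objective of Eq. (13) can be sketched as below, with a linear stand-in for the surrogate and an illustrative weight-decay value (both are assumptions for demonstration, not the study's settings); in a PyTorch implementation the same decay term is obtained by passing a weight-decay argument to the Adam optimizer rather than coding it in the loss:

```python
import numpy as np

rng = np.random.default_rng(2)

def surrogate(m, theta):
    # Stand-in for F_DNN(m, theta): a linear map, purely for illustration
    return m @ theta

m = rng.standard_normal((100, 5))      # sampled model parameters m_i
theta_true = rng.standard_normal((5, 3))
y = surrogate(m, theta_true)           # corresponding "numerical model" outputs y_i

def loss(theta, w_d=1e-3):
    # Eq. (13): L1 misfit averaged over samples plus (w_d / 2) * theta^T theta
    l1 = np.abs(surrogate(m, theta) - y).mean()
    return l1 + 0.5 * w_d * np.sum(theta ** 2)

# The generating parameters attain a lower objective than an untrained guess
assert loss(theta_true) < loss(np.zeros((5, 3)))
```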
      <p id="d2e3219">When conducting DNN training, the hyperparameter selection primarily influences the update process of trainable parameters. Besides the weight decay mentioned above, the learning rate and the number of epochs are two other crucial hyperparameters directly affecting the training stability and convergence speed. A higher learning rate accelerates initial convergence but may lead to oscillations near the optimal solution, whereas a lower learning rate tends to improve final accuracy but requires more epochs to achieve convergence. In this study, we first set a relatively high number of epochs to ensure that the trainable parameters are adequately updated. Subsequently, appropriate learning rates and weight decay values for different scenarios are determined through a trial-and-error approach.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Dimensionality reduction methods</title>
<sec id="Ch1.S2.SS2.SSS1">
  <label>2.2.1</label><title>Karhunen–Loève expansion for Gaussian random field</title>
      <p id="d2e3239">Let <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>)</mml:mo><mml:mo>∼</mml:mo><mml:mi mathvariant="bold">N</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi mathvariant="bold">C</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>,</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> represent a Gaussian random field, where <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the mean of the random field and <inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:mi mathvariant="bold">C</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>,</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> represents the exponential covariance function between two arbitrary spatial points <inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi 
mathvariant="bold-italic">x</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The covariance function for these two spatial locations is given by

              <disp-formula id="Ch1.E14" content-type="numbered"><label>14</label><mml:math id="M121" display="block"><mml:mrow><mml:mi mathvariant="bold">C</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>′</mml:mo></mml:msup><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">G</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mi>exp⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mo>-</mml:mo><mml:msqrt><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>x</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msubsup><mml:mi>s</mml:mi><mml:mi>y</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mfenced><mml:mspace linebreak="nobreak" width="0.33em"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">G</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> is the variance and <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the correlation lengths along the <inline-formula><mml:math id="M125" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M126" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> directions, respectively. Since the covariance matrix is symmetric and positive definite, the exponential covariance function in Eq. (14) can be decomposed into an eigenvalue-eigenfunction representation. By solving the second-kind Fredholm integral equation and performing eigenvalue decomposition, the Gaussian random field can be expressed through the Karhunen–Loève expansion (KLE) as follows:

              <disp-formula id="Ch1.E15" content-type="numbered"><label>15</label><mml:math id="M127" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mi mathvariant="normal">G</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold">s</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi mathvariant="normal">∞</mml:mi></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msqrt><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:msqrt><mml:msub><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represents a random variable following a Gaussian distribution of <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, also known as a KL term; and <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϕ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denote the eigenfunction and eigenvalue, respectively. For discretized numerical models, the index <inline-formula><mml:math id="M132" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> takes values from 1 to <inline-formula><mml:math id="M133" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, representing the number of discrete grid points (e.g., in Eq. 15, <inline-formula><mml:math id="M134" display="inline"><mml:mi mathvariant="normal">∞</mml:mi></mml:math></inline-formula> is replaced by <inline-formula><mml:math id="M135" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>). Dimensionality reduction using KLE is achieved through a truncated expansion (Loève, 1955; Zhang and Lu, 2004; Mariethoz and Caers, 2014).</p>
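The truncated expansion can be illustrated with a short NumPy sketch that discretizes the exponential covariance of Eq. (14) on a grid, eigendecomposes the covariance matrix (the discrete analogue of solving the Fredholm integral equation), and generates one realization via Eq. (15); the grid size, variance, correlation lengths, and truncation level are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 16 x 16 grid over the unit square, n = 256 points
nx = ny = 16
xs, ys = np.meshgrid(np.linspace(0, 1, nx), np.linspace(0, 1, ny))
pts = np.column_stack([xs.ravel(), ys.ravel()])

# Exponential covariance of Eq. (14)
sigma2, lam_x, lam_y = 1.0, 0.3, 0.3
dx = (pts[:, None, 0] - pts[None, :, 0]) / lam_x
dy = (pts[:, None, 1] - pts[None, :, 1]) / lam_y
C = sigma2 * np.exp(-np.sqrt(dx ** 2 + dy ** 2))

# Eigendecomposition, sorted by decreasing eigenvalue
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Truncated KLE of Eq. (15): retain the k leading terms (zero mean assumed)
k = 32
z = rng.standard_normal(k)                        # z_i ~ N(0, 1)
field = eigvecs[:, :k] @ (np.sqrt(eigvals[:k]) * z)

energy = eigvals[:k].sum() / eigvals.sum()        # fraction of variance retained
assert field.shape == (nx * ny,) and energy > 0.5
```

The truncation level k trades dimensionality against the fraction of field variance represented; smoother fields (larger correlation lengths) need fewer retained terms.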
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <label>2.2.2</label><title>Octave convolution adversarial autoencoder for non-Gaussian random field</title>
      <p id="d2e3701">The octave convolutional adversarial autoencoder (OCAAE) is a generative machine learning approach that combines the variational autoencoder (VAE) with adversarial learning, leveraging octave convolution neural networks (Zhan et al., 2021). It consists of three main components: an encoder, a decoder, and a discriminator. The encoder maps a high-dimensional parameter field <inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>I</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to a low-dimensional latent vector <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The distribution of the latent vectors <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, obtained by mapping the <inline-formula><mml:math id="M139" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> prior model parameter samples <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>, is denoted as <inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:mi 
mathvariant="bold-italic">z</mml:mi><mml:mo>∼</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. Specifically, the encoder outputs two low-dimensional vectors: the mean vector <inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mi mathvariant="bold-italic">z</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the log-variance vector <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:mi>ln⁡</mml:mi><mml:mo>(</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> of the latent vector <inline-formula><mml:math id="M144" display="inline"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula>. Then, a vector <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> is randomly drawn from a standard normal distribution <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:mi mathvariant="bold">N</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="bold">I</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, and the latent vector is produced as <inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">μ</mml:mi><mml:mi mathvariant="bold-italic">z</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">σ</mml:mi><mml:mi mathvariant="bold-italic">z</mml:mi></mml:msub><mml:mo>×</mml:mo><mml:msup><mml:mi 
mathvariant="bold-italic">z</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>. The decoder reconstructs the high-dimensional parameter field <inline-formula><mml:math id="M148" display="inline"><mml:mover accent="true"><mml:mi>m</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover></mml:math></inline-formula> by taking the latent vector <inline-formula><mml:math id="M149" display="inline"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula> as input. The discriminator enforces adversarial training, ensuring that the encoded latent vector distribution <inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>∼</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> approximates a prior Gaussian distribution <inline-formula><mml:math id="M151" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>∼</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. 
            It receives as input either a latent vector generated by the encoder, <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>∼</mml:mo><mml:mi mathvariant="bold-italic">q</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, or a sample from the prior distribution, <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>∼</mml:mo><mml:mi mathvariant="bold-italic">p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, and determines from which distribution the input latent vector originates.</p>
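The encoder's sampling step described above is the standard reparameterization trick; a minimal NumPy sketch, with hypothetical batch size, latent dimension, and encoder outputs, reads:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical encoder outputs for 1000 fields and a latent dimension of 8
mu = np.full((1000, 8), 0.5)                  # mean vector mu_z
log_var = np.full((1000, 8), np.log(0.25))    # log-variance ln(sigma_z^2), sigma_z = 0.5

# z = mu_z + sigma_z * z', with z' drawn from the standard normal N(0, I)
z_prime = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * z_prime

# The sampled latent vectors reproduce the encoder's statistics
assert abs(z.mean() - 0.5) < 0.05 and abs(z.std() - 0.5) < 0.05
```

Drawing the randomness through the auxiliary variable keeps the stochastic node outside the trainable mapping, so gradients can flow through the mean and log-variance outputs during the joint training of encoder, decoder, and discriminator.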
      <p id="d2e3990">This adversarial framework enhances the generative capability and ensures smooth transitions between different field realizations. In the adversarial autoencoder method, the encoder <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:mi mathvariant="script">G</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> (which also acts as the generator of the adversarial network), decoder, and discriminator <inline-formula><mml:math id="M155" display="inline"><mml:mrow><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are trained jointly in two phases during each iteration: the reconstruction phase and the regularization phase.</p>
      <p id="d2e4021">In the reconstruction phase, the encoder and decoder are updated using the following loss function:

              <disp-formula id="Ch1.E16" content-type="numbered"><label>16</label><mml:math id="M156" display="block"><mml:mtable rowspacing="0.2ex" columnspacing="1em" class="aligned" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>ED</mml:mtext></mml:msub><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mo>‖</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mo>‖</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mo>-</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mtext>adv</mml:mtext></mml:msub><mml:mfenced close=")" open="("><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:mi>log⁡</mml:mi><mml:mo mathvariant="italic">{</mml:mo><mml:mi>D</mml:mi><mml:mo>[</mml:mo><mml:mi mathvariant="script">G</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi 
mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>]</mml:mo><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            where <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mtext>adv</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is a weight balancing the reconstruction and adversarial losses (set to 0.01 in this study), <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the reconstructed sample of <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M160" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of training samples.</p>
      <p id="d2e4186">In the regularization phase, the discriminator is trained to distinguish real latent vectors drawn from the prior distribution from fake latent vectors produced by the encoder, using the following loss function:

              <disp-formula id="Ch1.E17" content-type="numbered"><label>17</label><mml:math id="M161" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mi>D</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo mathvariant="italic">{</mml:mo><mml:mi>log⁡</mml:mi><mml:mo>[</mml:mo><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>]</mml:mo><mml:mo>+</mml:mo><mml:mi>log⁡</mml:mi><mml:mo>[</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi>D</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="script">G</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>]</mml:mo><mml:mo mathvariant="italic">}</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

            This loss function helps the discriminator to distinguish between the latent vector <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (from the true distribution <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>) and the fake latent vector produced by the encoder <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:mi mathvariant="script">G</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>.</p>
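As an illustration, one joint iteration of the two training phases can be sketched in PyTorch. The small fully connected encoder, decoder, and discriminator below are hypothetical stand-ins for illustration only; apart from the weight <inline-formula><mml:math display="inline"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mtext>adv</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn></mml:mrow></mml:math></inline-formula>, none of the dimensions, architectures, or optimizer settings are those of this study.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
field_dim, latent_dim, w_adv = 64, 8, 0.01  # w_adv = 0.01 as in this study

# Hypothetical stand-ins for the encoder G (also the adversarial generator),
# the decoder, and the discriminator D
encoder = nn.Sequential(nn.Linear(field_dim, 32), nn.Tanh(), nn.Linear(32, latent_dim))
decoder = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(), nn.Linear(32, field_dim))
discrim = nn.Sequential(nn.Linear(latent_dim, 32), nn.Tanh(),
                        nn.Linear(32, 1), nn.Sigmoid())

opt_ed = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discrim.parameters(), lr=1e-3)
eps = 1e-8  # numerical guard inside the logarithms

m = torch.randn(16, field_dim)  # a batch of parameter-field samples m_i

# --- Reconstruction phase (Eq. 16): update encoder and decoder ---
z_fake = encoder(m)
m_rec = decoder(z_fake)
rec_term = (m - m_rec).abs().sum(dim=1).mean()      # (1/N) sum ||m_i - m~_i||_1
adv_term = torch.log(discrim(z_fake) + eps).mean()  # (1/N) sum log D[G(m_i)]
loss_ed = rec_term - w_adv * adv_term
opt_ed.zero_grad()
loss_ed.backward()
opt_ed.step()

# --- Regularization phase (Eq. 17): update the discriminator ---
z_real = torch.randn(16, latent_dim)   # z_i drawn from the Gaussian prior p(z)
z_fake = encoder(m).detach()           # G(m_i), held fixed in this phase
loss_d = -(torch.log(discrim(z_real) + eps)
           + torch.log(1.0 - discrim(z_fake) + eps)).mean()
opt_d.zero_grad()
loss_d.backward()
opt_d.step()
```

After training, new parameter fields are generated by drawing latent vectors from the standard normal prior and passing them through the decoder.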
      <p id="d2e4319">The constraint loss functions in the adversarial autoencoder framework ensure that the reconstructed high-dimensional parameter field <inline-formula><mml:math id="M165" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover></mml:math></inline-formula> closely matches the original field <inline-formula><mml:math id="M166" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula>, while also making sure that the distribution of the low-dimensional latent vector <inline-formula><mml:math id="M167" display="inline"><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula> approximates a predefined standard normal distribution <inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. After finishing the training process, it is possible to sample from the low-dimensional space of <inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and to use the decoder to generate corresponding high-dimensional parameter fields. Then, the high-dimensional parameter field can be reconstructed by indirectly estimating the low-dimensional latent vectors (Makhzani et al., 2015; Mo et al., 2020).</p>
</sec>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Optimization algorithms</title>
<sec id="Ch1.S2.SS3.SSS1">
  <label>2.3.1</label><title>Metaheuristic algorithms</title>
      <p id="d2e4391">The four metaheuristic algorithms used in this paper essentially update model parameters through distinct heuristic stochastic search strategies. Specifically, particle swarm optimization (PSO) updates the model parameters <inline-formula><mml:math id="M170" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> based on the personal best position of the particles and the global best position of the swarm (Eberhart and Kennedy, 1995). The genetic algorithm (GA) encodes the initial model parameter samples using binary encoding, then iteratively updates them through crossover (combining portions of encoded solutions to generate new candidate solutions), mutation (randomly altering encoded information to introduce diversity), and selection (choosing candidate solutions based on objective function evaluations) (Holland, 1975). Differential evolution (DE) initializes a population of real-valued parameter vectors and iteratively updates them through differential mutation (generating trial solutions based on vector differences among population members), crossover (probabilistically combining components from original and mutated vectors), and greedy selection (retaining solutions with better objective function values) (Storn and Price, 1997; Tran et al., 2022). Simulated annealing (SA) starts from a random initial solution and iteratively explores neighboring solutions, accepting them probabilistically based on the Metropolis criterion, while gradually decreasing the temperature parameter until convergence (Metropolis et al., 1953; Kirkpatrick et al., 1983).</p>
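To make the shared structure of these search strategies concrete, a minimal pure-Python sketch of the SA update with the Metropolis acceptance rule is shown below. The quadratic objective, Gaussian neighborhood move, and geometric cooling schedule are illustrative assumptions, not the settings used in this study.

```python
import math
import random

random.seed(0)

def objective(x):
    # Hypothetical 1-D objective with a global minimum at x = 2
    return (x - 2.0) ** 2

x = random.uniform(-10.0, 10.0)  # random initial solution
fx = objective(x)
T, T_min, cooling = 1.0, 1e-4, 0.95

while T > T_min:
    x_new = x + random.gauss(0.0, 1.0)  # explore a neighboring solution
    f_new = objective(x_new)
    # Metropolis criterion: always accept improvements; accept worse
    # solutions with probability exp(-(f_new - fx) / T)
    if f_new < fx or random.random() < math.exp(-(f_new - fx) / T):
        x, fx = x_new, f_new
    T *= cooling  # gradually decrease the temperature
```

Each pass through the loop evaluates the objective once per candidate, which is why sufficient iterations are needed to balance exploitation and exploration.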
      <p id="d2e4401">A common characteristic of all the methods described above is that each iterative update of model parameters requires multiple evaluations of the objective function, and sufficient iterations are necessary to balance local exploitation and global exploration. Detailed implementation procedures and theoretical foundations of these methods are provided in the supplementary materials. The metaheuristic algorithms used in this study were implemented using the open-source Python package scikit-opt (<uri>https://scikit-opt.github.io/</uri>, last access: 10 September 2024).</p>
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <label>2.3.2</label><title>TNNA algorithm</title>
      <p id="d2e4415">The TNNA algorithm aims to obtain a reverse network <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> that maps the observation vector to model parameters, as shown in Eq. (18).

              <disp-formula id="Ch1.E18" content-type="numbered"><label>18</label><mml:math id="M172" display="block"><mml:mrow><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M173" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> denotes the trainable parameters of <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>. Since <inline-formula><mml:math id="M175" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> also serves as the input to the established surrogate model <inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mo>⋅</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, by substituting the parameter <inline-formula><mml:math id="M177" display="inline"><mml:mi mathvariant="bold-italic">m</mml:mi></mml:math></inline-formula> in the inversion objective function of Eq. (2) with the expression from Eq. (18), we obtain the objective function constraint for <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> (i.e., the loss function for training <inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>):

              <disp-formula id="Ch1.E19" content-type="numbered"><label>19</label><mml:math id="M180" display="block"><mml:mtable class="aligned" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msubsup><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext><mml:mo>∗</mml:mo></mml:msubsup><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mtext>argmin</mml:mtext><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msubsup><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:msubsup><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>[</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>[</mml:mo><mml:mi>i</mml:mi><mml:mo>]</mml:mo><mml:msup><mml:mo>]</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            After obtaining the optimal trainable parameters <inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext><mml:mo>∗</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula> through backpropagation-based stochastic gradient descent within the PyTorch framework, the final inversion results for the model parameters can be computed by <inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo>∗</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext><mml:mo>∗</mml:mo></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The required training data here are the normalized observation data. Specifically, the reverse network for this study is designed using an FC-DNN with three hidden layers, each containing 512 neurons.</p>
      <p id="d2e4713">During the reverse network training processes, each iteration of updating the trainable parameters <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> involves two main steps: first, the observation vector <inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is input into the reverse network <inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> to obtain the parameter prediction vector <inline-formula><mml:math id="M186" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo mathvariant="normal" stretchy="false">̃</mml:mo></mml:mover></mml:math></inline-formula>. Next, the predicted parameter <inline-formula><mml:math id="M187" display="inline"><mml:mover accent="true"><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mo stretchy="false" mathvariant="normal">̃</mml:mo></mml:mover></mml:math></inline-formula> is input into the forward network <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> to generate corresponding forward prediction results. Subsequently, the trainable parameters <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> of the reverse network are updated through standard DNN model training based on the error feedback from the loss function in Eq. (19). 
This process demonstrates that <inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M191" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Forward</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are integrated through a tandem connection, which is why this method is named TNNA. Upon completing the training of <inline-formula><mml:math id="M192" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, the final optimal parameters are predicted by inputting observation data into <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>. Further details on TNNA can be found in Chen et al. (2021).</p>
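The tandem connection can be sketched in PyTorch as follows. Both networks are reduced here to small fully connected stand-ins with hypothetical dimensions and training settings; the pre-trained surrogate is frozen and only the reverse-network parameters are updated, so each epoch involves a single surrogate forward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_param, n_obs = 4, 10  # hypothetical dimensions for illustration

# Stand-in for the pre-trained surrogate F_Forward, frozen during TNNA training
f_forward = nn.Sequential(nn.Linear(n_param, 32), nn.Tanh(), nn.Linear(32, n_obs))
for p in f_forward.parameters():
    p.requires_grad_(False)

# Reverse network F_Reverse mapping observations to parameters
# (the study uses an FC-DNN with three 512-neuron hidden layers; shrunk here)
f_reverse = nn.Sequential(nn.Linear(n_obs, 32), nn.Tanh(), nn.Linear(32, n_param))
opt = torch.optim.Adam(f_reverse.parameters(), lr=1e-3)

y_obs = torch.randn(1, n_obs)         # normalized observation vector (placeholder)
sigma = torch.full((1, n_obs), 0.01)  # observation-error weights of Eq. (19)

losses = []
for epoch in range(200):
    m_pred = f_reverse(y_obs)        # step 1: predict parameters from observations
    y_pred = f_forward(m_pred)       # step 2: one surrogate forward pass per epoch
    loss = ((y_obs - y_pred) ** 2 / sigma).sum()  # weighted misfit of Eq. (19)
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())

m_star = f_reverse(y_obs)  # final inversion result m* = F_Reverse(y_obs, theta*)
```

Gradients flow through the frozen surrogate into the reverse network via automatic differentiation, which is the mechanism described in the following paragraph.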
      <p id="d2e4836">In the above process, each backpropagation step involves only a single forward calculation of the loss function. After establishing the computational graph, gradients of the trainable parameters <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are computed through backpropagation combined with automatic differentiation. These gradients are then used to update the trainable parameters <inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">θ</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>. Thus, only one forward simulation is executed during each epoch of the reverse network <inline-formula><mml:math id="M196" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">F</mml:mi><mml:mtext>Reverse</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> training procedure. This presents a marked computational advantage of TNNA compared to the four selected metaheuristic algorithms, which require numerous forward simulations for parameter updates at each iteration.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Case study</title>
      <p id="d2e4882">This study considers three synthetic cases based on previous research, covering different model sizes and hydraulic gradient combinations (Jose et al., 2004; Zhang et al., 2018; Mo et al., 2019) to evaluate the performance of the TNNA algorithm against conventional metaheuristic algorithms. Both case 1 and case 2 are approximately tens of meters in size, with a simulation time of 60 d. Their hydraulic gradients are 0.05 and 0.1, respectively. These scenarios are typically found in large sand tank experiments, aquifers with natural slopes, or in-situ experimental areas where flow conditions are enhanced through pumping wells. Case 3 simulates contaminant plume migration, with a size of approximately 1 km and a simulation time of several years (up to 30 years). It uses a hydraulic gradient of 0.00625, representing a smaller natural gradient typically found in alluvial aquifers. Regarding the differences in heterogeneity conditions among these cases, case 1 features a low-dimensional zoned permeability field scenario, case 2 involves a high-dimensional Gaussian random permeability field parameterized through the Karhunen–Loève expansion (KLE), and case 3 uses a high-dimensional non-Gaussian binary random permeability field parameterized by a decoder trained with OCAAE. The numerical models of the three cases are established using TOUGHREACT, which employs an integral finite difference method with sequential iteration procedures and adaptive time stepping to solve the flow and transport equations. 
In all three cases, the relative error tolerance for the conservation equations was uniformly set to <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, ensuring that the maximum imbalance of conserved quantities within each discrete grid cell remains below 1 part in 100 000 of the total quantity in that cell. Dispersion effects are inherently incorporated through molecular diffusion and numerical dispersion induced by upstream weighting and grid discretization (Xu et al., 2011).</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e4901">Flow domain of the solute transport model for the low-dimensional scenario.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f02.png"/>

      </fig>

      <p id="d2e4910">After developing numerical models for the three scenarios, we first evaluate four surrogate models in case 1, and the optimal surrogate model will be integrated into the inversion framework. Subsequently, hypothetical observation scenarios are used to systematically compare the inversion accuracy of TNNA against four metaheuristic algorithms across the three cases. The observation data (hydraulic heads and solute concentrations) for the model parameter inversion are generated by adding Gaussian noise perturbations to the numerical model simulation results. Specifically, observational noise is introduced by multiplying the min–max normalized simulated data by the random noise factor <inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:msup><mml:mi mathvariant="italic">σ</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M199" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> represents the ratio of observational noise to the observed values. In this study, we conduct a comparative analysis of inversion performance across the three cases under a noise level of <inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn></mml:mrow></mml:math></inline-formula>. 
Additionally, our previous study (Chen et al., 2021) examined the effects of higher observational noise levels (<inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula> and 0.1) and real-world noise conditions on inversion accuracy in low-dimensional parameter scenarios. To further investigate the impact of increased observational noise on inversion performance in high-dimensional parameter scenarios, we conducted an extended analysis on case 3 – the most complex scenario – by increasing the noise level to 10 % (<inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn></mml:mrow></mml:math></inline-formula>). This analysis also provides insights into the stability of the TNNA algorithm when integrated with a generative machine-learning-based inversion framework for high-dimensional parameter estimation. Here, we applied the multiplicative noise to ensure that all perturbed observation values remain non-negative, which is particularly important in regions near plume boundaries where concentrations are close to zero. Generally, observation errors are assumed to be independent of the measured values, whereas the multiplicative noise model introduces value-proportional perturbations, resulting in a positive correlation between the standard deviation of observation noise and the true values. This type of error dependence may also exist in real-world studies when certain measurement techniques are used. For example, in hydraulic head monitoring, pressure transducers may exhibit drift (i.e., a persistent deviation in output not caused by actual pressure changes) due to the aging and fatigue of components such as the diaphragm or strain gauge, leading to reduced measurement accuracy (Sorensen and Butcher, 2011). 
A variation in hydraulic pressure can lead to different levels of drift among transducers, with those installed at higher pressure (i.e., higher hydraulic head) environments tending to experience more significant drift, which in some cases may result in elevated observation noise. For the analysis of solute concentrations in laboratory settings, when the concentrations of water samples exceed the detection range of the instrument, a common approach is to dilute these samples prior to measurement. While analytical instruments may introduce additive errors at a relatively fixed level, the rescaling process following dilution (i.e., multiplying the measured value by the dilution factor) amplifies these errors. As a result, the final measurement error becomes approximately proportional to the original solute concentration (Kabala and Skaggs, 1998). Given that the goal of this study is to evaluate the robustness of five inversion algorithms under different noise levels, both additive and multiplicative noise models are suitable for representing observational uncertainty. Prior work by Neupauer et al. (2000) demonstrated that the choice between these two noise types has minimal influence on the comparative performance of inversion methods. The details of these three cases are provided in Sects. 3.1–3.3.</p>
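The multiplicative perturbation described above amounts to the following short numpy sketch; the simulated values are random placeholders for heads or concentrations, and only the noise levels (0.01 in the main comparison, 0.1 in the extended case 3 analysis) come from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.01  # noise level for the main comparison (0.1 in the extended analysis)

# Placeholder simulated observations (e.g., heads or concentrations)
y_sim = rng.uniform(0.0, 5.0, size=200)

# Min-max normalization, then a multiplicative noise factor eps ~ N(1, sigma^2)
y_norm = (y_sim - y_sim.min()) / (y_sim.max() - y_sim.min())
eps = rng.normal(loc=1.0, scale=sigma, size=y_norm.shape)
y_obs = y_norm * eps

# Values near zero receive proportionally small perturbations, so perturbed
# observations near plume boundaries remain non-negative
```

Because the perturbation scales with the value itself, the noise standard deviation is positively correlated with the true values, as discussed above.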
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Case 1: low-dimensional zoned permeability field scenario</title>
      <p id="d2e4989">As shown in Fig. 2, the numerical model for the low-dimensional scenario focuses on conservative solute transport in a zoned permeability field. The model domain is a two-dimensional rectangular area measuring <inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow><mml:mo>×</mml:mo><mml:mn mathvariant="normal">20</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>. The left and right boundaries feature Dirichlet boundary conditions, with a hydraulic head difference of 1 m. The heterogeneous permeability is divided into eight homogeneous permeability zones, denoted as <inline-formula><mml:math id="M204" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">8</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. 
The prior range for these eight permeabilities is from <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">12</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:mn mathvariant="normal">9.9</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">12</mml:mn></mml:mrow></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math></inline-formula>. The contaminant source is located at the left boundary, with a fixed release concentration ranging from <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">mol</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">L</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:math></inline-formula>. 
The simulation area is uniformly discretized into 3200 (<inline-formula><mml:math id="M210" display="inline"><mml:mrow><mml:mn mathvariant="normal">40</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>) cells, and the simulation time is set to 20 d.</p>
      <p id="d2e5129">According to these model conditions, there are nine model parameters to be estimated: eight permeability parameters (<inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">8</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>) and the source release concentration. As shown in Fig. 2, these parameters will be estimated using the observation data of hydraulic heads and solute concentrations collected from 25 locations, denoted by black stars. Additionally, observation data from another 24 locations, denoted by orange hexagons and not included in the calibration process, will be used to evaluate the prediction accuracy of the calibrated numerical model.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Case 2: high-dimensional Gaussian random permeability field scenario</title>
      <p id="d2e5162">The numerical model for the high-dimensional scenario features a domain size of <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mn mathvariant="normal">10</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>, with impervious upper and lower boundaries and constant head boundaries on the left (1 m) and right (0 m) sides. The domain is discretized into 4096 (<inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:mn mathvariant="normal">64</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">64</mml:mn></mml:mrow></mml:math></inline-formula>) cells. The log-permeability field follows a Gaussian distribution, and the permeability value of the <inline-formula><mml:math id="M215" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th mesh is defined as follows:

            <disp-formula id="Ch1.E20" content-type="numbered"><label>20</label><mml:math id="M216" display="block"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msub><mml:mi>k</mml:mi><mml:mtext>ref</mml:mtext></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mtext>ref</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is the reference permeability, set to <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">13</mml:mn></mml:mrow></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math></inline-formula>. The heterogeneity of <inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is controlled by the modifier <inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. 
The geostatistical parameters for this Gaussian field are <inline-formula><mml:math id="M221" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M222" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">G</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M223" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>x</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="italic">λ</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2.5</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>. Under this heterogeneous condition, 100 KLE terms are used to preserve more than 92.67 % of the field variance. Consequently, estimating the permeability field is equivalent to identifying these 100 KLE terms.</p>
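As a concrete illustration, the truncated KLE parameterization described above can be sketched as follows. This is a minimal numpy sketch under stated assumptions: a coarsened 32 × 32 grid (the paper uses 64 × 64; coarsening keeps the dense eigendecomposition cheap) and a separable exponential covariance kernel, which is one common choice and is not restated in the text.

```python
import numpy as np

# Sketch of a truncated Karhunen-Loeve expansion (KLE). Assumptions:
# 32 x 32 grid (coarsened for speed) and separable exponential covariance.
n, L, var, lam = 32, 10.0, 2.0, 2.5          # cells/side, domain (m), sigma_G^2, lambda_x = lambda_y
xs = (np.arange(n) + 0.5) * L / n
X, Y = np.meshgrid(xs, xs)
pts = np.column_stack([X.ravel(), Y.ravel()])  # cell centers

# Covariance C(x, x') = sigma^2 exp(-|dx|/lambda_x - |dy|/lambda_y)
d = np.abs(pts[:, None, :] - pts[None, :, :])
C = var * np.exp(-d[..., 0] / lam - d[..., 1] / lam)

# Retain the 100 leading eigenpairs; their energy fraction is the
# preserved share of the field variance (92.67 % in the paper's setup).
w, V = np.linalg.eigh(C)
order = np.argsort(w)[::-1][:100]
w, V = w[order], V[:, order]
retained = w.sum() / np.trace(C)

# One realization: 100 standard-normal KLE terms -> Gaussian log-modifier
# field with mean m = 0, then k_i = alpha_i * k_ref as in Eq. (20).
xi = np.random.default_rng(0).standard_normal(100)
log_alpha = (V * np.sqrt(w)) @ xi
k = 2e-13 * np.exp(log_alpha).reshape(n, n)
```

Estimating the permeability field then reduces to estimating the 100 coefficients `xi`, exactly as stated above.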

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e5345">The reference log-permeability field and locations of observation stations for five scenarios. The observation stations are represented by black stars.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f03.jpg"/>

        </fig>

      <p id="d2e5354">The observational data used for inverse modeling include hydraulic heads from a stationary flow field and solute concentrations measured every 2 d over 40 d, from day 2 to day 40 (day: <inline-formula><mml:math id="M224" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mi>i</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M225" display="inline"><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula>). It should be noted that in high-dimensional parameter scenarios, the increased degrees of freedom typically result in greater parameter uncertainty, and insufficient observational information may fail to constrain the parameter estimation effectively, leading to equifinality (Beven and Binley, 1992; McLaughlin and Townley, 1996; Zhang et al., 2015; Cao et al., 2025). Therefore, this study includes the actual permeability values at the observed locations as regularization constraints to mitigate inversion errors arising from equifinality. Since identical regularization conditions are applied across all algorithms, these constraints ensure the stability and robustness of the inversion outcomes without affecting the relative performance of the five optimization algorithms compared in this study.</p>
      <p id="d2e5392">As the degrees of freedom increase in high-dimensional models, the influence of the observation data on the inversion results grows accordingly. Five scenarios with different monitoring networks are therefore considered to comprehensively evaluate the performance of the inversion algorithms under various observation configurations. Figure 3 displays the monitoring station locations for each scenario.</p>
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Case 3: high-dimensional non-Gaussian random permeability field scenario</title>
      <p id="d2e5403">This case focuses on the estimation of a binary non-Gaussian permeability field. The numerical model features a domain size of <inline-formula><mml:math id="M226" display="inline"><mml:mrow><mml:mn mathvariant="normal">800</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow><mml:mo>×</mml:mo><mml:mn mathvariant="normal">800</mml:mn><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>, with impervious upper and lower boundaries and constant head boundaries on the left (5 m) and right (0 m) sides. The domain is discretized into 6400 (<inline-formula><mml:math id="M227" display="inline"><mml:mrow><mml:mn mathvariant="normal">80</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>) cells. The permeability field is a channelized random field composed of two lithofacies, with permeability values of <inline-formula><mml:math id="M228" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.0</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">13</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M229" display="inline"><mml:mrow><mml:mn mathvariant="normal">5.46</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">12</mml:mn></mml:mrow></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:math></inline-formula> for the two media, respectively. The reference field (Fig. 4b) is generated from a training image (Fig. 4a) using the direct sampling (DS) method proposed by Mariethoz et al. (2010). The contaminant release source is located on the entire left boundary, with a concentration of 1 <inline-formula><mml:math id="M230" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mol</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">L</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. The observational data used for inversion are generated through numerical simulation, including steady-state hydraulic head data and solute concentration data at 12 time points (from 2 to 24 years, with 2-year intervals). The inverse problem is thus high-dimensional and binary, aimed at identifying the lithofacies type of each discrete grid cell within the domain. Note that the permeability values of the two lithofacies are fixed in this case.</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e5500"><bold>(a)</bold> The training image used to generate random realizations of the permeability field. <bold>(b)</bold> The reference field of the synthetic case (white symbols indicate observation locations).</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f04.png"/>

        </fig>

      <p id="d2e5514">To achieve a low-dimensional representation of permeability fields, a training dataset comprising 2000 stochastic realizations is generated using multi-point statistics (MPS). Then, an octave convolution adversarial autoencoder (OCAAE) is developed, in which the decoder network learns a non-linear mapping from 100-dimensional Gaussian latent vectors to 6400-dimensional binary non-Gaussian permeability fields. Thus, the non-Gaussian permeability field is indirectly reconstructed by estimating the 100-dimensional latent vector.</p>
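The interface that this parameterization exposes to the inversion can be sketched as follows. This is only a schematic stand-in: the actual decoder is a trained octave-convolution network, whereas here a fixed random linear map plus a threshold (both hypothetical) merely illustrates the 100-dimensional latent to 6400-cell binary mapping.

```python
import numpy as np

# Hypothetical stand-in for the trained OCAAE decoder: a fixed random
# linear map followed by a threshold, illustrating only the interface
# (100-D Gaussian latent vector -> 80 x 80 binary permeability field).
rng = np.random.default_rng(42)
W = rng.standard_normal((6400, 100)) / np.sqrt(100)   # hypothetical weights

K_LOW, K_HIGH = 1.0e-13, 5.46e-12                     # fixed facies permeabilities (m^2)

def decode(z):
    """Map a latent vector z of shape (100,) to an 80 x 80 binary permeability field."""
    facies = (W @ z > 0).astype(int)                  # lithofacies indicator per cell
    return np.where(facies == 1, K_HIGH, K_LOW).reshape(80, 80)

z = rng.standard_normal(100)      # the latent vector the inversion estimates
field = decode(z)
```

The inversion algorithms thus search the low-dimensional latent space while the forward model always sees a valid binary field.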
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results and discussion</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Surrogate model evaluations</title>
      <p id="d2e5533">Surrogate models were first compared using case 1, with a low-dimensional parameter. For this scenario, the input parameters for the surrogate models consist of a nine-dimensional vector, including eight permeability parameters and the contaminant source release concentration. The output consists of the simulated hydraulic heads and solute concentrations at 25 observation points. Four training datasets <inline-formula><mml:math id="M231" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="bold">M</mml:mi><mml:mtext>train</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mtext>train</mml:mtext></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> with 200, 500, 1000, and 2000 samples (represented as <inline-formula><mml:math id="M232" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-200</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M233" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-500</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M234" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-1000</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M235" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-2000</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, respectively) and a testing dataset <inline-formula><mml:math id="M236" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>test</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi 
mathvariant="bold">M</mml:mi><mml:mtext>test</mml:mtext></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold">Y</mml:mi><mml:mtext>test</mml:mtext></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> with 100 samples (represented as <inline-formula><mml:math id="M237" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>test-100</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) are prepared. These datasets were generated using Latin hypercube sampling (LHS) and numerical simulations. The predictive accuracy of the surrogate models was quantitatively evaluated using the root mean square error (RMSE) and coefficient of determination (<inline-formula><mml:math id="M238" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) metrics (Chen et al., 2022).</p>
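The dataset-generation step can be sketched with `scipy.stats.qmc`; the parameter bounds below are placeholders for illustration, not the paper's actual prior ranges.

```python
import numpy as np
from scipy.stats import qmc

# Latin hypercube sampling of the nine-dimensional parameter vector
# (eight permeabilities plus the source release concentration).
# The bounds are hypothetical stand-ins for the true prior ranges.
lower = np.array([1e-14] * 8 + [0.5])
upper = np.array([1e-12] * 8 + [2.0])

sampler = qmc.LatinHypercube(d=9, seed=0)
unit = sampler.random(n=200)                 # D_train-200; rerun with 500/1000/2000
M_train = qmc.scale(unit, lower, upper)

# Each row of M_train would then be run through the numerical simulator
# to obtain the corresponding outputs Y_train at the 25 observation points.
```

LHS stratifies each parameter axis, so even the 200-sample dataset covers the full range of every parameter.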
      <p id="d2e5661">For solute transport inverse modeling problems, it is crucial to consider the observations of both hydraulic heads and solute concentrations simultaneously. Therefore, the surrogate model within an inversion framework should have accurate predictive capabilities for hydraulic heads and solute concentrations. This study calculates the RMSE and <inline-formula><mml:math id="M239" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values separately for hydraulic heads, solute concentrations, and all model response data, resulting in the following evaluation criteria: <inline-formula><mml:math id="M240" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>ALL</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M241" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>ALL</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> for overall data, <inline-formula><mml:math id="M242" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M243" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> for hydraulic heads, and <inline-formula><mml:math id="M244" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M245" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">C</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> for solute concentrations. 
Additionally, it should be noted that the above RMSE and <inline-formula><mml:math id="M246" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> metrics are computed based on the normalized hydraulic head and solute concentration data.</p>
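The split evaluation criteria can be sketched as follows, using synthetic stand-in data; min-max normalization is an assumption here, since the exact normalization scheme is not restated in this passage.

```python
import numpy as np

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def normalize(x, ref):
    # Min-max normalization against the reference range (an assumption).
    return (x - ref.min()) / (ref.max() - ref.min())

# Synthetic stand-ins for test-set heads (H) and concentrations (C):
# 100 samples x 25 observation points each.
rng = np.random.default_rng(1)
h_true = rng.uniform(0.0, 10.0, (100, 25))
h_pred = h_true + rng.normal(0.0, 0.05, h_true.shape)
c_true = rng.uniform(0.0, 1.0, (100, 25))
c_pred = c_true + rng.normal(0.0, 0.02, c_true.shape)

hn, hn_p = normalize(h_true, h_true), normalize(h_pred, h_true)
cn, cn_p = normalize(c_true, c_true), normalize(c_pred, c_true)

rmse_H, rmse_C = rmse(hn, hn_p), rmse(cn, cn_p)
r2_H, r2_C = r2(hn, hn_p), r2(cn, cn_p)
rmse_ALL = rmse(np.concatenate([hn.ravel(), cn.ravel()]),
                np.concatenate([hn_p.ravel(), cn_p.ravel()]))
```

Computing the metrics on normalized data puts heads and concentrations on a common scale, so the combined criteria do not favor the variable with the larger physical magnitude.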
      <p id="d2e5759">Figures 5 and 6 display the RMSE and <inline-formula><mml:math id="M247" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values of each surrogate model, and Figs. S3–S6 in the Supplement present the pairwise comparison results. The optimal values for <inline-formula><mml:math id="M248" display="inline"><mml:mi>C</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M249" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M250" display="inline"><mml:mi mathvariant="italic">ε</mml:mi></mml:math></inline-formula> in the MSVR method are provided in Table S1 in the Supplement. For the FC-DNN, the optimal number of hidden layers was separately determined for each of the four datasets, with candidate values ranging from 1 to 7. According to the <inline-formula><mml:math id="M251" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>All</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M252" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>All</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values in Tables S2 and S3 in the Supplement, the optimal number of hidden layers in the FC-DNN for <inline-formula><mml:math id="M253" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-200</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M254" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-500</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M255" display="inline"><mml:mrow><mml:msub><mml:mi 
mathvariant="bold">D</mml:mi><mml:mtext>train-1000</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M256" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">D</mml:mi><mml:mtext>train-2000</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are two, four, three, and three, respectively. When training the FC-DNN, LeNet, and ResNet for case 1, the hyperparameters for batch size and learning rate were consistently set to 50 and <inline-formula><mml:math id="M257" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, respectively. The weight decay values for LeNet and ResNet were both set to <inline-formula><mml:math id="M258" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, while FC-DNN used a weight decay of 0. The number of training epochs was uniformly set to 500 for all three models.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e5903">The RMSE results of surrogate model predictions. Panels <bold>(a)</bold>–<bold>(c)</bold> show respectively the RMSE values of hydraulic heads, solute concentrations, and all model outputs.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f05.png"/>

        </fig>

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e5920">The <inline-formula><mml:math id="M259" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> results of surrogate model predictions. Panels <bold>(a)</bold>–<bold>(c)</bold> show respectively the <inline-formula><mml:math id="M260" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values of hydraulic heads, solute concentrations, and all model outputs.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f06.png"/>

        </fig>

      <p id="d2e5957">According to the performance criteria in Figs. 5 and 6, the prediction accuracy of each surrogate model significantly improves with an increasing number of training samples. Based on the <inline-formula><mml:math id="M261" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>All</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M262" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>All</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values, their performance ranks, from best to worst, as ResNet, LeNet, FC-DNN, and MSVR. The MSVR method accurately predicts hydraulic heads but performs the worst in predicting solute concentrations. When MSVR is trained with the four prepared datasets, the <inline-formula><mml:math id="M263" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> values are below 0.02, and the <inline-formula><mml:math id="M264" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values are near 1. Notably, with a training sample size of 200, the prediction accuracy of MSVR for hydraulic heads is higher than that of FC-DNN and LeNet, as indicated by their <inline-formula><mml:math id="M265" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M266" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values, and closely matches that of ResNet. 
However, when using 200 training samples, the <inline-formula><mml:math id="M267" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> value for MSVR exceeds 0.08, and the <inline-formula><mml:math id="M268" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">C</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> value falls below 0.85. Even with a dataset size of 2000, the enhancement in the MSVR-based surrogate model is limited, as the <inline-formula><mml:math id="M269" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> value remains at around 0.05, and the <inline-formula><mml:math id="M270" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">C</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> value stays below 0.95. FC-DNN demonstrates a significant advantage over MSVR in predicting solute concentrations, particularly with larger training sample sizes of 1000 or 2000. Nonetheless, noticeable biases remain between some of its predictions and the corresponding numerical simulation results (see Fig. S2d). When adopting CNN-based surrogate models (LeNet and ResNet), the prediction accuracy for solute concentrations significantly improves (see Figs. 5b and 6b). With training datasets of 2000 samples, LeNet and ResNet achieve RMSE values below 0.02 and <inline-formula><mml:math id="M271" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values close to 1. It is worth noting that ResNet performs well even with smaller sample sizes. 
For example, with 200 training samples, the <inline-formula><mml:math id="M272" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M273" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">C</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values for LeNet are around 0.06 and 0.9, respectively, while these criteria values for ResNet are around 0.04 and 0.95 (see Figs. 5b and 6b). As the number of training samples increases, the advantages of ResNet become more apparent. According to Fig. S4d, when the training sample size reaches 2000, the prediction results of ResNet are closely consistent with the numerical simulation results for both hydraulic heads and solute concentrations.</p>
      <p id="d2e6117">The comparison of the surrogate models reflects the gains in robustness brought by advances in machine learning methodologies. Different machine learning approaches employ distinct strategies for constructing the non-linear mappings that underlie surrogate models. Generally, deeper or larger models contain more trainable parameters and thus have more degrees of freedom with which to capture strongly non-linear relationships. Much of the progress in machine learning lies in making such complex DNNs trainable, and current state-of-the-art techniques can successfully train each of the four selected surrogate modeling methods. With sufficient training samples, a more complex surrogate model can represent higher levels of non-linearity (LeCun et al., 2015; He et al., 2016). This also explains why, even with ample training samples, the improvement in the prediction accuracy of MSVR for solute concentrations is limited. In CNNs, the sparse connections and weight sharing of convolutional layers reduce redundant weight parameters and enhance feature extraction in the hidden layers; consequently, LeNet outperforms FC-DNN. ResNet, which combines residual blocks with convolutional layers, effectively mitigates vanishing and exploding gradients, making the successful training of deeper CNNs possible.</p>
      <p id="d2e6120">According to Chen et al. (2021), a more globally accurate surrogate model can enhance the performance of the TNNA inversion. Thus, we selected the ResNet trained with 2000 samples for the subsequent inversion procedure. In the low-dimensional scenario, its RMSE values for hydraulic head and solute concentration data are less than 0.02, with <inline-formula><mml:math id="M274" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values greater than 0.99. We further extended ResNet to construct surrogate models for both the Gaussian and non-Gaussian random field scenarios. In the two high-dimensional scenarios, the input parameters for the surrogate models are single-channel matrix data representing the heterogeneous parameter field, while the output consists of a vector formed by flattening the multi-channel matrix data, representing the simulated hydraulic heads and solute concentrations at predefined time steps within the simulation domain. The training and testing datasets for these two case scenarios consist of 2000 and 500 samples, respectively. 
For ResNet training in case 2 (Gaussian random field), the hyperparameters were set as follows: batch <inline-formula><mml:math id="M275" display="inline"><mml:mrow><mml:mtext>size</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>, learning <inline-formula><mml:math id="M276" display="inline"><mml:mrow><mml:mtext>rate</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, and weight <inline-formula><mml:math id="M277" display="inline"><mml:mrow><mml:mtext>decay</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. For case 3 (non-Gaussian random field), the corresponding values were batch <inline-formula><mml:math id="M278" display="inline"><mml:mrow><mml:mtext>size</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">50</mml:mn></mml:mrow></mml:math></inline-formula>, learning <inline-formula><mml:math id="M279" display="inline"><mml:mrow><mml:mtext>rate</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, and weight <inline-formula><mml:math id="M280" display="inline"><mml:mrow><mml:mtext>decay</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">8</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. 
In both cases, the number of training epochs was also set to 500. The RMSE values for hydraulic head and solute concentration data range from approximately 0.01 to 0.03, and the <inline-formula><mml:math id="M281" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values exceed 0.99 (as shown in Table 1). This level of accuracy indicates that the surrogate models meet the predictive accuracy requirements for inversion simulations in both the designed Gaussian and non-Gaussian random field cases.</p>
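For reference, the training settings reported in this subsection can be collected in one place; the dictionary layout and key names below are ours, and only the numerical values come from the text (the optimizer choice is not restated here).

```python
# ResNet/CNN training configurations reported in this subsection.
# Structure and key names are ours; values are from the text.
resnet_configs = {
    "case1_low_dimensional": {"batch_size": 50,  "learning_rate": 1e-4,
                              "weight_decay": 1e-5, "epochs": 500},
    "case2_gaussian":        {"batch_size": 100, "learning_rate": 1e-4,
                              "weight_decay": 1e-6, "epochs": 500},
    "case3_non_gaussian":    {"batch_size": 50,  "learning_rate": 1e-3,
                              "weight_decay": 1e-8, "epochs": 500},
}
```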

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e6262">The RMSE and <inline-formula><mml:math id="M282" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values for surrogate model predictions in five designed high-dimensional scenarios.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right" colsep="1"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col4" align="center" colsep="1">RMSE </oasis:entry>
         <oasis:entry rowsep="1" namest="col5" nameend="col7" align="center"><inline-formula><mml:math id="M283" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M284" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M285" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mi mathvariant="normal">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M286" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>All</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M287" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M288" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mi mathvariant="normal">C</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M289" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>All</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Gaussian scenario 1</oasis:entry>
         <oasis:entry colname="col2">0.0108</oasis:entry>
         <oasis:entry colname="col3">0.0174</oasis:entry>
         <oasis:entry colname="col4">0.0172</oasis:entry>
         <oasis:entry colname="col5">0.9990</oasis:entry>
         <oasis:entry colname="col6">0.9980</oasis:entry>
         <oasis:entry colname="col7">0.9982</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Gaussian scenario 2</oasis:entry>
         <oasis:entry colname="col2">0.0102</oasis:entry>
         <oasis:entry colname="col3">0.0138</oasis:entry>
         <oasis:entry colname="col4">0.0136</oasis:entry>
         <oasis:entry colname="col5">0.9995</oasis:entry>
         <oasis:entry colname="col6">0.9989</oasis:entry>
         <oasis:entry colname="col7">0.9990</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Gaussian scenario 3</oasis:entry>
         <oasis:entry colname="col2">0.0120</oasis:entry>
         <oasis:entry colname="col3">0.0165</oasis:entry>
         <oasis:entry colname="col4">0.0163</oasis:entry>
         <oasis:entry colname="col5">0.9991</oasis:entry>
         <oasis:entry colname="col6">0.9981</oasis:entry>
         <oasis:entry colname="col7">0.9983</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Gaussian scenario 4</oasis:entry>
         <oasis:entry colname="col2">0.0123</oasis:entry>
         <oasis:entry colname="col3">0.0161</oasis:entry>
         <oasis:entry colname="col4">0.0159</oasis:entry>
         <oasis:entry colname="col5">0.9990</oasis:entry>
         <oasis:entry colname="col6">0.9984</oasis:entry>
         <oasis:entry colname="col7">0.9985</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Gaussian scenario 5</oasis:entry>
         <oasis:entry colname="col2">0.0137</oasis:entry>
         <oasis:entry colname="col3">0.0156</oasis:entry>
         <oasis:entry colname="col4">0.0155</oasis:entry>
         <oasis:entry colname="col5">0.9989</oasis:entry>
         <oasis:entry colname="col6">0.9985</oasis:entry>
         <oasis:entry colname="col7">0.9986</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Non-Gaussian scenario</oasis:entry>
         <oasis:entry colname="col2">0.0181</oasis:entry>
         <oasis:entry colname="col3">0.0280</oasis:entry>
         <oasis:entry colname="col4">0.0273</oasis:entry>
         <oasis:entry colname="col5">0.9952</oasis:entry>
         <oasis:entry colname="col6">0.9931</oasis:entry>
         <oasis:entry colname="col7">0.9932</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Parameter inversion method comparison results</title>
<sec id="Ch1.S4.SS2.SSS1">
  <label>4.2.1</label><title>Inversion results of the low-dimensional parameter scenario</title>
      <p id="d2e6572">For the low-dimensional parameter scenario, the performance of optimization algorithms is thoroughly evaluated across 100 parameter scenarios using the Monte Carlo strategy. The observation data for these scenarios are derived from the testing dataset after adding multiplicative Gaussian random noise <inline-formula><mml:math id="M290" display="inline"><mml:mrow><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>∼</mml:mo><mml:mi>N</mml:mi><mml:mo>(</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:msup><mml:mn mathvariant="normal">0.01</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The population sizes of GA, DE, and PSO, along with the chain length in SA, are set in four distinct scenarios: 20, 40, 60, and 80. (These population size or chain length values are represented as <inline-formula><mml:math id="M291" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> in subsequent discussions.) These settings determine the number of forward modeling calls required for each iteration, significantly influencing the convergence rate and computational efficiency of optimization procedures. Maximum iterations for these four metaheuristic algorithms are set to 200. 
The learning rate, epoch number, and weight decay for the TNNA algorithm are set to <inline-formula><mml:math id="M292" display="inline"><mml:mrow><mml:mn mathvariant="normal">6</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, 1000, and <inline-formula><mml:math id="M293" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, respectively.</p>
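As a concrete illustration, the multiplicative noise model ϵ ∼ N(1, 0.01²) applied to clean model outputs can be sketched in a few lines of NumPy; the array values and function name here are hypothetical, not taken from the study's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_multiplicative_noise(obs, rel_std=0.01):
    """Perturb clean model outputs with multiplicative Gaussian noise
    epsilon ~ N(1, rel_std**2), as used to build the synthetic
    observation scenarios."""
    eps = rng.normal(loc=1.0, scale=rel_std, size=obs.shape)
    return obs * eps

clean = np.array([1.2, 0.8, 0.5])        # hypothetical clean observations
noisy = add_multiplicative_noise(clean)  # each value scaled by a draw from N(1, 0.01^2)
```

Because the noise is multiplicative, each observation is perturbed in proportion to its magnitude, i.e., a roughly 1 % relative error level.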
      <p id="d2e6647">The performance of the five optimization algorithms is evaluated according to three aspects: average convergence efficiency and accuracy in inversion procedures, predictive accuracy of calibration models for hydraulic heads and solute concentrations, and statistical analysis of the estimated errors for each model parameter. Figure 7 presents the logarithmic average convergence curves (i.e., <inline-formula><mml:math id="M294" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> of the average objective value as a function of the inversion iterations) of four metaheuristic algorithms and the TNNA algorithm throughout 100 parameter scenarios. Specifically, panels (a)–(d) represent the <inline-formula><mml:math id="M295" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> values for metaheuristic algorithms set at 20, 40, 60, and 80, respectively. These figures clearly illustrate the average convergence speed and accuracy of five optimization algorithms. Figure 8 displays the comparison between simulated and observed values across all 100 parameter scenarios for both calibration and spatial predictive evaluation. Panels (a) and (b) illustrate the comparative prediction fit at the 25 observation locations used for model calibration, whereas panels (c) and (d) display the comparative prediction fit at the 24 independent observation locations. In this figure, distinct symbols are used to represent the five optimization algorithms. It should be noted that the <inline-formula><mml:math id="M296" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> values for the four metaheuristic algorithms are uniformly set to 80 during this comparison. 
Figure 9 illustrates the probability density curves of the estimation errors for nine model parameters across 100 parameter scenarios, with different colors representing the five optimization algorithms.</p>
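The curves in Fig. 7 plot the base-10 logarithm of the objective value averaged over the 100 parameter scenarios at each iteration; a minimal sketch of that reduction, using hypothetical objective histories, is:

```python
import numpy as np

# Hypothetical objective histories: rows = 100 parameter scenarios,
# columns = 200 inversion iterations (placeholder values only).
rng = np.random.default_rng(1)
histories = np.abs(rng.normal(1e-3, 1e-4, size=(100, 200)))

# log10 of the scenario-averaged objective at each iteration,
# the quantity plotted on the vertical axis of the convergence curves.
log_avg_curve = np.log10(histories.mean(axis=0))
```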

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e6685">Comparative convergence trends (<inline-formula><mml:math id="M297" display="inline"><mml:mrow><mml:msub><mml:mi>log⁡</mml:mi><mml:mn mathvariant="normal">10</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> of the average objective value) of five optimization algorithms across 100 parameter scenarios. Panels <bold>(a)</bold>–<bold>(d)</bold> compare the four metaheuristic algorithms and TNNA under <inline-formula><mml:math id="M298" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula>, 40, 60, and 80, respectively; TNNA was executed only once on the same 100 parameter scenarios, and its curve is identical across all panels. Markers indicate convergence values every 10 iterations.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f07.png"/>

          </fig>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e6729">Comparison of predictive accuracy for hydraulic heads and solute concentrations simulated using parameters estimated by the four metaheuristic inversion algorithms (DE, SA, GA, PSO) and the TNNA method. Panels <bold>(a)</bold> and <bold>(b)</bold> show predictive comparisons at the 25 observation locations used for model calibration and panels <bold>(c)</bold> and <bold>(d)</bold> show predictive comparisons at the other 24 independent observation locations.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f08.png"/>

          </fig>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e6752">Probability density curves of estimation errors for nine model parameters using five optimization methods. Each curve represents the distribution of estimation errors across 100 parameter scenarios, with their mean error values indicated in the legends.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f09.png"/>

          </fig>

      <p id="d2e6761">The results in Fig. 7 demonstrate that the TNNA algorithm achieves the best convergence accuracy, with its convergence logarithmic objective function value (approximately <inline-formula><mml:math id="M299" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.4</mml:mn></mml:mrow></mml:math></inline-formula>) being smaller than those of the other four metaheuristic algorithms across these <inline-formula><mml:math id="M300" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> settings. The influence of <inline-formula><mml:math id="M301" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> on the convergence speeds of these four metaheuristic algorithms is not significant, exhibiting a distinct transition from rapid to slower convergence around the 75th iteration. As <inline-formula><mml:math id="M302" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> increased from 20 to 80, each metaheuristic algorithm showed distinct improvements in the accuracy of the final objective function. The DE algorithm showed the least improvement in final convergence accuracy as the <inline-formula><mml:math id="M303" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> value increased from 20 to 80, with the logarithmic value of its objective function dropping from just above <inline-formula><mml:math id="M304" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.0</mml:mn></mml:mrow></mml:math></inline-formula> to slightly below <inline-formula><mml:math id="M305" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.0</mml:mn></mml:mrow></mml:math></inline-formula>. 
The SA algorithm also showed limited improvement, with its logarithmic average convergence value decreasing from around <inline-formula><mml:math id="M306" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.1</mml:mn></mml:mrow></mml:math></inline-formula> at <inline-formula><mml:math id="M307" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">20</mml:mn></mml:mrow></mml:math></inline-formula> to slightly below <inline-formula><mml:math id="M308" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.3</mml:mn></mml:mrow></mml:math></inline-formula> at <inline-formula><mml:math id="M309" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>, close to that of the TNNA algorithm. Among the four metaheuristic algorithms, SA exhibited the highest average convergence accuracy. In contrast to the SA and DE algorithms, the PSO and GA algorithms significantly enhanced average convergence accuracy as <inline-formula><mml:math id="M310" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> increased. Specifically, as <inline-formula><mml:math id="M311" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> increased from 20 to 80, the logarithmic convergence values of PSO and GA decreased by more than 0.5. 
While increasing <inline-formula><mml:math id="M312" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> values may help metaheuristic algorithms to reduce the gap in average convergence accuracy compared to the TNNA algorithm, larger <inline-formula><mml:math id="M313" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> settings also impose an additional computational burden. The above results indicate that the TNNA algorithm has a significant efficiency advantage over the four metaheuristic algorithms in parameter optimization. For instance, when conducting the optimization procedure based on scikit-opt, the DE algorithm requires 32 000 forward model realizations (<inline-formula><mml:math id="M314" display="inline"><mml:mrow><mml:mn mathvariant="normal">80</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">200</mml:mn></mml:mrow></mml:math></inline-formula>) when <inline-formula><mml:math id="M315" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is set to 80, while the other three metaheuristic algorithms (PSO, GA, and SA) each require 16 000 realizations (<inline-formula><mml:math id="M316" display="inline"><mml:mrow><mml:mn mathvariant="normal">80</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">200</mml:mn></mml:mrow></mml:math></inline-formula>). In stark contrast, the TNNA algorithm requires only one forward model realization per iteration, resulting in 200 realizations. These comparisons illustrate that the TNNA method is more effective than the other four metaheuristic algorithms in achieving robust convergence results. It is worth noting that the five optimization algorithms rely on stochastic processes for parameter updates. 
Therefore, the objective function values are not guaranteed to decrease monotonically with each iteration. According to Fig. 7, the DE algorithm exhibits more noticeable fluctuations compared to other algorithms. Nevertheless, these fluctuations remain within a reasonable range. For example, at <inline-formula><mml:math id="M317" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula>, the objective function values after 150 iterations range between <inline-formula><mml:math id="M318" display="inline"><mml:mrow><mml:mn mathvariant="normal">9.05</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M319" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.32</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (corresponding to the logarithmic values between <inline-formula><mml:math id="M320" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4.04</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M321" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3.88</mml:mn></mml:mrow></mml:math></inline-formula> in Fig. 7d). 
Fluctuations between consecutive iterations typically remain within <inline-formula><mml:math id="M322" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (mostly around <inline-formula><mml:math id="M323" display="inline"><mml:mrow><mml:mn mathvariant="normal">3</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), which is considered reasonable for optimization algorithms.</p>
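The forward-call budgets quoted above follow directly from the settings; a small sketch of the arithmetic (the factor of 2 for DE reflects the current-plus-trial populations evaluated in the scikit-opt implementation, as stated in the text):

```python
# Forward-model call budgets implied by N_PC = 80 and 200 iterations.
N_PC, iters = 80, 200

budget_de   = N_PC * 2 * iters  # DE evaluates two populations per generation
budget_ga   = N_PC * iters      # same count applies to PSO and SA
budget_tnna = 1 * iters         # TNNA: one forward realization per epoch
```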
      <p id="d2e7082">The results presented in Fig. 8 indicate that, among the five optimization algorithms, the TNNA algorithm achieves the smallest RMSE and <inline-formula><mml:math id="M324" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values closest to 1.0 for both hydraulic heads and solute concentration during model calibration and spatial predictive evaluation. Furthermore, the distribution of comparison points demonstrates that the modeling results obtained from both calibration and independent prediction using the TNNA algorithm match the observed values more accurately than those of the other four metaheuristic algorithms, particularly for solute concentrations. Among the four metaheuristic algorithms, SA and DE outperform GA and PSO regarding RMSE and <inline-formula><mml:math id="M325" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values. During model calibration and predictive evaluation, PSO exhibits the worst predictive accuracy, recording the highest RMSE and <inline-formula><mml:math id="M326" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values for both hydraulic heads and solute concentrations. It is noteworthy that the RMSE and <inline-formula><mml:math id="M327" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values for SA during hydraulic head calibration are 0.0085 and 0.9992, respectively, while those for DE during solute concentration calibration are 0.0112 and 0.9969. These values are almost equal to those of the TNNA algorithm. 
The robustness of an inversion algorithm is determined by its accuracy in both calibration and predictive evaluation for hydraulic heads and solute concentrations. However, DE and SA each achieve calibration accuracy comparable to TNNA for only one of the two simulation components. Overall, the TNNA algorithm provides more robust model calibration and predictive evaluation results than the other four metaheuristic algorithms.</p>
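The RMSE and R² scores reported in Fig. 8 follow their standard definitions; a minimal NumPy sketch (the observation and simulation arrays below are hypothetical):

```python
import numpy as np

def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def r_squared(obs, sim):
    """Coefficient of determination: 1 minus residual over total variance."""
    ss_res = np.sum((obs - sim) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

obs = np.array([0.10, 0.40, 0.70, 1.00])  # hypothetical observations
sim = np.array([0.12, 0.39, 0.71, 0.98])  # hypothetical simulated values
err, fit = rmse(obs, sim), r_squared(obs, sim)
```

A smaller RMSE and an R² closer to 1.0 together indicate a better match between simulations and observations.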
      <p id="d2e7129">Figure 9 indicates that the estimated error distributions for the nine model parameters derived from the TNNA algorithm are more concentrated than those obtained from the four metaheuristic algorithms. The mean estimated error values for the nine numerical model parameters using the TNNA algorithm are also the lowest. These results highlight the high accuracy and reliability of the TNNA inversion algorithm. Among the four metaheuristic algorithms, DE and SA outperform GA and PSO. This is because the probability density curves of estimation errors for the nine parameters using DE and SA are more concentrated around zero, with mean values lower than those of GA and PSO. The DE algorithm shows a more concentrated distribution of around zero for the overall estimation errors of parameters <inline-formula><mml:math id="M328" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M329" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">8</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. In contrast, the SA reveals reduced estimation errors for the <inline-formula><mml:math id="M330" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">C</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> parameter in most cases, ranking just behind the TNNA algorithm. 
GA outperforms PSO in estimation accuracy for seven of the nine model parameters, with the probability density curves of PSO matching those of GA only for parameters <inline-formula><mml:math id="M331" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M332" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. As a whole, the statistical results of the estimated model parameter errors illustrate that the machine-learning-based TNNA algorithm exhibits enhanced inversion performance compared to the four metaheuristic optimization algorithms. However, the findings also reveal that none of the five algorithms consistently offers completely reliable inversion solutions across all scenarios. For example, the TNNA algorithm, despite its generally better performance, demonstrates estimation errors as high as 0.4 for parameters <inline-formula><mml:math id="M333" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">4</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M334" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">6</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> in some scenarios. Such results are likely because the provided observational data cannot rule out equifinality in some scenarios. In these cases, it is essential to introduce additional regularization constraints to attenuate equifinality (Wang and Chen, 2013; Arsenault and Brissette, 2014). These findings emphasize the importance of employing the Monte Carlo method in comparative studies of inversion algorithms to ensure comprehensive evaluations and to avoid misleading conclusions.</p>
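The two quantities summarized by the density curves in Fig. 9, the mean and the concentration (spread) of the estimation errors across the 100 scenarios, can be computed as follows; the error samples here are synthetic placeholders, not the study's results:

```python
import numpy as np

# Synthetic estimation errors for one parameter across 100 Monte Carlo
# scenarios, one entry per algorithm (placeholder spreads only).
rng = np.random.default_rng(2)
errors = {
    "TNNA": rng.normal(0.0, 0.02, 100),
    "DE":   rng.normal(0.0, 0.05, 100),
}

# Mean error and standard deviation: a distribution tightly centered on
# zero (small mean, small spread) indicates a reliable estimator.
summary = {name: (float(e.mean()), float(e.std())) for name, e in errors.items()}
```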
      <p id="d2e7211">The above comparison results indicate that the machine-learning-based TNNA algorithm outperforms the other four metaheuristic algorithms in both inversion accuracy and computational efficiency. The primary advantage of the TNNA algorithm over the four metaheuristic algorithms is its well-defined updating direction of model parameters, guided by the loss function, which serves as the objective function for inverse modeling. Research on machine learning applications indicates that DNNs can approximate continuous functions by adjusting weights and biases (LeCun et al., 2015; Goodfellow et al., 2016). The TNNA algorithm leverages this capability by transforming the model parameter inversion issue into the training of a reverse network to achieve reverse mappings. By establishing a loss function based on inversion constraints from the Bayesian theorem, the TNNA algorithm ensures that training the reverse network brings each parameter update closer to the optimal solution during each epoch, thereby improving accuracy and convergence speed. In contrast, the four metaheuristic algorithms require numerous forward simulations for each parameter update. The optimization direction for model parameters is determined by evaluating the objective function. This process is governed by the exploration and exploitation strategies inherent in metaheuristic algorithms. However, these approaches introduce randomness in the direction of model parameter updates, making it challenging to ensure that updates move towards the direction of fastest convergence under specific hyperparameter settings. This also explains why the TNNA algorithm can update model parameters more efficiently and achieve higher convergence accuracy despite requiring only one forward realization in each training epoch.</p>
</sec>
<sec id="Ch1.S4.SS2.SSS2">
  <label>4.2.2</label><title>Inversion results of the high-dimensional Gaussian scenario</title>
      <p id="d2e7222">For estimating the permeability field under five designed observational scenarios, the iteration number for the four metaheuristic algorithms was set at 200, with <inline-formula><mml:math id="M335" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> values of 100, 500, and 1000. The learning rate and weight decay for training reverse networks within the TNNA framework were set to <inline-formula><mml:math id="M336" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M337" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, respectively.</p>
      <p id="d2e7272">Figures 10 and 11 illustrate the log-permeability field estimation results and error distributions for the four metaheuristic algorithms and the TNNA algorithm under the most densely observed scenario (i.e., scenario 5). The corresponding results for scenarios 1–4 are presented in Figs. S7–S14 in the Supplement. Figure 12 compares the RMSE values for the log-permeability fields estimated by the four metaheuristic algorithms and the TNNA algorithm across all five scenarios. These detailed RMSE values can be found in Table 2 (scenario 5) and Table S4 in the Supplement (scenarios 1–4). For scenario 5, the accuracy of permeability estimations by each metaheuristic algorithm improves as the <inline-formula><mml:math id="M338" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> value increases (see Fig. 10 and Table 2). Notably, the GA achieves the best results with an <inline-formula><mml:math id="M339" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> of 1000, recording an RMSE of 0.1057. The DE and SA algorithms yield their most accurate permeability estimations with RMSE values of 0.1597 (<inline-formula><mml:math id="M340" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>) and 0.1549 (<inline-formula><mml:math id="M341" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula>), respectively. 
The PSO method is the least effective, achieving an RMSE of 0.3334 at <inline-formula><mml:math id="M342" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula>. As shown in Fig. 11 and Table 2, the TNNA algorithm provides inversion results with an RMSE of 0.1063 after training the reverse network for 200 epochs. This suggests that the TNNA algorithm can estimate high-dimensional permeability fields with accuracy comparable to that of the GA method (<inline-formula><mml:math id="M343" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula>), with significantly fewer forward model realizations (200 compared to 200 000), reducing the computational burden by 99.9 % and improving inversion efficiency by a factor of 1000. Increasing the training epochs of the reverse network to 1000 further reduces the RMSE of the TNNA method to 0.0595, demonstrating its advantages over the four metaheuristic algorithms in this scenario. Across all scenarios, the accuracy of the estimated permeability fields correlates positively with the density of observation wells, and estimation errors are generally higher in areas not covered by monitoring wells (see Figs. S7–S14). Figure 12 further demonstrates that the RMSE values for permeability estimation using the TNNA algorithm are consistently lower than those of the four metaheuristic algorithms across scenarios 1–4, indicating that the TNNA algorithm exhibits greater robustness compared to the metaheuristic algorithms in all five scenarios.</p>
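The efficiency comparison for scenario 5 reduces to a count of forward model realizations; a sketch of the arithmetic behind the 99.9 % figure:

```python
# Realization counts for scenario 5: GA with N_PC = 1000 over
# 200 iterations versus TNNA with one realization per epoch.
ga_calls   = 1000 * 200                # 200 000 forward model realizations
tnna_calls = 200                       # one realization per training epoch

speedup   = ga_calls // tnna_calls     # factor-of-1000 fewer realizations
reduction = 1 - tnna_calls / ga_calls  # fraction of the burden eliminated
```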

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e7360">Spatial distribution of log-permeability field estimation results (row 1, 3, and 5 for <inline-formula><mml:math id="M344" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>, 500, and 1000, respectively) and absolute errors (row 2, 4, and 6 for <inline-formula><mml:math id="M345" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula>, 500, and 1000, respectively) for scenario 5, achieved by four metaheuristic algorithms (panels <bold>a–d</bold> correspond to GA, DE, PSO, and SA, respectively).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f10.jpg"/>

          </fig>

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e7405">Spatial distribution of log-permeability field estimation results and absolute errors for scenario 5, achieved by TNNA. Panels <bold>(a)</bold> and <bold>(c)</bold> show the log-permeability fields estimated using 1000 (TNNA-1000) and 200 (TNNA-200) training samples, respectively; panels <bold>(b)</bold> and <bold>(d)</bold> present the corresponding absolute error distributions.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f11.jpg"/>

          </fig>

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e7429">RMSE values of estimated log-permeability fields for the four metaheuristic algorithms and the TNNA algorithm under scenario 5.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right" colsep="1"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:colspec colnum="7" colname="col7" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry namest="col1" nameend="col5" align="center" colsep="1">Metaheuristic algorithms </oasis:entry>
         <oasis:entry namest="col6" nameend="col7" align="center">TNNA </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">GA</oasis:entry>
         <oasis:entry colname="col3">DE</oasis:entry>
         <oasis:entry colname="col4">PSO</oasis:entry>
         <oasis:entry colname="col5">SA</oasis:entry>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M346" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">100</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.1940</oasis:entry>
         <oasis:entry colname="col3">0.1597</oasis:entry>
         <oasis:entry colname="col4">0.5399</oasis:entry>
         <oasis:entry colname="col5">0.2071</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M347" display="inline"><mml:mrow><mml:mtext>epoch</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">200</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col7">0.1063</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M348" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">500</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.1142</oasis:entry>
         <oasis:entry colname="col3">0.1904</oasis:entry>
         <oasis:entry colname="col4">0.3810</oasis:entry>
         <oasis:entry colname="col5">0.1781</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M349" display="inline"><mml:mrow><mml:mtext>epoch</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col7">0.0595</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M350" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1000</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.1057</oasis:entry>
         <oasis:entry colname="col3">0.1748</oasis:entry>
         <oasis:entry colname="col4">0.3334</oasis:entry>
         <oasis:entry colname="col5">0.1549</oasis:entry>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e7623">Comparison of RMSE in estimating log-permeability fields using four metaheuristic algorithms and the TNNA algorithm across the five scenarios (S-1 to S-5).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f12.png"/>

          </fig>

      <p id="d2e7632">To evaluate the predictive performance of the numerical model calibrated by various inversion methods, simulations of hydraulic heads and solute concentrations were conducted over 60 d, starting on the second day with recordings every 2 d, using the permeability fields with the lowest RMSE values identified by each inversion method. Observation data from the second day to the 40th day were used for model calibration, while additional data from the 42nd to the 60th day were employed to evaluate the future predictions of the calibrated numerical models. The RMSE values for the calibrated hydraulic heads and time series solute concentrations are presented in Table 3 and Fig. 13. Figure 14 displays the spatial distribution of the calibrated numerical simulation results and errors for hydraulic heads and solute concentration simulation results at three specific times (<inline-formula><mml:math id="M351" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mtext>4th</mml:mtext></mml:mrow></mml:math></inline-formula>, 20th, and 52nd days). Results for the entire 60 d period are presented in Figs. S15–S44 in the Supplement. Note that in Fig. 
14, <inline-formula><mml:math id="M352" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi mathvariant="bold">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M353" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi mathvariant="bold">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represent the simulated spatial distributions of hydraulic heads and solute concentrations based on the estimated permeability fields through inverse modeling, while <inline-formula><mml:math id="M354" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M355" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represent the spatial distributions simulated using the true permeability field.</p>
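The temporal split described above (calibration on observations from days 2–40, prediction on days 42–60, with snapshots every 2 d) and the RMSE metric can be sketched as follows. This is an illustrative outline, not the authors' code; the head series are synthetic placeholders standing in for model output and observations.

```python
import numpy as np

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

# Observation snapshots: every 2 d from day 2 to day 60 (30 snapshots).
times = np.arange(2, 61, 2)
calib_mask = times <= 40   # days 2-40: used for model calibration
pred_mask = times >= 42    # days 42-60: held out to test prediction

# Hypothetical head series in place of the study's simulations and data.
rng = np.random.default_rng(0)
obs_heads = rng.normal(10.0, 0.5, times.size)
sim_heads = obs_heads + rng.normal(0.0, 1e-3, times.size)

rmse_calibration = rmse(sim_heads[calib_mask], obs_heads[calib_mask])
rmse_prediction = rmse(sim_heads[pred_mask], obs_heads[pred_mask])
```

Evaluating the two masks separately is what distinguishes calibration fit (Table 3) from genuine predictive skill on unseen times (Fig. 13).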

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e7701">RMSE values of calibrated hydraulic heads for the four metaheuristic algorithms and the TNNA algorithm.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">TNNA</oasis:entry>
         <oasis:entry colname="col3">DE</oasis:entry>
         <oasis:entry colname="col4">GA</oasis:entry>
         <oasis:entry colname="col5">PSO</oasis:entry>
         <oasis:entry colname="col6">SA</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">RMSE</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M356" display="inline"><mml:mrow><mml:mn mathvariant="normal">6.8537</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M357" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.2181</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M358" display="inline"><mml:mrow><mml:mn mathvariant="normal">7.4837</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M359" display="inline"><mml:mrow><mml:mn mathvariant="normal">2.1683</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M360" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.0316</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e7851">According to Fig. 14a, the calibrated simulation errors for hydraulic heads did not exceed 0.02 m for the TNNA method and three of the four considered metaheuristic algorithms, except the PSO method, which exhibited hydraulic head errors larger than 0.06 m in certain areas. Among the four metaheuristic algorithms, the GA method achieved the lowest RMSE in hydraulic head simulations, with a value of <inline-formula><mml:math id="M361" display="inline"><mml:mrow><mml:mn mathvariant="normal">7.4837</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. For solute concentrations, the GA algorithm consistently has the highest prediction accuracy among the metaheuristic algorithms, with RMSE values generally around 0.005 (Fig. 13). The TNNA algorithm achieved a similar level of accuracy to GA in the calibrated numerical model predictions. Specifically, during the initial 10 d and from the 41st day to the 60th day, the TNNA algorithm showed slightly higher prediction accuracy than the GA-calibrated model. However, during the intermediate period from the 10th day to the 40th day, the GA-calibrated model had a slight advantage over the TNNA algorithm. The normalized absolute errors in the solute transport simulation results obtained using the TNNA algorithm remained consistently below 0.02 throughout the simulation period (Fig. 14b and c). These results indicate that in high-dimensional settings, the TNNA algorithm provides inversion outcomes that enable the calibrated model to deliver simulation results comparable to those of the best-performing metaheuristic algorithm. Overall, the TNNA method also demonstrates advantages over the four metaheuristic optimization algorithms in the designed high-dimensional scenarios, excelling in both inversion efficiency and accuracy.</p>

      <fig id="F13"><label>Figure 13</label><caption><p id="d2e7874">RMSE values of calibrated solute concentrations over 60 d for the four metaheuristic algorithms and the TNNA algorithm.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f13.png"/>

          </fig>

      <fig id="F14" specific-use="star"><label>Figure 14</label><caption><p id="d2e7885">Spatial distribution of calibrated numerical simulation results and absolute errors for hydraulic heads and solute concentrations at three dynamic times (<inline-formula><mml:math id="M362" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, 20, and 50 d) using the TNNA algorithm and four metaheuristic algorithms.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f14.jpg"/>

          </fig>

</sec>
<sec id="Ch1.S4.SS2.SSS3">
  <label>4.2.3</label><title>Inversion results of the high-dimensional non-Gaussian scenario</title>
      <p id="d2e7914">In this scenario, the iteration number for the four metaheuristic algorithms was set at 200, with <inline-formula><mml:math id="M363" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mtext>PC</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> values of 1000. For the TNNA method, the reverse network is trained for 1000 epochs. Thus, each metaheuristic algorithm had 100 times more forward model evaluations than the TNNA algorithm. Figures 14 and 15 show the permeability fields estimated by the five optimization algorithms and their error distributions compared to the true field (i.e., the error fields). Figures 16a and 17a present the comparison between calibrated simulations and hydraulic head observations, as well as solute concentration observations. Figures 16b and 17b compare the solute concentration simulations for the 26th, 28th, and 30th years based on the estimated parameter field and the designed true field.</p>

      <fig id="F15" specific-use="star"><label>Figure 15</label><caption><p id="d2e7930">Reconstructed non-Gaussian binary channelized fields and their error distributions (1 % observation noise).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f15.png"/>

          </fig>

      <fig id="F16" specific-use="star"><label>Figure 16</label><caption><p id="d2e7941">Reconstructed non-Gaussian binary channelized fields and their error distributions (10 % observation noise).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f16.png"/>

          </fig>

      <fig id="F17" specific-use="star"><label>Figure 17</label><caption><p id="d2e7953">Pair-wise comparison between the calibrated simulation results and the observational data <bold>(a)</bold>; and the true parameter-based predictions (1 % observation noise).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f17.png"/>

          </fig>

      <p id="d2e7965">According to Figs. 15 and 16, the binary channel fields reconstructed by each inversion algorithm are highly consistent with their corresponding true fields, with the estimated errors primarily concentrated at the interfaces between high-permeability channels and low-permeability regions. It is found that increasing the observation noise level from 1 % to 10 % does not lead to a noticeable increase in the number of grid cells exhibiting differences between the estimated parameter fields and the true field. One potential reason for this is that the least squares objective function used in the inversion framework of this study is based on the assumption that the observation noise follows a zero-mean Gaussian distribution. With adequate regularization constraints, such as the dense monitoring network design used in this study, the model responses corresponding to the optimal parameter estimates obtained through global optimization algorithms statistically converge to the mean of the observed data. It can also be evaluated by the calibration simulations. Specifically, the pairwise scatter plots in Figs. 17a and 18a indicate that the calibrated simulation results from different methods are closely distributed around the reference diagonal. This suggests that even with increased observational noise, the inversion-derived calibration results do not exhibit noticeable bias. Furthermore, the predictions based on inversion results remain highly consistent with those of the true permeability field (Figs. 17b and 18b). 
The <inline-formula><mml:math id="M364" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>All</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M365" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>All</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values for the predictions beyond the observational period range from 0.018 to 0.044 and from 0.962 to 0.994, respectively. This indicates that even under relatively high Gaussian noise conditions, the non-linear inversion framework used in this study can reliably reconstruct the non-Gaussian permeability field, ensuring high predictive accuracy. Nevertheless, it is important to note that while the inversion accuracy at a 10 % noise level remains comparable to that in the 1 % noise scenario, increasing the observational noise inevitably raises the convergence value of the least squares loss function. This trend is evident from the RMSE values in Figs. 17a and 18a. Moreover, since the observational noise here is assumed to follow a Gaussian distribution, real-world scenarios with more complex noise characteristics may further exacerbate equifinality in the inversion results. In such cases, incorporating additional system information such as regularization constraints is essential to enhance the robustness of the objective function and to mitigate ill-posedness.</p>
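The argument above, that zero-mean Gaussian noise does not bias a least-squares calibration when observations are dense, can be illustrated with a toy linear forward model (a deliberate simplification; the study's forward model is a non-linear groundwater flow and transport simulator):

```python
import numpy as np

# Toy demonstration: for y = G @ k with zero-mean Gaussian noise on y,
# the least-squares estimate of k remains close to the truth at both
# noise levels, because the noise averages out over many observations.
rng = np.random.default_rng(42)

n_obs, n_par = 2000, 5
G = rng.normal(size=(n_obs, n_par))    # toy linear forward operator
k_true = rng.normal(size=n_par)        # "true" parameters
y_clean = G @ k_true

errors = {}
for noise_std in (0.01, 0.10):         # 1 % and 10 % noise levels
    y_obs = y_clean + rng.normal(0.0, noise_std, n_obs)
    # least-squares estimate: arg min_k || G k - y_obs ||^2
    k_hat, *_ = np.linalg.lstsq(G, y_obs, rcond=None)
    errors[noise_std] = float(np.max(np.abs(k_hat - k_true)))
```

The residual norm (the converged loss) grows with the noise level, mirroring the RMSE increase visible between Figs. 17a and 18a, while the parameter estimate itself stays nearly unbiased.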
      <p id="d2e7992">Compared to the four metaheuristic algorithms, TNNA demonstrates advantages in computational efficiency and accuracy for non-Gaussian random field inversion. In the low-noise scenario, TNNA achieves an inversion convergence accuracy with an <inline-formula><mml:math id="M366" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>All</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> of 0.021 and an <inline-formula><mml:math id="M367" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>All</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> of 0.996 (Fig. 17a). In contrast, the two best-performing metaheuristic methods, GA and SA, yield <inline-formula><mml:math id="M368" display="inline"><mml:mrow><mml:msub><mml:mtext>RMSE</mml:mtext><mml:mtext>All</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> values of 0.027 and 0.029, with <inline-formula><mml:math id="M369" display="inline"><mml:mrow><mml:msubsup><mml:mi>R</mml:mi><mml:mtext>All</mml:mtext><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:math></inline-formula> values of 0.994 and 0.993, respectively (Fig. 17a). Moreover, TNNA achieves the highest fitting accuracy for predictive results among the five optimization algorithms, with an RMSE of 0.018 and an <inline-formula><mml:math id="M370" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> of 0.994 (Fig. 17b). Even in high-noise scenarios, TNNA continues to exhibit an advantage over the four metaheuristic algorithms in both inversion convergence accuracy (Fig. 18a) and predictive accuracy (Fig. 18b). Additionally, considering the number of forward simulation calls required by each inversion algorithm, TNNA proves to be a more efficient approach in this case study.</p>

      <fig id="F18" specific-use="star"><label>Figure 18</label><caption><p id="d2e8056">Pair-wise comparison between the calibrated simulation results and the observational data <bold>(a)</bold>; and the true parameter-based predictions (10 % observation noise).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/29/4251/2025/hess-29-4251-2025-f18.png"/>

          </fig>

</sec>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Parameter inversion method comparison results</title>
      <p id="d2e8077">This study evaluates the computational efficiency and inversion reliability of the TNNA algorithm under three different heterogeneous conditions. In optimization-based inversion studies, the primary challenge is to establish non-linear inversion constraints and design efficient algorithms to find optimal parameter solutions. The main difference between cases lies in how the constraint conditions are formulated, while the optimization algorithm itself remains generally applicable across different optimization tasks if these conditions are properly defined. Therefore, the fundamental challenge in applying well-performing inversion methods to real-world cases lies in whether robust non-linear optimization constraints can be effectively established for inversion tasks. Given the complexities of subsurface systems, three key aspects should be considered to extend the TNNA method to real-world applications: (1) representing complex heterogeneous model parameter fields, (2) maximizing the effective observational information while optimizing monitoring costs, and (3) integrating multi-source data and accounting for uncertainties in the model process to better address complex observational noise scenarios and uncertainties in physical mechanisms. Detailed considerations for these issues are as follows. <list list-type="bullet"><list-item>
      <p id="d2e8082"><italic>Heterogeneity in aquifer parameter structures</italic>. This study developed a dimensionality-reduction framework using OCAAE for high-dimensional parameter field inversion. Generative machine learning methods, including state-of-the-art variants, have the potential to characterize complex non-Gaussian fields. However, obtaining representative parameter field datasets that accurately capture the spatial variability and heterogeneous geostatistical characteristics of the target aquifer remains challenging in practical research. For instance, spatial variations in non-stationary stochastic aquifer systems may result in significant discrepancies in geostatistical parameters across sampling windows (Mariethoz and Caers, 2014). Therefore, developing appropriate generator-training strategies is essential for these practical scenarios.</p></list-item><list-item>
      <p id="d2e8088"><italic>Monitoring network optimization</italic>. The inversion performance of the TNNA and four metaheuristic algorithms is evaluated based on a non-linear optimization model with dense distributed monitoring networks. This monitoring strategy is commonly employed in the evaluation of inversion algorithms to ensure sufficient observational information, thereby reducing non-uniqueness in parameter inversion results (Bao et al., 2020; Mo et al., 2020; Zhang et al., 2024). Such monitoring strategies for comparing inversion methods also aim to minimize external interferences, ensuring that differences in performance are primarily determined by inversion algorithms themselves. However, the number and locations of monitoring stations are constrained by financial budgets. Thus, optimizing the monitoring network design to minimize monitoring costs without compromising constraint information quality is indispensable for practical applications (Keum et al., 2018; Chen et al., 2022; Cao et al., 2025).</p></list-item><list-item>
      <p id="d2e8094"><italic>Considering multi-source data and uncertainties in model processes</italic>. This study considers only hydraulic head and solute concentration data, assuming ideal white Gaussian noises. However, in real-world scenarios, observational noise is often more complex and may exhibit non-Gaussian characteristics. For instance, some solute concentrations cannot be measured in situ, and unavoidable perturbations may be included during sample collection and laboratory analysis. Similarly, hydraulic head measurements may be influenced by other factors, including meteorological conditions, human groundwater extraction, and engineering disturbances, among others. Moreover, all observational data in this study are constrained by a single predetermined process model. However, if significant uncertainties exist in the actual aquifer model processes or if the conceptual model deviates substantially from real-world conditions, even an advanced optimization algorithm may produce incorrect inversion results. Therefore, it is crucial to integrate multi-source data (e.g., geophysical measurements or isotope data) and to develop multi-process coupled models to establish more robust inversion frameworks (Dai and Samper, 2006; Botto et al., 2018; Chang and Zhang, 2019). Specifically, parameterizing model process uncertainties to enable the simultaneous identification of both model processes and unknown parameters could be a promising direction for real-world studies.</p></list-item></list></p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Summary and conclusions</title>
      <p id="d2e8109">This study systematically evaluates the performance of tandem neural network architecture (TNNA) in comparison to four widely used metaheuristic algorithms (GA, PSO, DE, and SA) across three inversion frameworks designed for different heterogeneous groundwater conditions. The results demonstrate that TNNA consistently outperforms the four conventional metaheuristic algorithms across the designed scenarios, covering both low-dimensional and high-dimensional cases. It provides more accurate inversion results while significantly reducing computational costs. Moreover, it has been verified that the TNNA algorithm consistently delivers reliable inversion results with just a single forward simulation per iteration step in scenarios featuring various complex and uncertain model parameters. This characteristic offers a practical approach to balancing exploration and exploitation with a reduced computational burden, contrasting with conventional metaheuristic algorithms that require increasing forward simulations as the inversion problem grows more complex. Furthermore, this study introduces a novel framework that integrates TNNA, along with optimization algorithms, with generative machine-learning-based parameterization methods for dimensionality reduction in complex heterogeneous parameter fields.</p>
      <p id="d2e8112">In summary, training reverse network through the TNNA method provides significant advantages over conventional metaheuristic algorithms. The proposed integrated framework, which combines the TNNA method with dimensionality reduction techniques, further enhances its applicability and demonstrates strong potential for high-dimensional inversion problems. Developing specialized inversion algorithm frameworks based on state-of-the-art machine learning methods tailored to specific problem scenarios represents a promising research direction. Furthermore, hyperparameters can significantly influence neural network performance in certain scenarios. It is necessary for future research to explore hyperparameter optimization and sensitivity analysis to identify the optimal neural network structures and training strategies, ultimately enhancing model performance across diverse hydrological conditions.</p>
</sec>

      
      </body>
    <back><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e8119">The data and codes for four surrogate models and five optimization algorithms are available online at <ext-link xlink:href="https://doi.org/10.5281/zenodo.10499582" ext-link-type="DOI">10.5281/zenodo.10499582</ext-link> (Chen et al., 2024).</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e8125">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/hess-29-4251-2025-supplement" xlink:title="pdf">https://doi.org/10.5194/hess-29-4251-2025-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e8134">JC: conceptualization, methodology, writing (original draft), formal analysis, and funding acquisition. ZD: supervision, funding acquisition, and writing (review and editing). SY: writing (review and editing). MZ: writing (review and editing). MRS: writing (review and editing).</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e8140">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e8146">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e8152">This work is supported by the National Natural Science Foundation of China (NSFC: 42402241, U2267217, 42141011, and 42002254) and the Fundamental Research Funds for the Central Universities (2024QN11066). The authors would like to thank the editor and reviewers for their constructive comments and suggestions, and they also thank Chuanjun Zhan from Qingdao University of Technology for providing the source code for OCAAE.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e8157">This research has been supported by the National Natural Science Foundation of China (grant nos. 42402241, U2267217, 42141011, and 42002254) and the Fundamental Research Funds for the Central Universities (grant no. 2024QN11066).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e8163">This paper was edited by Mauro Giudici and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Abbas, S. A., Bailey, R. T., White, J. T., Arnold, J. G., White, M. J., Čerkasova, N., and Gao, J.: A framework for parameter estimation, sensitivity analysis, and uncertainty analysis for holistic hydrologic modeling using SWAT+, Hydrol. Earth Syst. Sci., 28, 21–48, <ext-link xlink:href="https://doi.org/10.5194/hess-28-21-2024" ext-link-type="DOI">10.5194/hess-28-21-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Adler, J. and Öktem, O.: Solving ill-posed inverse problems using iterative deep neural networks, Inverse Probl., 33, 124007, <ext-link xlink:href="https://doi.org/10.1088/1361-6420/aa9581" ext-link-type="DOI">10.1088/1361-6420/aa9581</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Arsenault, R. and Brissette, F. P.: Continuous streamflow prediction in ungauged basins: The effects of equifinality and parameter set selection on uncertainty in regionalization approaches, Water Resour. Res., 50, 6135–6153, <ext-link xlink:href="https://doi.org/10.1002/2013wr014898" ext-link-type="DOI">10.1002/2013wr014898</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Bandai, T. and Ghezzehei, T. A.: Forward and inverse modeling of water flow in unsaturated soils with discontinuous hydraulic conductivities using physics-informed neural networks with domain decomposition, Hydrol. Earth Syst. Sci., 26, 4469–4495, <ext-link xlink:href="https://doi.org/10.5194/hess-26-4469-2022" ext-link-type="DOI">10.5194/hess-26-4469-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Bao, J., Li, L., and Redoloza, F.: Coupling ensemble smoother and deep learning with generative adversarial networks to deal with non-Gaussianity in flow and transport data assimilation, J. Hydrol., 590, 125443, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2020.125443" ext-link-type="DOI">10.1016/j.jhydrol.2020.125443</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Bentivoglio, R., Isufi, E., Jonkman, S. N., and Taormina, R.: Deep learning methods for flood mapping: a review of existing applications and future research directions, Hydrol. Earth Syst. Sci., 26, 4345–4378, <ext-link xlink:href="https://doi.org/10.5194/hess-26-4345-2022" ext-link-type="DOI">10.5194/hess-26-4345-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Beven, K. and Binley, A.: The future of distributed models: Model calibration and uncertainty prediction, Hydrol. Process., 6, 279–298, <ext-link xlink:href="https://doi.org/10.1002/hyp.3360060305" ext-link-type="DOI">10.1002/hyp.3360060305</ext-link>, 1992.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Blasone, R.-S., Madsen, H., and Rosbjerg, D.: Parameter estimation in distributed hydrological modelling: comparison of global and local optimisation techniques, Hydrol. Res., 38, 451–476, <ext-link xlink:href="https://doi.org/10.2166/nh.2007.024" ext-link-type="DOI">10.2166/nh.2007.024</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>Botto, A., Belluco, E., and Camporese, M.: Multi-source data assimilation for physically based hydrological modeling of an experimental hillslope, Hydrol. Earth Syst. Sci., 22, 4251–4266, <ext-link xlink:href="https://doi.org/10.5194/hess-22-4251-2018" ext-link-type="DOI">10.5194/hess-22-4251-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Cao, M., Dai, Z., Chen, J., Yin, H., Zhang, X., Wu, J., Thanh, H. V., and Soltanian, M. R.: An integrated framework of deep learning and entropy theory for enhanced high-dimensional permeability field identification in heterogeneous aquifers, Water Res., 268, 122706, <ext-link xlink:href="https://doi.org/10.1016/j.watres.2024.122706" ext-link-type="DOI">10.1016/j.watres.2024.122706</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Carrera, J. and Glorioso, L.: On geostatistical formulations of the groundwater flow inverse problem, Adv. Water Resour., 14, 273–283, <ext-link xlink:href="https://doi.org/10.1016/0309-1708(91)90039-Q" ext-link-type="DOI">10.1016/0309-1708(91)90039-Q</ext-link>, 1991.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Castaings, W., Dartus, D., Le Dimet, F.-X., and Saulnier, G.-M.: Sensitivity analysis and parameter estimation for distributed hydrological modeling: potential of variational methods, Hydrol. Earth Syst. Sci., 13, 503–517, <ext-link xlink:href="https://doi.org/10.5194/hess-13-503-2009" ext-link-type="DOI">10.5194/hess-13-503-2009</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Chang, H. and Zhang, D.: Identification of physical processes via combined data-driven and data-assimilation methods, J. Comput. Phys., 393, 337–350, <ext-link xlink:href="https://doi.org/10.1016/j.jcp.2019.05.008" ext-link-type="DOI">10.1016/j.jcp.2019.05.008</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Chang, Z., Lu, W., and Wang, Z.: Study on source identification and source-sink relationship of LNAPLs pollution in groundwater by the adaptive cyclic improved iterative process and Monte Carlo stochastic simulation, J. Hydrol., 612, 128109, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2022.128109" ext-link-type="DOI">10.1016/j.jhydrol.2022.128109</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Chen, J., Dai, Z., Yang, Z., Pan, Y., Zhang, X., Wu, J., and Reza Soltanian, M.: An improved tandem neural network architecture for inverse modeling of multicomponent reactive transport in porous media, Water Resour. Res., 57, 2021WR030595, <ext-link xlink:href="https://doi.org/10.1029/2021wr030595" ext-link-type="DOI">10.1029/2021wr030595</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Chen, J., Dai, Z., Dong, S., Zhang, X., Sun, G., Wu, J., Ershadnia, R., Yin, S., and Soltanian, M. R.: Integration of deep learning and information theory for designing monitoring networks in heterogeneous aquifer systems, Water Resour. Res., 58, 2022WR032429, <ext-link xlink:href="https://doi.org/10.1029/2022wr032429" ext-link-type="DOI">10.1029/2022wr032429</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Chen, J., Dai, Z., Yin, S., Zhang, M., and Soltanian, M. R.: Enhancing Inverse Modeling in Groundwater Systems through Machine Learning: A Comprehensive Comparative Study, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.10499582" ext-link-type="DOI">10.5281/zenodo.10499582</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Chen, X., Hammond, G. E., Murray, C. J., Rockhold, M. L., Vermeul, V. R., and Zachara, J. M.: Application of ensemble-based data assimilation techniques for aquifer characterization using tracer data at Hanford 300 area, Water Resour. Res., 49, 7064–7076, <ext-link xlink:href="https://doi.org/10.1002/2012wr013285" ext-link-type="DOI">10.1002/2012wr013285</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Dai, Z. and Samper, J.: Inverse problem of multicomponent reactive chemical transport in porous media: Formulation and applications, Water Resour. Res., 40, W07407, <ext-link xlink:href="https://doi.org/10.1029/2004wr003248" ext-link-type="DOI">10.1029/2004wr003248</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Dai, Z. and Samper, J.: Inverse modeling of water flow and multicomponent reactive transport in coastal aquifer systems, J. Hydrol., 327, 447–461, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2005.11.052" ext-link-type="DOI">10.1016/j.jhydrol.2005.11.052</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Dragonetti, G., Comegna, A., Ajeel, A., Deidda, G. P., Lamaddalena, N., Rodriguez, G., Vignoli, G., and Coppola, A.: Calibrating electromagnetic induction conductivities with time-domain reflectometry measurements, Hydrol. Earth Syst. Sci., 22, 1509–1523, <ext-link xlink:href="https://doi.org/10.5194/hess-22-1509-2018" ext-link-type="DOI">10.5194/hess-22-1509-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Eberhart, R. and Kennedy, J.: Particle swarm optimization, in: Proceedings of the IEEE International Conference on Neural Networks, Perth, Western Australia, Australia, 27 November 1995, 1942–1948, <ext-link xlink:href="https://doi.org/10.1109/ICNN.1995.488968" ext-link-type="DOI">10.1109/ICNN.1995.488968</ext-link>, 1995.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Elfwing, S., Uchibe, E., and Doya, K.: Sigmoid-weighted linear units for neural network function approximation in reinforcement learning, Neural Networks, 107, 3–11, <ext-link xlink:href="https://doi.org/10.1016/j.neunet.2017.12.012" ext-link-type="DOI">10.1016/j.neunet.2017.12.012</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Epp, R., Schmid, F., and Jenny, P.: Fast convergence strategy for ambiguous inverse problems based on hierarchical regularization, J. Comput. Phys., 489, 112264, <ext-link xlink:href="https://doi.org/10.1016/j.jcp.2023.112264" ext-link-type="DOI">10.1016/j.jcp.2023.112264</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>Ghelichkhan, S., Gibson, A., Davies, D. R., Kramer, S. C., and Ham, D. A.: Automatic adjoint-based inversion schemes for geodynamics: reconstructing the evolution of Earth's mantle in space and time, Geosci. Model Dev., 17, 5057–5086, <ext-link xlink:href="https://doi.org/10.5194/gmd-17-5057-2024" ext-link-type="DOI">10.5194/gmd-17-5057-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Ginn, T. R. and Cushman, J. H.: Inverse methods for subsurface flow: A critical review of stochastic techniques, Stoch. Hydrol. Hydraul., 4, 1–26, <ext-link xlink:href="https://doi.org/10.1007/BF01547729" ext-link-type="DOI">10.1007/BF01547729</ext-link>, 1990.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Giudici, M.: Some Remarks About Forward and Inverse Modelling in Hydrology, Within a General Conceptual Framework, Hydrology, 11, 189, <ext-link xlink:href="https://doi.org/10.3390/hydrology11110189" ext-link-type="DOI">10.3390/hydrology11110189</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Goodfellow, I., Bengio, Y., and Courville, A.: Deep Learning, MIT Press, Cambridge, MA, USA, 800 pp., ISBN 9780262337373, 2016.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Guo, Q., Liu, M., and Luo, J.: Predictive Deep Learning for High-Dimensional Inverse Modeling of Hydraulic Tomography in Gaussian and Non-Gaussian Fields, Water Resour. Res., 59, 2023WR035408, <ext-link xlink:href="https://doi.org/10.1029/2023wr035408" ext-link-type="DOI">10.1029/2023wr035408</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>He, K., Zhang, X., Ren, S., and Sun, J.: Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016, 770–778, <ext-link xlink:href="https://doi.org/10.1109/CVPR.2016.90" ext-link-type="DOI">10.1109/CVPR.2016.90</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Hinton, G. E. and Salakhutdinov, R. R.: Reducing the dimensionality of data with neural networks, Science, 313, 504–507, <ext-link xlink:href="https://doi.org/10.1126/science.1127647" ext-link-type="DOI">10.1126/science.1127647</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>Holland, J. H.: Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI, USA, ISBN 9780472084609, 1975.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Hopmans, J. W., Šimůnek, J., Romano, N., and Durner, W.: 3.6.2. Inverse Methods, in: Methods of Soil Analysis, edited by: Dane, J. H. and Clarke Topp, G., Wiley, 963–1008, <ext-link xlink:href="https://doi.org/10.2136/sssabookser5.4.c40" ext-link-type="DOI">10.2136/sssabookser5.4.c40</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><mixed-citation>Ines, A. V. M. and Droogers, P.: Inverse modelling in estimating soil hydraulic functions: a Genetic Algorithm approach, Hydrol. Earth Syst. Sci., 6, 49–66, <ext-link xlink:href="https://doi.org/10.5194/hess-6-49-2002" ext-link-type="DOI">10.5194/hess-6-49-2002</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><mixed-citation>Jardani, A., Vu, T. M., and Fischer, P.: Use of convolutional neural networks with encoder-decoder structure for predicting the inverse operator in hydraulic tomography, J. Hydrol., 604, 127233, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2021.127233" ext-link-type="DOI">10.1016/j.jhydrol.2021.127233</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><mixed-citation>Jaumann, S. and Roth, K.: Soil hydraulic material properties and layered architecture from time-lapse GPR, Hydrol. Earth Syst. Sci., 22, 2551–2573, <ext-link xlink:href="https://doi.org/10.5194/hess-22-2551-2018" ext-link-type="DOI">10.5194/hess-22-2551-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><mixed-citation>Jose, S. C., Rahman, M. A., and Cirpka, O. A.: Large-scale sandbox experiment on longitudinal effective dispersion in heterogeneous porous media, Water Resour. Res., 40, W12415, <ext-link xlink:href="https://doi.org/10.1029/2004wr003363" ext-link-type="DOI">10.1029/2004wr003363</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><mixed-citation>Kabala, Z. J. and Skaggs, T. H.: Comment on “Minimum relative entropy inversion: Theory and application to recovering the release history of a groundwater contaminant” by Allan, D. Woodbury and Tadeusz, J. Ulrych, Water Resour. Res., 34, 2077–2079, <ext-link xlink:href="https://doi.org/10.1029/98WR01337" ext-link-type="DOI">10.1029/98WR01337</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><mixed-citation>Kapsoulis, D., Tsiakas, K., Trompoukis, X., Asouti, V., and Giannakoglou, K.: A PCA-assisted hybrid algorithm combining EAs and adjoint methods for CFD-based optimization, Appl. Soft Comput., 73, 520–529, <ext-link xlink:href="https://doi.org/10.1016/j.asoc.2018.09.002" ext-link-type="DOI">10.1016/j.asoc.2018.09.002</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><mixed-citation>Keum, J., Coulibaly, P., Razavi, T., Tapsoba, D., Gobena, A., Weber, F., and Pietroniro, A.: Application of SNODAS and hydrologic models to enhance entropy-based snow monitoring network design, J. Hydrol., 561, 688–701, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2018.04.037" ext-link-type="DOI">10.1016/j.jhydrol.2018.04.037</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><mixed-citation>Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P.: Optimization by Simulated Annealing, Science, 220, 671–680, <ext-link xlink:href="https://doi.org/10.1126/science.220.4598.671" ext-link-type="DOI">10.1126/science.220.4598.671</ext-link>, 1983.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><mixed-citation>Kool, J. B., Parker, J. C., and van Genuchten, M. T.: Parameter estimation for unsaturated flow and transport models-A review, J. Hydrol., 91, 255–293, <ext-link xlink:href="https://doi.org/10.1016/0022-1694(87)90207-1" ext-link-type="DOI">10.1016/0022-1694(87)90207-1</ext-link>, 1987.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><mixed-citation>Kuang, W., Yuan, C., and Zhang, J.: Real-time determination of earthquake focal mechanism via deep learning, Nat. Commun., 12, 1432, <ext-link xlink:href="https://doi.org/10.1038/s41467-021-21670-x" ext-link-type="DOI">10.1038/s41467-021-21670-x</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><mixed-citation>LeCun, Y., Bengio, Y., and Hinton, G.: Deep learning, Nature, 521, 436–444, <ext-link xlink:href="https://doi.org/10.1038/nature14539" ext-link-type="DOI">10.1038/nature14539</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><mixed-citation>Li, E.: An adaptive surrogate assisted differential evolutionary algorithm for high dimensional constrained problems, Appl. Soft Comput., 85, 105752, <ext-link xlink:href="https://doi.org/10.1016/j.asoc.2019.105752" ext-link-type="DOI">10.1016/j.asoc.2019.105752</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib46"><label>46</label><mixed-citation>Lindsay, A., McCloskey, J., and Bhloscaidh, M. N.: Using a genetic algorithm to estimate the details of earthquake slip distributions from point surface displacements, J. Geophys. Res.-Sol. Ea., 121, 1796–1820, <ext-link xlink:href="https://doi.org/10.1002/2015jb012181" ext-link-type="DOI">10.1002/2015jb012181</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><mixed-citation>Liu, D., Tan, Y., Khoram, E., and Yu, Z.: Training deep neural networks for the inverse design of nanophotonic structures, ACS Photonics, 5, 1365–1369, <ext-link xlink:href="https://doi.org/10.1021/acsphotonics.7b01377" ext-link-type="DOI">10.1021/acsphotonics.7b01377</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><mixed-citation>Liu, M., Ahmad, R., Cai, W., and Mukerji, T.: Hierarchical Homogenization With Deep-Learning-Based Surrogate Model for Rapid Estimation of Effective Permeability From Digital Rocks, J. Geophys. Res.-Sol. Ea., 128, e2022JB025378, <ext-link xlink:href="https://doi.org/10.1029/2022jb025378" ext-link-type="DOI">10.1029/2022jb025378</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><mixed-citation>Loève, M.: Probability Theory, Van Nostrand, New York, NY, USA, 1955.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><mixed-citation>Long, Y., Ren, J., Li, Y., and Chen, H.: Inverse design of photonic topological state via machine learning, Appl. Phys. Lett., 114, 181105, <ext-link xlink:href="https://doi.org/10.1063/1.5094838" ext-link-type="DOI">10.1063/1.5094838</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><mixed-citation>Luo, J., Ma, X., Ji, Y., Li, X., Song, Z., and Lu, W.: Review of machine learning-based surrogate models of groundwater contaminant modeling, Environ. Res., 238, 117268, <ext-link xlink:href="https://doi.org/10.1016/j.envres.2023.117268" ext-link-type="DOI">10.1016/j.envres.2023.117268</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><mixed-citation>Ma, J., Xia, D., Guo, H., Wang, Y., Niu, X., Liu, Z., and Jiang, S.: Metaheuristic-based support vector regression for landslide displacement prediction: a comparative study, Landslides, 19, 2489–2511, <ext-link xlink:href="https://doi.org/10.1007/s10346-022-01923-6" ext-link-type="DOI">10.1007/s10346-022-01923-6</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><mixed-citation>Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., and Frey, B.: Adversarial autoencoders, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1511.05644" ext-link-type="DOI">10.48550/arXiv.1511.05644</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><mixed-citation>Mariethoz, G. and Caers, J.: Multiple-point geostatistics: stochastic modeling with training images, Wiley Blackwell, 364 pp., <ext-link xlink:href="https://doi.org/10.1002/9781118662953" ext-link-type="DOI">10.1002/9781118662953</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><mixed-citation>Mariethoz, G., Renard, P., and Straubhaar, J.: The Direct Sampling method to perform multiple-point geostatistical simulations, Water Resour. Res., 46, W11536, <ext-link xlink:href="https://doi.org/10.1029/2008WR007621" ext-link-type="DOI">10.1029/2008WR007621</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><mixed-citation>McLaughlin, D. and Townley, L. R.: A Reassessment of the Groundwater Inverse Problem, Water Resour. Res., 32, 1131–1161, <ext-link xlink:href="https://doi.org/10.1029/96WR00160" ext-link-type="DOI">10.1029/96WR00160</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bib57"><label>57</label><mixed-citation>Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E.: Equation of state calculations by fast computing machines, J. Chem. Phys., 21, 1087–1092, <ext-link xlink:href="https://doi.org/10.1063/1.1699114" ext-link-type="DOI">10.1063/1.1699114</ext-link>, 1953.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><mixed-citation>Mo, S., Zabaras, N., Shi, X., and Wu, J.: Deep autoregressive neural networks for high-dimensional inverse problems in groundwater contaminant source identification, Water Resour. Res., 55, 3856–3881, <ext-link xlink:href="https://doi.org/10.1029/2018wr024638" ext-link-type="DOI">10.1029/2018wr024638</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib59"><label>59</label><mixed-citation>Mo, S., Zabaras, N., Shi, X., and Wu, J.: Integration of adversarial autoencoders with residual dense convolutional networks for estimation of non-Gaussian hydraulic conductivities, Water Resour. Res., 56, 2019WR026082, <ext-link xlink:href="https://doi.org/10.1029/2019WR026082" ext-link-type="DOI">10.1029/2019WR026082</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib60"><label>60</label><mixed-citation>Neupauer, R. M., Borchers, B., and Wilson, J. L.: Comparison of inverse methods for reconstructing the release history of a groundwater contamination source, Water Resour. Res., 36, 2469–2475, <ext-link xlink:href="https://doi.org/10.1029/2000WR900176" ext-link-type="DOI">10.1029/2000WR900176</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bib61"><label>61</label><mixed-citation>Nhu, V. H.: Levenberg–Marquardt method for ill-posed inverse problems with possibly non-smooth forward mappings between Banach spaces, Inverse Probl., 38, 015007, <ext-link xlink:href="https://doi.org/10.1088/1361-6420/ac38b7" ext-link-type="DOI">10.1088/1361-6420/ac38b7</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib62"><label>62</label><mixed-citation>Pérez-Cruz, F., Camps-Valls, G., Soria-Olivas, E., Pérez-Ruixo, J. J., Figueiras-Vidal, A. R., and Artés-Rodríguez, A.: Multi-dimensional Function Approximation and Regression Estimation, in: Artificial Neural Networks – ICANN 2002, Berlin, Heidelberg, 757–762, <ext-link xlink:href="https://doi.org/10.1007/3-540-46084-5_123" ext-link-type="DOI">10.1007/3-540-46084-5_123</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib63"><label>63</label><mixed-citation>Plessix, R.: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications, Geophys. J. Int., 167, 495–503, <ext-link xlink:href="https://doi.org/10.1111/j.1365-246X.2006.02978.x" ext-link-type="DOI">10.1111/j.1365-246X.2006.02978.x</ext-link>, 2006.</mixed-citation></ref>
      <ref id="bib1.bib64"><label>64</label><mixed-citation>Qin, Y., Kavetski, D., Kuczera, G., McInerney, D., Yang, T., and Guo, Y.: Can Gauss-Newton Algorithms Outperform Stochastic Optimization Algorithms When Calibrating a Highly Parameterized Hydrological Model? A Case Study Using SWAT, Water Resour. Res., 58, e2021WR031532, <ext-link xlink:href="https://doi.org/10.1029/2021wr031532" ext-link-type="DOI">10.1029/2021wr031532</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib65"><label>65</label><mixed-citation>Rafiei, V., Nejadhashemi, A. P., Mushtaq, S., Bailey, R. T., and An-Vo, D.-A.: An improved calibration technique to address high dimensionality and non-linearity in integrated groundwater and surface water models, Environ. Modell. Softw., 149, 105312, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2022.105312" ext-link-type="DOI">10.1016/j.envsoft.2022.105312</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib66"><label>66</label><mixed-citation>Razavi, S., Tolson, B. A., and Burn, D. H.: Review of surrogate modeling in water resources, Water Resour. Res., 48, W07401, <ext-link xlink:href="https://doi.org/10.1029/2011wr011527" ext-link-type="DOI">10.1029/2011wr011527</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib67"><label>67</label><mixed-citation>Sanchez-Fernandez, M., de-Prado-Cumplido, M., Arenas-Garcia, J., and Perez-Cruz, F.: SVM multiregression for nonlinear channel estimation in multiple-input multiple-output systems, IEEE T. Signal Proces., 52, 2298–2307, <ext-link xlink:href="https://doi.org/10.1109/tsp.2004.831028" ext-link-type="DOI">10.1109/tsp.2004.831028</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib68"><label>68</label><mixed-citation>Sanchez-Vila, X., Donado, L. D., Guadagnini, A., and Carrera, J.: A solution for multicomponent reactive transport under equilibrium and kinetic reactions, Water Resour. Res., 46, W07539, <ext-link xlink:href="https://doi.org/10.1029/2009wr008439" ext-link-type="DOI">10.1029/2009wr008439</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib69"><label>69</label><mixed-citation>Scharnagl, B., Vrugt, J. A., Vereecken, H., and Herbst, M.: Inverse modelling of in situ soil water dynamics: investigating the effect of different prior distributions of the soil hydraulic parameters, Hydrol. Earth Syst. Sci., 15, 3043–3059, <ext-link xlink:href="https://doi.org/10.5194/hess-15-3043-2011" ext-link-type="DOI">10.5194/hess-15-3043-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib70"><label>70</label><mixed-citation>Schneider-Zapp, K., Ippisch, O., and Roth, K.: Numerical study of the evaporation process and parameter estimation analysis of an evaporation experiment, Hydrol. Earth Syst. Sci., 14, 765–781, <ext-link xlink:href="https://doi.org/10.5194/hess-14-765-2010" ext-link-type="DOI">10.5194/hess-14-765-2010</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bib71"><label>71</label><mixed-citation>Shen, C., Appling, A. P., Gentine, P., Bandai, T., Gupta, H., Tartakovsky, A., Baity-Jesi, M., Fenicia, F., Kifer, D., Li, L., Liu, X., Ren, W., Zheng, Y., Harman, C. J., Clark, M., Farthing, M., Feng, D., Kumar, P., Aboelyazeed, D., Rahmani, F., Song, Y., Beck, H. E., Bindas, T., Dwivedi, D., Fang, K., Höge, M., Rackauckas, C., Mohanty, B., Roy, T., Xu, C., and Lawson, K.: Differentiable modelling to unify machine learning and physical models for geosciences, Nat. Rev. Earth Environ., 4, 552–567, <ext-link xlink:href="https://doi.org/10.1038/s43017-023-00450-9" ext-link-type="DOI">10.1038/s43017-023-00450-9</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib72"><label>72</label><mixed-citation>Sorensen, J. P. R. and Butcher, A. S.: Water Level Monitoring Pressure Transducers – A Need for Industry-Wide Standards, Groundwater Monit. Remediat., 31, 56–62, <ext-link xlink:href="https://doi.org/10.1111/j.1745-6592.2011.01346.x" ext-link-type="DOI">10.1111/j.1745-6592.2011.01346.x</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib73"><label>73</label><mixed-citation>Steefel, C., Depaolo, D., and Lichtner, P.: Reactive transport modeling: An essential tool and a new research approach for the earth sciences, Earth Planet. Sc. Lett., 240, 539–558, <ext-link xlink:href="https://doi.org/10.1016/j.epsl.2005.09.017" ext-link-type="DOI">10.1016/j.epsl.2005.09.017</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bib74"><label>74</label><mixed-citation>Sternagel, A., Loritz, R., Klaus, J., Berkowitz, B., and Zehe, E.: Simulation of reactive solute transport in the critical zone: a Lagrangian model for transient flow and preferential transport, Hydrol. Earth Syst. Sci., 25, 1483–1508, <ext-link xlink:href="https://doi.org/10.5194/hess-25-1483-2021" ext-link-type="DOI">10.5194/hess-25-1483-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib75"><label>75</label><mixed-citation>Storn, R. and Price, K.: Differential Evolution – A Simple and Efficient Heuristic for global Optimization over Continuous Spaces, J. Global Optim., 11, 341–359, <ext-link xlink:href="https://doi.org/10.1023/A:1008202821328" ext-link-type="DOI">10.1023/A:1008202821328</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bib76"><label>76</label><mixed-citation>Sun, A. Y.: Discovering state-parameter mappings in subsurface models using generative adversarial networks, Geophys. Res. Lett., 45, 11137–11146, <ext-link xlink:href="https://doi.org/10.1029/2018gl080404" ext-link-type="DOI">10.1029/2018gl080404</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib77"><label>77</label><mixed-citation>Sun, N.: Inverse problems in groundwater modeling, Springer Science and Business Media, 338 pp., <ext-link xlink:href="https://doi.org/10.1007/978-94-017-1970-4" ext-link-type="DOI">10.1007/978-94-017-1970-4</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib78"><label>78</label><mixed-citation>Tran, H.-N., Phan, G. T. T., Do, Q. B., and Tran, V.-P.: Comparative evaluation of the performance of improved genetic algorithms and differential evolution for in-core fuel management of a research reactor, Nucl. Eng. Des., 398, 111953, <ext-link xlink:href="https://doi.org/10.1016/j.nucengdes.2022.111953" ext-link-type="DOI">10.1016/j.nucengdes.2022.111953</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib79"><label>79</label><mixed-citation>Travaš, V., Zaharija, L., Stipanić, D., and Družeta, S.: Estimation of hydraulic conductivity functions in karst regions by particle swarm optimization with application to Lake Vrana, Croatia, Hydrol. Earth Syst. Sci., 27, 1343–1359, <ext-link xlink:href="https://doi.org/10.5194/hess-27-1343-2023" ext-link-type="DOI">10.5194/hess-27-1343-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib80"><label>80</label><mixed-citation>Tsai, F. T. C., Sun, N., and Yeh, W. W. G.: Global-local optimization for parameter structure identification in three-dimensional groundwater modeling, Water Resour. Res., 39, 1043, <ext-link xlink:href="https://doi.org/10.1029/2001wr001135" ext-link-type="DOI">10.1029/2001wr001135</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib81"><label>81</label><mixed-citation>Tuia, D., Verrelst, J., Alonso, L., Perez-Cruz, F., and Camps-Valls, G.: Multioutput support vector regression for remote sensing biophysical parameter estimation, IEEE Geosci. Remote S., 8, 804–808, <ext-link xlink:href="https://doi.org/10.1109/lgrs.2011.2109934" ext-link-type="DOI">10.1109/lgrs.2011.2109934</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib82"><label>82</label><mixed-citation>Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation, Environ. Modell. Softw., 75, 273–316, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2015.08.013" ext-link-type="DOI">10.1016/j.envsoft.2015.08.013</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib83"><label>83</label><mixed-citation>Wang, G. S. and Chen, S. L.: Evaluation of a soil greenhouse gas emission model based on Bayesian inference and MCMC: Parameter identifiability and equifinality, Ecol. Model., 253, 107–116, <ext-link xlink:href="https://doi.org/10.1016/j.ecolmodel.2012.09.011" ext-link-type="DOI">10.1016/j.ecolmodel.2012.09.011</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bib84"><label>84</label><mixed-citation>Wang, N., Chang, H., and Zhang, D.: Deep-learning-based inverse modeling approaches: A subsurface flow example, J. Geophys. Res.-Sol. Ea., 126, 2020JB020549, <ext-link xlink:href="https://doi.org/10.1029/2020jb020549" ext-link-type="DOI">10.1029/2020jb020549</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib85"><label>85</label><mixed-citation>Wang, Y., Fang, Z., and Hong, H.: Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China, Sci. Total Environ., 666, 975–993, <ext-link xlink:href="https://doi.org/10.1016/j.scitotenv.2019.02.263" ext-link-type="DOI">10.1016/j.scitotenv.2019.02.263</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib86"><label>86</label><mixed-citation>Xia, C.-A., Luo, X., Hu, B. X., Riva, M., and Guadagnini, A.: Data assimilation with multiple types of observation boreholes via the ensemble Kalman filter embedded within stochastic moment equations, Hydrol. Earth Syst. Sci., 25, 1689–1709, <ext-link xlink:href="https://doi.org/10.5194/hess-25-1689-2021" ext-link-type="DOI">10.5194/hess-25-1689-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib87"><label>87</label><mixed-citation>Xiao, C., Deng, Y., and Wang, G.: Deep-Learning-Based Adjoint State Method: Methodology and Preliminary Application to Inverse Modeling, Water Resour. Res., 57, 2020WR027400, <ext-link xlink:href="https://doi.org/10.1029/2020wr027400" ext-link-type="DOI">10.1029/2020wr027400</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib88"><label>88</label><mixed-citation>Xu, T., Spycher, N., Sonnenthal, E., Zhang, G., Zheng, L., and Pruess, K.: TOUGHREACT Version 2.0: A simulator for subsurface reactive transport under non-isothermal multiphase flow conditions, Comput. Geosci., 37, 763–774, <ext-link xlink:href="https://doi.org/10.1016/j.cageo.2010.10.007" ext-link-type="DOI">10.1016/j.cageo.2010.10.007</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib89"><label>89</label><mixed-citation>Xu, Z., Serata, R., Wainwright, H., Denham, M., Molins, S., Gonzalez-Raymat, H., Lipnikov, K., Moulton, J. D., and Eddy-Dilek, C.: Reactive transport modeling for supporting climate resilience at groundwater contamination sites, Hydrol. Earth Syst. Sci., 26, 755–773, <ext-link xlink:href="https://doi.org/10.5194/hess-26-755-2022" ext-link-type="DOI">10.5194/hess-26-755-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib90"><label>90</label><mixed-citation>Yan, Z., Ran, J., Xiao, Y., Xu, Z., Wu, H., Deng, X. L., Du, L., and Zhong, M.: The Temporal Improvement of Earth's Mass Transport Estimated by Coupling GRACE-FO With a Chinese Polar Gravity Satellite Mission, J. Geophys. Res.-Sol. Ea., 128, e2023JB027157, <ext-link xlink:href="https://doi.org/10.1029/2023jb027157" ext-link-type="DOI">10.1029/2023jb027157</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib91"><label>91</label><mixed-citation>Yang, X., Chen, X., and Smith, M. M.: Deep learning inversion of gravity data for detection of <inline-formula><mml:math id="M371" display="inline"><mml:mrow class="chem"><mml:msub><mml:mi mathvariant="normal">CO</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> plumes in overlying aquifers, J. Appl. Geophys., 196, 104507, <ext-link xlink:href="https://doi.org/10.1016/j.jappgeo.2021.104507" ext-link-type="DOI">10.1016/j.jappgeo.2021.104507</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib92"><label>92</label><mixed-citation>Yeh, W. W.-G.: Review of Parameter Identification Procedures in Groundwater Hydrology: The Inverse Problem, Water Resour. Res., 22, 95–108, <ext-link xlink:href="https://doi.org/10.1029/WR022i002p00095" ext-link-type="DOI">10.1029/WR022i002p00095</ext-link>, 1986.</mixed-citation></ref>
      <ref id="bib1.bib93"><label>93</label><mixed-citation>Yeung, C., Tsai, J.-M., King, B., Pham, B., Ho, D., Liang, J., Knight, M. W., and Raman, A. P.: Multiplexed supercell metasurface design and optimization with tandem residual networks, Nanophotonics, 10, 1133–1143, <ext-link xlink:href="https://doi.org/10.1515/nanoph-2020-0549" ext-link-type="DOI">10.1515/nanoph-2020-0549</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib94"><label>94</label><mixed-citation>Yu, S. and Ma, J.: Deep Learning for Geophysics: Current and Future Trends, Rev. Geophys., 59, e2021RG000742, <ext-link xlink:href="https://doi.org/10.1029/2021rg000742" ext-link-type="DOI">10.1029/2021rg000742</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib95"><label>95</label><mixed-citation>Zhan, C., Dai, Z., Soltanian, M. R., and Zhang, X.: Stage-wise stochastic deep learning inversion framework for subsurface sedimentary structure identification, Geophys. Res. Lett., 49, e2021GL095823, <ext-link xlink:href="https://doi.org/10.1029/2021gl095823" ext-link-type="DOI">10.1029/2021gl095823</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib96"><label>96</label><mixed-citation>Zhan, C., Dai, Z., Yang, Z., Zhang, X., Ma, Z., Thanh, H. V., and Soltanian, M. R.: Subsurface sedimentary structure identification using deep learning: A review, Earth-Sci. Rev., 239, 104370, <ext-link xlink:href="https://doi.org/10.1016/j.earscirev.2023.104370" ext-link-type="DOI">10.1016/j.earscirev.2023.104370</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib97"><label>97</label><mixed-citation>Zhang, D. X. and Lu, Z. M.: An efficient, high-order perturbation approach for flow in random porous media via Karhunen–Loeve and polynomial expansions, J. Comput. Phys., 194, 773–794, <ext-link xlink:href="https://doi.org/10.1016/j.jcp.2003.09.015" ext-link-type="DOI">10.1016/j.jcp.2003.09.015</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib98"><label>98</label><mixed-citation>Zhang, J., Zeng, L., Chen, C., Chen, D., and Wu, L.: Efficient Bayesian experimental design for contaminant source identification, Water Resour. Res., 51, 576–598, <ext-link xlink:href="https://doi.org/10.1002/2014wr015740" ext-link-type="DOI">10.1002/2014wr015740</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib99"><label>99</label><mixed-citation>Zhang, J., Lin, G., Li, W., Wu, L., and Zeng, L.: An iterative local updating ensemble smoother for estimation and uncertainty assessment of hydrologic model parameters with multimodal distributions, Water Resour. Res., 54, 1716–1733, <ext-link xlink:href="https://doi.org/10.1002/2017wr020906" ext-link-type="DOI">10.1002/2017wr020906</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib100"><label>100</label><mixed-citation>Zhang, J., Cao, C., Nan, T., Ju, L., Zhou, H., and Zeng, L.: A Novel Deep Learning Approach for Data Assimilation of Complex Hydrological Systems, Water Resour. Res., 60, e2023WR035389, <ext-link xlink:href="https://doi.org/10.1029/2023WR035389" ext-link-type="DOI">10.1029/2023WR035389</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib101"><label>101</label><mixed-citation>Zheng, L. and Samper, J.: Formulation of the inverse problem of non-isothermal multiphase flow and reactive transport in porous media, in: Developments in Water Science, edited by: Miller, C. T. and Pinder, G. F., Elsevier, 1317–1327, <ext-link xlink:href="https://doi.org/10.1016/S0167-5648(04)80146-1" ext-link-type="DOI">10.1016/S0167-5648(04)80146-1</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib102"><label>102</label><mixed-citation>Zhou, H., Gómez-Hernández, J. J., and Li, L.: Inverse methods in hydrogeology: Evolution and recent trends, Adv. Water Resour., 63, 22–37, <ext-link xlink:href="https://doi.org/10.1016/j.advwatres.2013.10.014" ext-link-type="DOI">10.1016/j.advwatres.2013.10.014</ext-link>, 2014.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Enhancing inverse modeling in groundwater systems through machine learning: a comprehensive comparative study</article-title-html>
Aboelyazeed, D., Rahmani, F., Song, Y., Beck, H. E., Bindas, T., Dwivedi, D., Fang, K., Höge, M., Rackauckas, C.,
Mohanty, B., Roy, T., Xu, C., and Lawson, K.: Differentiable modelling to unify machine learning and physical models
for geosciences, Nat. Rev. Earth Environ., 4, 552–567, <a href="https://doi.org/10.1038/s43017-023-00450-9" target="_blank">https://doi.org/10.1038/s43017-023-00450-9</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>72</label><mixed-citation>
       Sorensen, J. P. R. and Butcher, A. S.: Water Level Monitoring Pressure Transducers – A Need for
Industry-Wide Standards, Groundw. Monit. Remediat., 31, 56–62, <a href="https://doi.org/10.1111/j.1745-6592.2011.01346.x" target="_blank">https://doi.org/10.1111/j.1745-6592.2011.01346.x</a>,
2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>73</label><mixed-citation>
       Steefel, C., Depaolo, D., and Lichtner, P.: Reactive transport modeling: An essential tool and a new
research approach for the earth sciences, Earth Planet. Sc. Lett., 240, 539–558, <a href="https://doi.org/10.1016/j.epsl.2005.09.017" target="_blank">https://doi.org/10.1016/j.epsl.2005.09.017</a>,
2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>74</label><mixed-citation>
       Sternagel, A., Loritz, R., Klaus, J., Berkowitz, B., and Zehe, E.: Simulation of reactive solute transport
in the critical zone: a Lagrangian model for transient flow and preferential transport, Hydrol. Earth Syst. Sci., 25,
1483–1508, <a href="https://doi.org/10.5194/hess-25-1483-2021" target="_blank">https://doi.org/10.5194/hess-25-1483-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib75"><label>75</label><mixed-citation>
       Storn, R. and Price, K.: Differential Evolution – A Simple and Efficient Heuristic for Global Optimization
over Continuous Spaces, J. Global Optim., 11, 341–359, <a href="https://doi.org/10.1023/A:1008202821328" target="_blank">https://doi.org/10.1023/A:1008202821328</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib76"><label>76</label><mixed-citation>
       Sun, A. Y.: Discovering state-parameter mappings in subsurface models using generative adversarial
networks, Geophys. Res. Lett., 45, 11137–11146, <a href="https://doi.org/10.1029/2018gl080404" target="_blank">https://doi.org/10.1029/2018gl080404</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib77"><label>77</label><mixed-citation>
       Sun, N.: Inverse problems in groundwater modeling, Springer Science and Business Media, 338 pp.,
<a href="https://doi.org/10.1007/978-94-017-1970-4" target="_blank">https://doi.org/10.1007/978-94-017-1970-4</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib78"><label>78</label><mixed-citation>
       Tran, H.-N., Phan, G. T. T., Do, Q. B., and Tran, V.-P.: Comparative evaluation of the performance of
improved genetic algorithms and differential evolution for in-core fuel management of a research reactor, Nucl.
Eng. Des., 398, 111953, <a href="https://doi.org/10.1016/j.nucengdes.2022.111953" target="_blank">https://doi.org/10.1016/j.nucengdes.2022.111953</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib79"><label>79</label><mixed-citation>
       Travaš, V., Zaharija, L., Stipanić, D., and Družeta, S.: Estimation of hydraulic conductivity
functions in karst regions by particle swarm optimization with application to Lake Vrana, Croatia, Hydrol. Earth
Syst. Sci., 27, 1343–1359, <a href="https://doi.org/10.5194/hess-27-1343-2023" target="_blank">https://doi.org/10.5194/hess-27-1343-2023</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib80"><label>80</label><mixed-citation>
       Tsai, F. T. C., Sun, N., and Yeh, W. W. G.: Global-local optimization for parameter structure
identification in three-dimensional groundwater modeling, Water Resour. Res., 39, 1043, <a href="https://doi.org/10.1029/2001wr001135" target="_blank">https://doi.org/10.1029/2001wr001135</a>,
2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib81"><label>81</label><mixed-citation>
       Tuia, D., Verrelst, J., Alonso, L., Perez-Cruz, F., and Camps-Valls, G.: Multioutput support vector
regression for remote sensing biophysical parameter estimation, IEEE Geosci. Remote S., 8, 804–808,
<a href="https://doi.org/10.1109/lgrs.2011.2109934" target="_blank">https://doi.org/10.1109/lgrs.2011.2109934</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib82"><label>82</label><mixed-citation>
       Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and
MATLAB implementation, Environ. Modell. Softw., 75, 273–316, <a href="https://doi.org/10.1016/j.envsoft.2015.08.013" target="_blank">https://doi.org/10.1016/j.envsoft.2015.08.013</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib83"><label>83</label><mixed-citation>
       Wang, G. S. and Chen, S. L.: Evaluation of a soil greenhouse gas emission model based on Bayesian inference
and MCMC: Parameter identifiability and equifinality, Ecol. Model., 253, 107–116,
<a href="https://doi.org/10.1016/j.ecolmodel.2012.09.011" target="_blank">https://doi.org/10.1016/j.ecolmodel.2012.09.011</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib84"><label>84</label><mixed-citation>
       Wang, N., Chang, H., and Zhang, D.: Deep-learning-based inverse modeling approaches: A subsurface flow
example, J. Geophys. Res.-Sol. Ea., 126, e2020JB020549, <a href="https://doi.org/10.1029/2020jb020549" target="_blank">https://doi.org/10.1029/2020jb020549</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib85"><label>85</label><mixed-citation>
       Wang, Y., Fang, Z., and Hong, H.: Comparison of convolutional neural networks for landslide susceptibility
mapping in Yanshan County, China, Sci. Total Environ., 666, 975–993, <a href="https://doi.org/10.1016/j.scitotenv.2019.02.263" target="_blank">https://doi.org/10.1016/j.scitotenv.2019.02.263</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib86"><label>86</label><mixed-citation>
       Xia, C.-A., Luo, X., Hu, B. X., Riva, M., and Guadagnini, A.: Data assimilation with multiple types of
observation boreholes via the ensemble Kalman filter embedded within stochastic moment equations, Hydrol. Earth
Syst. Sci., 25, 1689–1709, <a href="https://doi.org/10.5194/hess-25-1689-2021" target="_blank">https://doi.org/10.5194/hess-25-1689-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib87"><label>87</label><mixed-citation>
       Xiao, C., Deng, Y., and Wang, G.: Deep-Learning-Based Adjoint State Method: Methodology and Preliminary
Application to Inverse Modeling, Water Resour. Res., 57, e2020WR027400, <a href="https://doi.org/10.1029/2020wr027400" target="_blank">https://doi.org/10.1029/2020wr027400</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib88"><label>88</label><mixed-citation>
       Xu, T., Spycher, N., Sonnenthal, E., Zhang, G., Zheng, L., and Pruess, K.: TOUGHREACT Version 2.0: A
simulator for subsurface reactive transport under non-isothermal multiphase flow conditions, Comput. Geosci., 37,
763–774, <a href="https://doi.org/10.1016/j.cageo.2010.10.007" target="_blank">https://doi.org/10.1016/j.cageo.2010.10.007</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib89"><label>89</label><mixed-citation>
       Xu, Z., Serata, R., Wainwright, H., Denham, M., Molins, S., Gonzalez-Raymat, H., Lipnikov, K.,
Moulton, J. D., and Eddy-Dilek, C.: Reactive transport modeling for supporting climate resilience at groundwater
contamination sites, Hydrol. Earth Syst. Sci., 26, 755–773, <a href="https://doi.org/10.5194/hess-26-755-2022" target="_blank">https://doi.org/10.5194/hess-26-755-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib90"><label>90</label><mixed-citation>
       Yan, Z., Ran, J., Xiao, Y., Xu, Z., Wu, H., Deng, X. L., Du, L., and Zhong, M.: The Temporal Improvement of
Earth's Mass Transport Estimated by Coupling GRACE-FO With a Chinese Polar Gravity Satellite Mission,
J. Geophys. Res.-Sol. Ea., 128, e2023JB027157, <a href="https://doi.org/10.1029/2023jb027157" target="_blank">https://doi.org/10.1029/2023jb027157</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib91"><label>91</label><mixed-citation>
       Yang, X., Chen, X., and Smith, M. M.: Deep learning inversion of gravity data for detection of CO<sub>2</sub>
plumes in overlying aquifers, J. Appl. Geophys., 196, 104507, <a href="https://doi.org/10.1016/j.jappgeo.2021.104507" target="_blank">https://doi.org/10.1016/j.jappgeo.2021.104507</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib92"><label>92</label><mixed-citation>
       Yeh, W. W.-G.: Review of Parameter Identification Procedures in Groundwater Hydrology: The Inverse Problem,
Water Resour. Res., 22, 95–108, <a href="https://doi.org/10.1029/WR022i002p00095" target="_blank">https://doi.org/10.1029/WR022i002p00095</a>, 1986.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib93"><label>93</label><mixed-citation>
       Yeung, C., Tsai, J.-M., King, B., Pham, B., Ho, D., Liang, J., Knight, M. W., and Raman, A. P.: Multiplexed
supercell metasurface design and optimization with tandem residual networks, Nanophotonics, 10, 1133–1143,
<a href="https://doi.org/10.1515/nanoph-2020-0549" target="_blank">https://doi.org/10.1515/nanoph-2020-0549</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib94"><label>94</label><mixed-citation>
       Yu, S. and Ma, J.: Deep Learning for Geophysics: Current and Future Trends, Rev. Geophys., 59,
e2021RG000742, <a href="https://doi.org/10.1029/2021rg000742" target="_blank">https://doi.org/10.1029/2021rg000742</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib95"><label>95</label><mixed-citation>
       Zhan, C., Dai, Z., Soltanian, M. R., and Zhang, X.: Stage-wise stochastic deep learning inversion framework
for subsurface sedimentary structure identification, Geophys. Res. Lett., 49, e2021GL095823,
<a href="https://doi.org/10.1029/2021gl095823" target="_blank">https://doi.org/10.1029/2021gl095823</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib96"><label>96</label><mixed-citation>
       Zhan, C., Dai, Z., Yang, Z., Zhang, X., Ma, Z., Thanh, H. V., and Soltanian, M. R.: Subsurface sedimentary
structure identification using deep learning: A review, Earth-Sci. Rev., 239, 104370,
<a href="https://doi.org/10.1016/j.earscirev.2023.104370" target="_blank">https://doi.org/10.1016/j.earscirev.2023.104370</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib97"><label>97</label><mixed-citation>
       Zhang, D. X. and Lu, Z. M.: An efficient, high-order perturbation approach for flow in random porous media
via Karhunen–Loève and polynomial expansions, J. Comput. Phys., 194, 773–794, <a href="https://doi.org/10.1016/j.jcp.2003.09.015" target="_blank">https://doi.org/10.1016/j.jcp.2003.09.015</a>, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib98"><label>98</label><mixed-citation>
       Zhang, J., Zeng, L., Chen, C., Chen, D., and Wu, L.: Efficient Bayesian experimental design for contaminant
source identification, Water Resour. Res., 51, 576–598, <a href="https://doi.org/10.1002/2014wr015740" target="_blank">https://doi.org/10.1002/2014wr015740</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib99"><label>99</label><mixed-citation>
       Zhang, J., Lin, G., Li, W., Wu, L., and Zeng, L.: An iterative local updating ensemble smoother for
estimation and uncertainty assessment of hydrologic model parameters with multimodal distributions, Water
Resour. Res., 54, 1716–1733, <a href="https://doi.org/10.1002/2017wr020906" target="_blank">https://doi.org/10.1002/2017wr020906</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib100"><label>100</label><mixed-citation>
       Zhang, J., Cao, C., Nan, T., Ju, L., Zhou, H., and Zeng, L.: A Novel Deep Learning Approach for Data
Assimilation of Complex Hydrological Systems, Water Resour. Res., 60, e2023WR035389, <a href="https://doi.org/10.1029/2023WR035389" target="_blank">https://doi.org/10.1029/2023WR035389</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib101"><label>101</label><mixed-citation>
       Zheng, L. and Samper, J.: Formulation of the inverse problem of non-isothermal multiphase flow and
reactive transport in porous media, in: Developments in Water Science, edited by: Miller, C. T. and Pinder, G. F.,
Elsevier, 1317–1327, <a href="https://doi.org/10.1016/S0167-5648(04)80146-1" target="_blank">https://doi.org/10.1016/S0167-5648(04)80146-1</a>, 2004.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib102"><label>102</label><mixed-citation>
       Zhou, H., Gómez-Hernández, J. J., and Li, L.: Inverse methods in hydrogeology: Evolution and
recent trends, Adv. Water Resour., 63, 22–37, <a href="https://doi.org/10.1016/j.advwatres.2013.10.014" target="_blank">https://doi.org/10.1016/j.advwatres.2013.10.014</a>, 2014.

    </mixed-citation></ref-html>--></article>
