<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-30-3165-2026</article-id><title-group><article-title>A hybrid Kolmogorov–Arnold Networks-based model with attention for predicting Arctic river streamflow</article-title><alt-title>A hybrid Kolmogorov–Arnold Networks-based model</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Zhou</surname><given-names>Renjie</given-names></name>
          <email>renjie.zhou@shsu.edu</email>
        <ext-link>https://orcid.org/0000-0003-4696-0915</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Liu</surname><given-names>Shiqi</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Department of Environmental and Geosciences, Sam Houston State University, Huntsville, TX 77340, USA</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, 11A, Datun Road, Chaoyang District, Beijing 100101, China</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Renjie Zhou (renjie.zhou@shsu.edu)</corresp></author-notes><pub-date><day>22</day><month>May</month><year>2026</year></pub-date>
      
      <volume>30</volume>
      <issue>10</issue>
      <fpage>3165</fpage><lpage>3183</lpage>
      <history>
        <date date-type="received"><day>24</day><month>July</month><year>2025</year></date>
           <date date-type="rev-request"><day>8</day><month>September</month><year>2025</year></date>
           <date date-type="rev-recd"><day>1</day><month>May</month><year>2026</year></date>
           <date date-type="accepted"><day>9</day><month>May</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Renjie Zhou</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026.html">This article is available from https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e95">Arctic rivers represent important components of the Arctic and global hydrological and climate systems, serving as dynamic conduits between terrestrial and marine environments in rapidly changing regions. They transport freshwater, sediments, nutrients, and carbon from vast watersheds to the Arctic Ocean and affect ocean circulation patterns and regional climate dynamics. Despite their importance, modeling Arctic rivers remains challenging because of sparse data networks, unique cryospheric dynamics, and complex responses to hydrometeorological variables. In this study, a novel hybrid deep learning model is developed to address these challenges and predict Arctic river discharge by incorporating Kolmogorov–Arnold Networks (KAN), Long Short-Term Memory, and the attention mechanism with seasonal trigonometric encoding and physics-based constraints. It integrates several novel components: (1) a KAN-based deep learning component learns and captures intricate temporal patterns from nonlinear hydrometeorological data; (2) explicit physical constraints designed for the characteristics of permafrost-dominated watersheds govern snow accumulation and melt processes through the architectural design and loss function; (3) seasonal variations are accounted for using trigonometric functions to represent cyclical patterns; and (4) a residual compensation structure allows the proposed model to revisit systematic errors in initial predictions and helps capture complex nonlinear processes that are not fully represented. The model is tested on the Kolyma River, whose watershed is dominated by permafrost, and it obtains more robust and accurate predictions than the baseline models. The roles of the physical constraints, the residual compensation architecture, and the trigonometric encoding are assessed by ablation analysis, and the results indicate that these components improve the predictive performance.
This novel approach offers a new pathway for addressing key challenges of hydrological forecasting in cold, permafrost-dominated regions and provides a robust framework for improving Arctic river discharge prediction.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>National Science Foundation</funding-source>
<award-id>2407963</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e107">Arctic rivers are integral to the Arctic's hydrological cycle and global climate systems and have undergone significant changes in recent years (Rawlins and Karmalkar, 2024). They are essential for transporting vast amounts of freshwater, sediments, and organic matter from terrestrial sources to the Arctic Ocean, sustaining the biodiversity of the region, and supporting unique ecosystems (Tank et al., 2023; Liu et al., 2025; Vonk et al., 2025). The intricate connections between Arctic rivers and other cryospheric and atmospheric components make them highly sensitive to climate change (Feng et al., 2021). Their response to climatic shifts, including changes in precipitation patterns, temperature regimes, snowmelt timing, and evapotranspiration rates in Arctic watersheds, has far-reaching implications for ecosystem stability and introduces significant uncertainties into future climate projections (Peterson et al., 2002).</p>
      <p id="d2e110">Predicting the hydrodynamics of Arctic rivers remains challenging due to the region's unique environmental conditions, data scarcity, complex feedback mechanisms, and the rivers' nonlinear responses to temperature, rainfall, and evapotranspiration. For example, warming temperatures can accelerate permafrost thaw and alter hydrological cycles in Arctic regions. Temperature thresholds play a crucial role, particularly around the 0 <inline-formula><mml:math id="M1" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula> mark, where phase changes in precipitation and surface water create abrupt shifts in river dynamics (Prowse et al., 2011; Walvoord and Kurylyk, 2016). These temperature-dependent transitions are further complicated by permafrost thawing, which destabilizes riverbanks, modifies groundwater flow paths, changes groundwater-surface water interactions, and increases sediment and nutrient loads, creating intricate feedback loops that complicate flow predictions (McClelland et al., 2004; Wang et al., 2021).</p>
      <p id="d2e123">Over the last several decades, significant efforts have been directed towards forecasting the responses of river discharge to hydrometeorological conditions and understanding the underlying driving mechanisms (Gelfan et al., 2017; Jin et al., 2024a; Wang et al., 2021; Zhang et al., 2023; Zhou and Zhang, 2023a). These approaches can be broadly categorized into process-based models and empirical models. Process-based models simulate detailed physical and chemical processes within hydrological systems. For example, Gelfan et al. (2017) employed process-based hydrological models, including the HYdrological Predictions for the Environment (HYPE) and ECOlogical Model for Applied Geophysics (ECOMAG), to simulate the hydrodynamics of the Lena and Mackenzie Rivers and assessed the impacts of climate change. Similarly, Krogh et al. (2017) developed a physics-based hydrological model that accounted for key hydrological processes for quantifying water losses at the tundra-taiga transition in a small Arctic basin. While these process-based approaches provide valuable insights into the underlying hydrological processes and mechanisms, their successful implementation usually requires extensive parameterization and detailed characterization of environmental conditions, such as topography, spatially distributed hydrological parameters, and vegetation patterns. Such comprehensive data requirements pose significant challenges in Arctic regions, where remote locations, limited infrastructure, and harsh climatic conditions constrain field measurements and sustained monitoring campaigns (Gao et al., 2020). In contrast, empirical models, particularly data-driven approaches, focus on establishing direct mappings between input and output variables without requiring comprehensive understanding of the underlying hydrological systems (Zhou and Zhang, 2022b).</p>
      <p id="d2e126">Recently, data-driven models have been increasingly developed and used to simulate hydrodynamics and characterize hydrological systems in Arctic regions. For instance, Zhang et al. (2023) simulated the streamflow changes of several major Arctic rivers under varying meteorological conditions using a Support Vector Regression model. This machine learning model was then used to estimate the responses of these rivers to elevated temperature and precipitation conditions. Singh et al. (2020) implemented several convolutional neural network (CNN) models, including UNet, SegNet, DeepLab and DenseNet, to estimate the surface concentration of river ice. Their approach demonstrated improved estimation performance compared to existing methods by addressing the key challenge of noise and errors in the limited available training data. Sergeev et al. (2024) developed a hybrid model integrating the wavelet transform with long short-term memory (LSTM) networks for predicting Arctic methane concentration using greenhouse gas data monitored on Belyy Island, Russia.</p>
      <p id="d2e130">Despite these advances, significant challenges remain in modeling intricate river systems. Current deep learning approaches often struggle to capture complex and nonlinear relationships between meteorological variables and river discharge (Jin et al., 2024b; Zhou et al., 2024a). To improve performance on nonlinear data such as rainfall-runoff relationships, many techniques have been developed. For example, Basu et al. (2022) proposed a nonlinear autoregressive model with exogenous variables for flood prediction in Ireland. Bakhshi Ostadkalayeh et al. (2023) used a Kalman filter (KF) to manage nonlinear systems and improve LSTM performance for forecasting streamflow. Zhou et al. (2024b) integrated ensemble empirical mode decomposition with temporal fusion transformers and developed a new hybrid deep learning model for discharge prediction, which outperformed baseline models. Liu et al. (2024) proposed Kolmogorov–Arnold Networks (KAN) based on the Kolmogorov–Arnold representation theorem. Unlike conventional multi-layer perceptrons (MLPs) with fixed activation functions on nodes, KANs parameterize learnable univariate activation functions on the edges between nodes, which enhances the model's capacity to capture complex nonlinear relationships in data. Beyond predictive skill, these edge functions can be visualized and interrogated directly, which exposes the learned input–output relationships (Liu et al., 2024). This property makes KANs attractive in hydrology, where model usefulness includes both accuracy and the ability to extract physically meaningful patterns from data.</p>
      <p id="d2e133">In addition, the scarcity of training data in Arctic regions limits the generalization of traditional deep learning models, leading to unsatisfactory performance (Alzubaidi et al., 2023). Physics-informed neural networks (PINNs) and physics-guided deep learning approaches offer a promising solution by incorporating physical constraints and domain knowledge into the learning process (Karniadakis et al., 2021). By embedding physical laws into the loss function, these hybrid approaches can improve prediction accuracy while ensuring physically consistent results (Zhong et al., 2024). A variety of physics-informed deep learning models have been developed and have demonstrated promising results in hydrological applications. For example, Yang et al. (2020) proposed a hydrological model that integrated physical processes with a machine learning model for simulating daily streamflow. This hybrid model obtained accurate predictions of long-term daily streamflow with limited training data and demonstrated the effectiveness of this approach for reducing data requirements. Xie et al. (2021) integrated physical mechanisms into a deep learning model through both modified loss functions and synthetically generated training samples for forecasting streamflow. Their model outperformed traditional models and highlighted the value of incorporating physical constraints into deep learning frameworks for hydrological modeling.</p>
      <p id="d2e136">To address these challenges and improve predictive performance in permafrost-dominated Arctic rivers, this study proposes a novel residual compensated physics-informed KAN-LSTM with attention model (RCPIKLA) that integrates seasonal patterns, physics-based constraints, KAN, LSTM and attention for forecasting Arctic river discharge. The proposed model introduces several key innovations that serve specific purposes: (1) a KAN-based deep learning model coupled with LSTM and the attention mechanism, which enables sophisticated feature representation and temporal pattern recognition for nonlinear hydrometeorological data; (2) physical constraints that explicitly govern snow accumulation and melt processes, which improve physical consistency through the architectural design and loss function; (3) a residual compensation structure that combines a physics-informed main network with a specialized residual network, which allows the model to capture physically governed patterns and local anomalies; and (4) a temporal pattern recognition system that incorporates cyclical encoding of seasonal features to represent seasonal variations. This integrated approach is specifically designed to address the challenges of hydrological forecasting in cold, permafrost-dominated regions, where snow accumulation and melt play a crucial role in seasonal discharge patterns. Together, these components are intended to enhance the model's predictive accuracy, physical consistency, and ability to handle the complex seasonal dynamics and hydrological processes that characterize Arctic river systems.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Study area and data acquisition</title>
      <p id="d2e147">To assess its performance, the newly developed model is tested on the Kolyma River in northeastern Siberia (Fig. 1). The Kolyma River is one of the major Arctic rivers, with a mean annual discharge of 136 <inline-formula><mml:math id="M2" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">yr</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, and the largest river system draining into the East Siberian Sea. The Kolyma watershed is the largest watershed on Earth that is 100 % underlain by continuous permafrost (Holmes et al., 2012). The extensive permafrost coverage makes the Kolyma watershed particularly sensitive to climate warming, leading to its unique hydrological behaviors (Spencer et al., 2015). With a drainage basin of approximately 647 000 <inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, the Kolyma River flows through diverse landscapes including the Kolyma Mountains, permafrost regions, and tundra ecosystems. The river's discharge regime is characterized by a distinctive seasonal pattern, with peak flows occurring during the spring snowmelt period (May–June) and low flows during the winter months when the river is ice-covered (Bring et al., 2016).</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e183">The geographic location and topography of the Kolyma catchment.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f01.png"/>

      </fig>

      <p id="d2e192">In this study, monthly temperature (<inline-formula><mml:math id="M4" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula>), precipitation (<inline-formula><mml:math id="M5" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula>) and potential evapotranspiration (PET) are used as input variables for forecasting discharge values of the Kolyma River. The Kolyma discharge records (1978–2020) at the Kolymsk gauge station (68.73° N, 158.72° E) are obtained from the ArcticGRO Discharge Dataset Version 20231204 (<uri>https://arcticgreatrivers.org/data</uri>, last access: 5 September 2025). Note that the historical discharge data of the Kolyma River are not used as an input variable in this study. This allows the model to establish direct relationships between hydrometeorological drivers and river discharge without incorporating autoregressive components, thereby focusing specifically on how climatic factors influence discharge patterns in permafrost-dominated watersheds. Gridded monthly average 2 <inline-formula><mml:math id="M6" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:math></inline-formula> temperature and potential evapotranspiration with a resolution of 0.5° are obtained from CRU TS v. 4.07 (Harris et al., 2020). Additionally, monthly precipitation data at a 0.5° resolution are obtained from the Global Precipitation Climatology Centre (GPCC) dataset (Schneider et al., 2022). The complete dataset spans January 1978 to December 2020 and is partitioned into training (80 %) and testing (20 %) datasets for model development and performance assessment.</p>
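      <p>As a minimal sketch of the partitioning step, the following Python fragment splits the monthly record from January 1978 to December 2020 into 80 % training and 20 % testing portions. The chronological ordering of the split, the truncating rounding rule, and the helper names <monospace>month_index</monospace> and <monospace>chronological_split</monospace> are assumptions made for illustration; the text above states only the split ratio.</p>
<preformat preformat-type="code">
```python
# Hypothetical sketch: chronological 80/20 split of the monthly record.
# The split order and rounding are assumptions, not taken from the study.

def month_index(start_year, end_year):
    """List (year, month) pairs from January of start_year to December of end_year."""
    return [(y, m) for y in range(start_year, end_year + 1) for m in range(1, 13)]


def chronological_split(samples, train_fraction=0.8):
    """Split an ordered sequence so every training sample precedes every test sample."""
    n_train = int(len(samples) * train_fraction)  # truncates toward zero
    return samples[:n_train], samples[n_train:]


months = month_index(1978, 2020)
train_months, test_months = chronological_split(months)
print(len(months), len(train_months), len(test_months))
```
</preformat>
      <p>Under these assumptions, the 516 monthly records divide into 412 training months and 104 testing months.</p>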
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Methodology</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Kolmogorov–Arnold Networks</title>
      <p id="d2e235">The Kolmogorov–Arnold representation theorem states that any continuous multivariate function on a bounded domain can be represented as a finite superposition of continuous functions of a single variable and addition (Kůrková, 1992). Building on this decomposition of a multivariate function into univariate functions, the Kolmogorov–Arnold Networks (KAN) model replaces the scalar weight parameters of the Multi-Layer Perceptrons (MLPs) used in traditional neural networks with univariate functions parameterized as splines, as illustrated in Fig. 2 (Liu et al., 2024). This structure allows the KAN model to dynamically adapt its processing to various aspects of the data and emphasize finer details by modulating the granularity of these splines (Granata et al., 2024). With learnable activation functions and structured transformations, it can effectively extract nonlinear relationships and capture intricate patterns, making it well-suited for modeling complex hydrological systems such as Arctic river discharge.</p>
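      <p>For reference, the standard statement of the Kolmogorov–Arnold representation can be written in LaTeX as below. The symbols <italic>n</italic>, Φ and φ are notation introduced here for illustration and are not used elsewhere in this paper: <italic>f</italic> is a continuous function on the <italic>n</italic>-dimensional unit cube, and the outer functions Φ and inner functions φ are continuous functions of one variable.</p>
<preformat preformat-type="code">
```latex
\[
  f(x_1, \dots, x_n)
  = \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \varphi_{q,p}(x_p) \right),
  \qquad f \colon [0,1]^n \to \mathbb{R} \ \text{continuous}.
\]
```
</preformat>
      <p>Each term applies a univariate outer function to a sum of univariate inner functions, which is the structure that spline-based KAN layers make learnable.</p>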

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e240">The structure of Kolmogorov–Arnold Networks (KAN) compared to MLP.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f02.png"/>

        </fig>

      <p id="d2e249">In this newly developed hybrid model, the KAN module is used as a feature transformation block and nonlinear feature extractor that processes raw hydrological and meteorological inputs before the sequential modeling stage. The architecture of the KAN module is composed of three parts: (1) Input expansion: the raw input features, including precipitation, temperature and potential evapotranspiration, are first projected into a higher-dimensional space by a fully connected layer that increases the representational capacity. This dimension expansion allows the model to isolate nonlinear interactions between variables, such as temperature-driven snowmelt thresholds or precipitation-phase transitions; (2) Nonlinear activation: a Gaussian Error Linear Unit (GELU) activation is then applied to the expanded features. The GELU function introduces smooth nonlinearity and enables the network to capture intricate patterns in the input data, which approximates the role of univariate functions in the Kolmogorov–Arnold theorem while avoiding the computational overhead of spline optimization; (3) Dimensionality reduction: a second linear layer then compresses the activated features into a lower-dimensional space. The compressed features are fused with physics-based constraints, such as snowpack dynamics, and fed into the LSTM-Attention network for temporal integration. This step aims to distill the information into a compact yet expressive representation that is more amenable to subsequent processing. The KAN transformation and processing steps can be expressed as follows:

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M7" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E1"><mml:mtd><mml:mtext>1</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mi>X</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E2"><mml:mtd><mml:mtext>2</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>=</mml:mo><mml:mtext>GELU</mml:mtext><mml:mo>(</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>)</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E3"><mml:mtd><mml:mtext>3</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext>KAN</mml:mtext><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:msub><mml:mi>H</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M8" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> denotes the input features; <inline-formula><mml:math id="M9" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold">W</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> are the expansion and compression weight matrices, respectively; <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">b</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> are the corresponding bias vectors; and GELU is the Gaussian Error Linear Unit activation function.</p>
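      <p>The three steps in Eqs. (1)–(3) can be illustrated with a short, dependency-free Python sketch. This is not the authors' implementation: the layer sizes, the random initialisation, the example input values, and the helper names <monospace>gelu</monospace>, <monospace>linear</monospace>, <monospace>init_layer</monospace> and <monospace>kan_block</monospace> are hypothetical, and the exact error-function form of GELU is assumed (a tanh approximation is also common).</p>
<preformat preformat-type="code">
```python
import math
import random


def gelu(x):
    """Exact GELU: x times the standard normal CDF evaluated at x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(weights, bias, x):
    """Affine map W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def kan_block(x, w1, b1, w2, b2):
    h1 = linear(w1, b1, x)        # Eq. (1): input expansion
    h2 = [gelu(v) for v in h1]    # Eq. (2): nonlinear activation
    return linear(w2, b2, h2)     # Eq. (3): dimensionality reduction


rng = random.Random(0)
n_in, n_hidden, n_out = 3, 16, 8  # hypothetical sizes: expand 3 to 16, compress to 8
w1, b1 = init_layer(n_hidden, n_in, rng)
w2, b2 = init_layer(n_out, n_hidden, rng)
features = [-1.2, 0.4, -0.8]      # illustrative standardized T, P, PET values
out = kan_block(features, w1, b1, w2, b2)
print(len(out))
```
</preformat>
      <p>In a trained model the weights and biases would be learned jointly with the downstream LSTM-Attention network rather than drawn at random.</p>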
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Long Short-Term Memory</title>
      <p id="d2e415">Following the Kolmogorov–Arnold transformation, the processed input features enter the Long Short-Term Memory (LSTM) module. LSTM is a variant of recurrent neural networks (RNNs) specifically designed to address the vanishing gradient problem while learning long-term dependencies in sequential data (Hochreiter and Schmidhuber, 1997). By incorporating a gating mechanism and a cell state, the LSTM model can efficiently regulate information flow through the network and selectively remember or forget information in long sequences. Because of its ability to capture temporal dependencies inherent in river systems, the LSTM model has been widely used in a variety of hydrological models (Gao et al., 2020; Zhou and Zhang, 2023b). In the proposed model, the LSTM module learns and identifies important historical patterns in meteorological variables (such as temperature and precipitation) that influence current river discharge, while simultaneously recognizing the varying time lags between these inputs and their hydrological responses. This capability makes LSTMs especially suitable for modeling Arctic river systems, where discharge patterns are influenced by both immediate meteorological conditions and longer-term processes such as snowmelt and permafrost dynamics (Kratzert et al., 2018).</p>
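      <p>As a hedged illustration of the gating mechanism, one time step of the standard LSTM update (Hochreiter and Schmidhuber, 1997) can be sketched in plain Python. The function names, the <monospace>params</monospace> dictionary layout, and the all-zero demonstration weights are assumptions made for this example and do not reflect the trained model.</p>
<preformat preformat-type="code">
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, U, b, x, h_prev):
    """Compute W x + U h_prev + b, with matrices stored as lists of rows."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h_prev))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step; params maps each of 'f', 'i', 'c', 'o' to a (W, U, b) tuple."""
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]         # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], x, h_prev)]         # input gate
    c_cand = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]  # candidate cell state
    o = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]         # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    c = [ft * cp + it * cc for ft, cp, it, cc in zip(f, c_prev, i, c_cand)]
    # New hidden state: gated, squashed view of the cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def zero_gate(n_hidden, n_in):
    """All-zero (W, U, b) for one gate, used only to make the demo deterministic."""
    return (
        [[0.0] * n_in for _ in range(n_hidden)],
        [[0.0] * n_hidden for _ in range(n_hidden)],
        [0.0] * n_hidden,
    )


params = {gate: zero_gate(2, 3) for gate in "fico"}
h, c = lstm_step([0.1, -0.3, 0.7], [0.0, 0.0], [1.0, -2.0], params)
print(h, c)
```
</preformat>
      <p>With all-zero weights every gate evaluates to 0.5 and the candidate state to 0, so the step simply halves the previous cell state, which makes the role of the forget gate easy to verify by hand.</p>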
      <p id="d2e418">As illustrated in Fig. 3, the memory cell of each repeating LSTM block is primarily composed of three gates: the input gate (<inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:msub><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), forget gate (<inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), and output gate (<inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). The input gate determines which new information should be stored in the cell state. The forget gate decides what information should be discarded from the previous cell state. The output gate controls how much of the cell state is exposed as the hidden state passed to the next time step and layer. This gating mechanism allows LSTMs to maintain and update relevant information over long sequences while filtering out irrelevant details (Hochreiter and Schmidhuber, 1997).
At any time step <inline-formula><mml:math id="M16" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, the hidden state (<inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and the cell state (<inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) are calculated based on the previous hidden state (<inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) and cell state (<inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) with three logic gates as follows:

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M21" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E4"><mml:mtd><mml:mtext>4</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>f</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msub><mml:msub><mml:mi>X</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E5"><mml:mtd><mml:mtext>5</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:msub><mml:mi>X</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">i</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E6"><mml:mtd><mml:mtext>6</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" 
class="stylechange"/><mml:msubsup><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mi>tanh⁡</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:msub><mml:mi>X</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E7"><mml:mtd><mml:mtext>7</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>⊗</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>⊗</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E8"><mml:mtd><mml:mtext>8</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>o</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">σ</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msub><mml:msub><mml:mi>X</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi 
mathvariant="normal">o</mml:mi></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E9"><mml:mtd><mml:mtext>9</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>⊗</mml:mo><mml:mi>tanh⁡</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M22" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M23" display="inline"><mml:mrow><mml:msubsup><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mo>′</mml:mo></mml:msubsup></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the cell state, candidate cell state, and hidden state at time step <inline-formula><mml:math id="M25" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, respectively; <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> refers to the input variables processed by the KAN module; <inline-formula><mml:math id="M27" display="inline"><mml:mi>W</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M28" display="inline"><mml:mi>U</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M29" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> are weight matrices and bias vectors whereas subscripts <inline-formula><mml:math id="M30" display="inline"><mml:mi mathvariant="normal">f</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M31" display="inline"><mml:mi mathvariant="normal">i</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M32" display="inline"><mml:mi mathvariant="normal">c</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M33" display="inline"><mml:mi mathvariant="normal">o</mml:mi></mml:math></inline-formula> denote the forget gate, input gate, candidate cell, and output gate; <inline-formula><mml:math id="M34" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M35" 
display="inline"><mml:mi>tanh⁡</mml:mi></mml:math></inline-formula> are the sigmoid and hyperbolic tangent activation functions, respectively; <inline-formula><mml:math id="M36" display="inline"><mml:mo>⊗</mml:mo></mml:math></inline-formula> denotes element-wise multiplication.</p>
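      <p>As a minimal illustrative sketch (not the authors' implementation), one LSTM time step following Eqs. (4) to (9) can be written in plain Python. The helper names <monospace>sigmoid</monospace>, <monospace>affine</monospace>, and <monospace>lstm_step</monospace>, the <monospace>params</monospace> dictionary layout, and the zero-valued example weights are assumptions made here for illustration only.</p>
      <preformat><![CDATA[
```python
import math


def sigmoid(x):
    """Logistic function used by the forget, input, and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, x, U, h, b):
    """Return W x + U h + b for list-of-lists matrices and list vectors."""
    return [
        sum(w * x_k for w, x_k in zip(W_row, x))
        + sum(u * h_k for u, h_k in zip(U_row, h))
        + b_j
        for W_row, U_row, b_j in zip(W, U, b)
    ]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps "f", "i", "c", "o" to (W, U, b)."""

    def gate(name, activation):
        W, U, b = params[name]
        return [activation(z) for z in affine(W, x_t, U, h_prev, b)]

    f_t = gate("f", sigmoid)          # Eq. (4): forget gate
    i_t = gate("i", sigmoid)          # Eq. (5): input gate
    c_cand = gate("c", math.tanh)     # Eq. (6): candidate cell state
    # Eq. (7): element-wise blend of the old cell state and the candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_cand)]
    o_t = gate("o", sigmoid)          # Eq. (8): output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # Eq. (9)
    return h_t, c_t


# Hypothetical example: 2 inputs, 1 hidden unit, all weights set to zero.
zero_gate = ([[0.0, 0.0]], [[0.0]], [0.0])
params = {name: zero_gate for name in ("f", "i", "c", "o")}
h_t, c_t = lstm_step([0.3, -1.2], [0.0], [1.0], params)
print(h_t, c_t)
```
]]></preformat>
      <p>With all weights set to zero, every gate equals 0.5 and the candidate cell state is zero, so the new cell state in this toy case is half of the previous one.</p>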

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e972">The architecture of the LSTM model.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f03.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Attention</title>
      <p id="d2e989">A global attention mechanism is incorporated into the LSTM component of the newly proposed model to assign different importance weights to past time steps when making predictions, which enables the model to dynamically weight and aggregate information across temporal sequences. As the influence of historical conditions on current discharge exhibits complex temporal dependencies in hydrological modeling, the attention mechanism can help capture both short-term fluctuations and long-range interactions in input variables. The attention score for each time step can be computed as (Vaswani et al., 2017):

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M37" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E10"><mml:mtd><mml:mtext>10</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">v</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>tanh⁡</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">a</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E11"><mml:mtd><mml:mtext>11</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>exp⁡</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mo>∑</mml:mo><mml:mi>j</mml:mi></mml:msub><mml:mi>exp⁡</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E12"><mml:mtd><mml:mtext>12</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="bold-italic">C</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mo>∑</mml:mo><mml:mi>t</mml:mi></mml:msub><mml:msub><mml:mi 
mathvariant="italic">α</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E13"><mml:mtd><mml:mtext>13</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mi mathvariant="bold-italic">C</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi mathvariant="normal">c</mml:mi></mml:msub><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M38" display="inline"><mml:mi>W</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M39" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> denote weight and bias parameters; <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:msub><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> refers to the attention score at time step <inline-formula><mml:math id="M41" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>; <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the hidden state from the LSTM component at time step <inline-formula><mml:math id="M43" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>; <inline-formula><mml:math id="M44" display="inline"><mml:mi mathvariant="bold-italic">v</mml:mi></mml:math></inline-formula> is a learnable vector which determines the importance of each hidden state; <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">α</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the attention weight; <inline-formula><mml:math id="M46" display="inline"><mml:mi mathvariant="bold-italic">C</mml:mi></mml:math></inline-formula> is the context vector that represents a weighted sum of all hidden states; <inline-formula><mml:math id="M47" display="inline"><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula> refers to the predicted discharge.</p>
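      <p>The attention readout in Eqs. (10) to (13) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the authors' code: the function name <monospace>attention_readout</monospace>, the output-layer names <monospace>w_out</monospace> and <monospace>b_out</monospace> (standing in for the weight and bias of Eq. 13), and all numerical values are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def attention_readout(hidden_states, W_a, b_a, v, w_out, b_out):
    """Score each hidden state, normalize, pool, and apply a linear readout."""
    scores = []
    for h_t in hidden_states:
        z = [sum(w * h for w, h in zip(row, h_t)) + b for row, b in zip(W_a, b_a)]
        # Eq. (10): e_t = v^T tanh(W_a h_t + b_a)
        scores.append(sum(v_k * math.tanh(z_k) for v_k, z_k in zip(v, z)))

    # Eq. (11): softmax over time steps; subtracting the maximum score
    # does not change the result and avoids overflow in exp().
    shift = max(scores)
    exps = [math.exp(e - shift) for e in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]

    # Eq. (12): context vector as the attention-weighted sum of hidden states
    dim = len(hidden_states[0])
    context = [
        sum(a * h_t[k] for a, h_t in zip(alphas, hidden_states))
        for k in range(dim)
    ]

    # Eq. (13): linear readout of the context vector (scalar discharge)
    q_hat = sum(w * c for w, c in zip(w_out, context)) + b_out
    return alphas, context, q_hat


# Hypothetical example: three time steps, hidden size 2, zero scoring weights.
hidden_states = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5]]
alphas, context, q_hat = attention_readout(
    hidden_states,
    W_a=[[0.0, 0.0], [0.0, 0.0]],
    b_a=[0.0, 0.0],
    v=[1.0, 1.0],
    w_out=[1.0, 1.0],
    b_out=0.0,
)
print(alphas, context, q_hat)
```
]]></preformat>
      <p>Because the scoring weights are zero in this toy case, all time steps receive equal attention weights and the context vector reduces to the mean hidden state.</p>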
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Physics-informed mechanisms</title>
      <p id="d2e1246">Physics-informed neural networks combine established physical knowledge with deep learning architectures, so that hydrological predictions draw on the strengths of both. In this study, a hybrid physics-informed approach is implemented through two complementary mechanisms: (1) a dedicated snowpack layer integrated directly into the model architecture, and (2) a physics-constrained loss function. The snowpack layer explicitly simulates snow accumulation and melt from temperature and precipitation. It treats precipitation as snowfall when the temperature is below freezing (<inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:mi>T</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M49" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M50" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> represents temperature) and computes snowmelt using a temperature-dependent rate function (Hock, 2003):

            <disp-formula id="Ch1.E14" content-type="numbered"><label>14</label><mml:math id="M51" display="block"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mi mathvariant="normal">m</mml:mi></mml:msub><mml:mo>⋅</mml:mo><mml:mo movablelimits="false">max⁡</mml:mo><mml:mo>(</mml:mo><mml:mi>T</mml:mi><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the melting rate, and <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi mathvariant="normal">m</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the melting factor. A melting factor of 0.5 <inline-formula><mml:math id="M54" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mi mathvariant="normal">°</mml:mi><mml:msup><mml:mi mathvariant="normal">C</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is adopted in this study based on empirical studies of Arctic snowpack dynamics (Hock, 2003). The snowpack mass balance is estimated as follows (DeWalle and Rango, 2008):

            <disp-formula id="Ch1.E15" content-type="numbered"><label>15</label><mml:math id="M55" display="block"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mi>P</mml:mi><mml:mi>t</mml:mi><mml:mtext>snow</mml:mtext></mml:msubsup><mml:mo>-</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denote the snowpack water equivalent at time <inline-formula><mml:math id="M58" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the actual snowmelt, which is calculated as <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>min⁡</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi mathvariant="normal">r</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>t</mml:mi><mml:mtext>snow</mml:mtext></mml:msubsup></mml:mrow></mml:math></inline-formula> refers to the snowfall fraction of precipitation, which is determined by the following equation (Harpold et al., 2017):

            <disp-formula id="Ch1.E16" content-type="numbered"><label>16</label><mml:math id="M63" display="block"><mml:mrow><mml:msubsup><mml:mi>P</mml:mi><mml:mi>t</mml:mi><mml:mtext>snow</mml:mtext></mml:msubsup><mml:mo>=</mml:mo><mml:mfenced open="{" close=""><mml:mtable columnspacing="1em" rowspacing="0.2ex" class="cases" columnalign="left left" framespacing="0em"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mtext>if</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi>T</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mtext>otherwise</mml:mtext></mml:mtd></mml:mtr></mml:mtable></mml:mfenced><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the precipitation rate. Snow storage and snowmelt are both initialized to zero. An architectural innovation is that the calculated snowmelt is added directly to the data-driven neural network output before the final activation function of the first stage, as shown in Fig. 2, creating a hybrid prediction that leverages both physical understanding and learned patterns:

            <disp-formula id="Ch1.E17" content-type="numbered"><label>17</label><mml:math id="M65" display="block"><mml:mrow><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mtext>ReLU</mml:mtext><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>LSTM</mml:mtext></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>LSTM</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are the predicted discharge from the first stage and the output from the LSTM with attention component, respectively; ReLU denotes the rectified linear unit activation function. In addition to the snowpack layer, a physics-constrained loss function enforces physical consistency through the term:

            <disp-formula id="Ch1.E18" content-type="numbered"><label>18</label><mml:math id="M68" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>phys</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:msub><mml:mo>∑</mml:mo><mml:mi>i</mml:mi></mml:msub><mml:mo movablelimits="false">max⁡</mml:mo><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo mathsize="1.1em">)</mml:mo><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M69" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> is the number of samples, and <inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>phys</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> refers to the physics-constrained loss function term. This term penalizes physically inconsistent predictions where the modeled discharge is less than the calculated snowmelt contribution.</p>
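      <p>The snowpack layer and the physics-constrained loss term in Eqs. (14) to (18) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the made-up input series, and the alignment of one snowmelt value with one discharge sample are all hypothetical, and precipitation, melt, and discharge are assumed to be expressed in consistent units.</p>
      <preformat><![CDATA[
```python
def snowpack_melt(precip, temp_c, melt_factor=0.5):
    """Run the snowpack bookkeeping over a series; return (storage, melt)."""
    storage, melt = [], []
    s_prev = 0.0  # snow storage starts at zero, as stated in the text
    for p_t, t_t in zip(precip, temp_c):
        melt_rate = melt_factor * max(t_t, 0.0)  # Eq. (14): potential melt
        m_t = min(s_prev, melt_rate)             # actual melt, capped by storage
        p_snow = p_t if t_t < 0.0 else 0.0       # Eq. (16): snowfall below 0 degC
        s_prev = s_prev + p_snow - m_t           # Eq. (15): mass balance
        storage.append(s_prev)
        melt.append(m_t)
    return storage, melt


def hybrid_discharge(q_lstm, melt):
    """Eq. (17): ReLU applied to the network output plus snowmelt."""
    return [max(q + m, 0.0) for q, m in zip(q_lstm, melt)]


def physics_loss(melt, q_hat):
    """Eq. (18): mean shortfall of predicted discharge below snowmelt."""
    return sum(max(m - q, 0.0) for m, q in zip(melt, q_hat)) / len(q_hat)


# Hypothetical four-step series (values chosen only for illustration).
precip = [10.0, 20.0, 5.0, 0.0]
temp_c = [-5.0, -2.0, 4.0, 10.0]
storage, melt = snowpack_melt(precip, temp_c)

q_lstm = [1.0, -0.5, 3.0, -6.0]  # stand-in for the LSTM-with-attention output
q_hat = hybrid_discharge(q_lstm, melt)
loss_phys = physics_loss(melt, q_hat)
print(storage, melt, q_hat, loss_phys)
```
]]></preformat>
      <p>In this toy series the penalty is non-zero only at the last step, where a strongly negative network output drives the ReLU output below the computed snowmelt.</p>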
      <p id="d2e1708">The calculated snowmelt is one of the major contributors to discharge in permafrost-dominated watersheds such as the Kolyma River basin. While instantaneous discharge can legitimately fall below melt rates due to transient storage in the active layer, evapotranspiration losses, or refreezing during diurnal temperature fluctuations, these effects become negligible at the monthly aggregation scale in large, permafrost-dominated basins like the Kolyma River (Gusev et al., 2015). Continuous permafrost covering <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">90</mml:mn></mml:mrow></mml:math></inline-formula> % of the Kolyma basin severely restricts subsurface infiltration and groundwater storage (Walvoord and Kurylyk, 2016; Woo et al., 2008). Unlike temperate watersheds where snowmelt can recharge deep aquifers, the impermeable permafrost layer forces meltwater to travel through the shallow active layer with limited storage capacity. Consequently, snowmelt rapidly converts to surface and near-surface runoff with minimal opportunity for long-term retention (Bring et al., 2016). In addition, Arctic rivers such as the Kolyma River and the Lena River exhibit strong discharge seasonality, with the majority of the annual discharge occurring during summer months (Ye et al., 2003). During these months, snowmelt represents the dominant water source, and the monthly time step aggregates over 30 <inline-formula><mml:math id="M72" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>, during which daily temperature fluctuations and local-scale heterogeneity in melt timing average out across the entire basin. Although refreezing can occur during cold nights and sublimation during clear, windy days, these losses are small relative to the total melt flux at monthly basin-scale aggregation (Suzuki et al., 2015). 
Therefore, snowmelt represents a dominant and appropriate lower bound on discharge at this spatiotemporal scale (Yang et al., 2002).</p>
      <p id="d2e1729">The asymmetric physical constraint in this study is designed to reflect both data availability and the scale-dependent hydrology of large permafrost-dominated Arctic watersheds. It is worth noting that adding a complementary upper-bound constraint would further strengthen the physical consistency of the model. Future studies should collect more comprehensive data and develop more sophisticated, symmetric physics constraints that fully respect mass conservation while accounting for all water balance components.</p>
      <p id="d2e1732">In summary, this dual physics-guided approach is particularly valuable for Arctic rivers where seasonal snow accumulation and permafrost melt dominate the hydrological regime. In these regions, river discharge often exhibits complex, threshold-dependent behaviors and memory effects related to temperature-controlled phase changes in water, which statistical models often struggle to capture accurately without explicit physical constraints. By incorporating both a direct snowmelt contribution mechanism and physics-consistency loss penalties, the proposed model maintains physical realism even when data are limited.</p>
</sec>
<sec id="Ch1.S3.SS5">
  <label>3.5</label><title>Residual compensated mechanism</title>
      <p id="d2e1744">While the physics-informed deep learning model may improve prediction accuracy by embedding domain knowledge, it may still fail to capture certain discrepancies between observed and predicted discharge values that arise from sources such as model simplifications, missing hydrological processes, noise in the input data, and extreme events. To address this limitation, a residual compensated mechanism is incorporated. As shown in Fig. 2, the residual compensated framework in the newly proposed model operates as a two-stage process. First, we train a physics-informed KAN-LSTM model that incorporates snowpack dynamics and constraints through the combined loss function (<inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>combined</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>):

            <disp-formula id="Ch1.E19" content-type="numbered"><label>19</label><mml:math id="M74" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>combined</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>MSE</mml:mtext></mml:msub><mml:mo mathsize="1.1em">(</mml:mo><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:mo mathsize="1.1em">)</mml:mo><mml:mo>+</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>phys</mml:mtext></mml:msub><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="script">L</mml:mi><mml:mtext>MSE</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> refers to the mean squared error between the prediction <inline-formula><mml:math id="M76" display="inline"><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> and the observation <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>; <inline-formula><mml:math id="M78" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M79" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> are weighting coefficients that control the relative importance of the data-driven loss (MSE) and physics-informed constraint terms in the combined loss function. These parameters are determined during the model development phase to achieve optimal performance on the testing dataset. In the second stage, the residuals (<inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) between observations and physics-based predictions are computed: <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mtext>obs</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. 
These residuals represent the information discrepancies that the physics-informed KAN-LSTM model fails to capture. A separate residual model (<inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mtext>res</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) which has a KAN-LSTM architecture without physics-informed components is trained to specifically learn the discrepancies: <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mtext>res</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. The final discharge prediction (<inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mtext>final</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) is obtained by combining results from the first and second stage:

            <disp-formula id="Ch1.E20" content-type="numbered"><label>20</label><mml:math id="M85" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mtext>final</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>R</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          This residual compensated approach has two main advantages. First, it preserves physical consistency by incorporating the physics-informed component in the first stage. Second, the residual prediction in the second stage can focus exclusively on missed patterns and systematic anomalies, creating a specialized representation for complex processes. As a result, the two-stage training lets each component focus on complementary aspects of the hydrological system: the physics-informed deep learning model captures the first-order processes driven by hydrometeorological variables, while the residual model captures secondary influences and complex feedback mechanisms. This is especially beneficial for Arctic river systems, where seasonal transitions and complex cryospheric processes may not be fully captured by simplified physics representations.</p>
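      <p>The bookkeeping of the two stages (Eqs. 19 and 20) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the default values of <monospace>alpha</monospace> and <monospace>beta</monospace>, the sample discharge values, and the stand-in residual predictor (which simply recovers half of each true residual) are hypothetical, and no actual KAN-LSTM model is trained here.</p>
      <preformat><![CDATA[
```python
def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def combined_loss(q_hat, q_obs, loss_phys, alpha=1.0, beta=0.1):
    """Eq. (19); the alpha and beta defaults are placeholders only."""
    return alpha * mse(q_hat, q_obs) + beta * loss_phys


def residual_targets(q_obs, q_hat):
    """Second-stage targets: R_i = Q_obs,i - Q_hat_i."""
    return [o - p for o, p in zip(q_obs, q_hat)]


def final_prediction(q_hat, r_hat):
    """Eq. (20): first-stage prediction plus predicted residual."""
    return [p + r for p, r in zip(q_hat, r_hat)]


# Hypothetical first-stage results.
q_obs = [2.0, 4.0, 9.0]
q_hat = [1.5, 4.5, 8.0]
stage1_loss = combined_loss(q_hat, q_obs, loss_phys=0.2)

# Stand-in for the trained residual model M_res: it is assumed here to
# recover exactly half of each true residual.
residuals = residual_targets(q_obs, q_hat)
r_hat = [0.5 * r for r in residuals]

q_final = final_prediction(q_hat, r_hat)
print(stage1_loss, residuals, q_final, mse(q_final, q_obs))
```
]]></preformat>
      <p>In this toy case the residual correction lowers the mean squared error from 0.5 to 0.125, which illustrates how the second stage can only help to the extent that the residual model predicts the first-stage errors.</p>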
</sec>
<sec id="Ch1.S3.SS6">
  <label>3.6</label><title>Evaluation metrics</title>
      <p id="d2e2007">To assess the performance of the proposed model for the Kolyma River, three widely used evaluation metrics are adopted in this study: Nash–Sutcliffe Efficiency (NSE), Root Mean Square Error (RMSE), and Kling–Gupta Efficiency (KGE) (Cinkus et al., 2023; Gupta et al., 2009; Zhou and Zhang, 2022a; Kling et al., 2012). NSE is a dimensionless metric widely used in hydrological modeling that measures how well the model predictions match the observed data compared to using the mean of the observations as a predictor (Gupta et al., 2009). An NSE value of 1 indicates a perfect fit, a value of 0 indicates that the model performs no better than the mean of the observed data, and negative values indicate that it performs worse than this mean. The NSE value can be calculated as:

            <disp-formula id="Ch1.E21" content-type="numbered"><label>21</label><mml:math id="M86" display="block"><mml:mrow><mml:mtext>NSE</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mtext>obs</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mtext>final</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mo mathsize="1.1em">)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mtext>obs</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mtext>obs</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M88" display="inline"><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> are the observed discharge at time step <inline-formula><mml:math id="M89" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> and the mean observed discharge, respectively. In hydrological modeling, NSE values above 0.75 indicate very good model performance (Moriasi et al., 2007). RMSE is an absolute error metric that quantifies the average magnitude of prediction errors in the original units of discharge. Because the errors are squared, RMSE gives higher weight to large errors, which makes it particularly useful for evaluating models where large errors are especially undesirable, such as in flood prediction. Lower RMSE values indicate better model performance, with <inline-formula><mml:math id="M90" display="inline"><mml:mrow><mml:mtext>RMSE</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> representing a perfect fit. It is defined as:

            <disp-formula id="Ch1.E22" content-type="numbered"><label>22</label><mml:math id="M91" display="block"><mml:mrow><mml:mtext>RMSE</mml:mtext><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo mathsize="1.1em">(</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mtext>obs</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mtext>final</mml:mtext><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mo mathsize="1.1em">)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula>

          In addition to NSE and RMSE, the Kling–Gupta Efficiency (KGE) is employed to provide a balanced assessment of model performance. The KGE metric was developed to address certain limitations of NSE, particularly its sensitivity to extreme values and the potential compensation of errors in mean, variance, and correlation (Gupta et al., 2009). Unlike NSE and RMSE, KGE explicitly decomposes model performance into three components: linear correlation, bias ratio, and variability ratio. In this study, the modified KGE is employed, which addresses issues with the original formulation's sensitivity to the magnitude of standard deviations (Kling et al., 2012). The modified KGE (<inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>) is calculated as:

            <disp-formula id="Ch1.E23" content-type="numbered"><label>23</label><mml:math id="M93" display="block"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msqrt><mml:mrow><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mtext>kge</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mtext>kge</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi mathvariant="italic">γ</mml:mi><mml:mtext>kge</mml:mtext></mml:msub><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mtext>kge</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> refers to the linear correlation coefficient between observed and simulated discharge; <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">β</mml:mi><mml:mtext>kge</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> refers to the ratio of simulated mean to observed mean; <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">γ</mml:mi><mml:mtext>kge</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> denotes the variability ratio. The <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> ranges theoretically from <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> to 1, with <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> indicating perfect agreement between observations and predictions in terms of correlation, bias, and variability. A <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> value of <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.41</mml:mn></mml:mrow></mml:math></inline-formula> represents the performance of using the mean flow as a predictor, serving as a natural benchmark below which model predictions are no better than simply using the long-term average (Knoben et al., 2019). 
In hydrological modeling applications, <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> values above 0.75 are generally considered very good, values between 0.5 and 0.75 indicate satisfactory performance, and values below 0.5 suggest unsatisfactory model performance (Towner et al., 2019). The use of multiple complementary metrics (NSE, RMSE, and <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>) provides a comprehensive evaluation framework. While NSE emphasizes matching variance and is sensitive to peak flows, <inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> provides balanced assessment across correlation, bias, and variability. RMSE quantifies absolute error magnitude in original units, which is particularly important for operational applications. Together, these metrics enable thorough assessment of model performance across different aspects of discharge prediction, from overall pattern matching to peak flow accuracy.</p>
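      <p>As an illustration of Eqs. (21)–(23), the three metrics can be sketched in a few lines of Python. This is a minimal sketch rather than the code used in this study: the function names and the toy series are hypothetical, population statistics are assumed, and the variability ratio is taken as the ratio of the simulated to observed coefficients of variation, following the modified formulation of Kling et al. (2012).</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def _mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Eq. (21)."""
    obs_mean = _mean(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root mean square error, Eq. (22), in the units of the input."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def kge_prime(obs, sim):
    """Modified Kling-Gupta efficiency, Eq. (23).

    Assumption: gamma is the ratio of coefficients of variation
    (simulated over observed), computed with population statistics.
    """
    mo, ms = _mean(obs), _mean(sim)
    so = math.sqrt(sum((o - mo) ** 2 for o in obs) / len(obs))
    ss = math.sqrt(sum((s - ms) ** 2 for s in sim) / len(sim))
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / len(obs)
    r = cov / (so * ss)               # linear correlation
    beta = ms / mo                    # bias ratio
    gamma = (ss / ms) / (so / mo)     # variability ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)


# Hypothetical toy series: the simulation is offset by one unit.
obs = [1.0, 2.0, 3.0, 4.0]
sim = [2.0, 3.0, 4.0, 5.0]
scores = (nse(obs, sim), rmse(obs, sim), kge_prime(obs, sim))
```
]]></preformat>
      <p>In this toy case the correlation is perfect, yet the constant offset lowers all three scores, which illustrates why the bias and variability terms are evaluated alongside the correlation.</p>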
</sec>
<sec id="Ch1.S3.SS7">
  <label>3.7</label><title>Model implementation and training</title>
      <p id="d2e2432">As shown in Fig. 4, prior to model training, the input variables, including monthly precipitation, temperature and evapotranspiration data, are preprocessed and standardized using the <inline-formula><mml:math id="M105" display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula>-score normalization technique: <inline-formula><mml:math id="M106" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mtext>std</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi>X</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="italic">μ</mml:mi></mml:mrow><mml:mi mathvariant="italic">σ</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M107" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M108" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> are the mean and standard deviation computed from the training dataset; <inline-formula><mml:math id="M109" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mtext>std</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> denote the input values before and after standardization, respectively. This standardization process ensures that features measured on different scales contribute appropriately during training and facilitates model convergence (LeCun et al., 1998).</p>
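      <p>The standardization step can be sketched as follows. This is an illustrative example under stated assumptions, not the preprocessing code of this study: the helper names and the temperature values are hypothetical, and the population standard deviation is assumed because the text does not specify the estimator. The point shown is that the mean and standard deviation are estimated from the training data only and then reused for the test data.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import statistics


def fit_standardizer(train_values):
    """Estimate mu and sigma from the training data only."""
    mu = statistics.fmean(train_values)
    sigma = statistics.pstdev(train_values)  # assumption: population std
    return mu, sigma


def standardize(values, mu, sigma):
    """Apply X_std = (X - mu) / sigma with previously fitted statistics."""
    return [(x - mu) / sigma for x in values]


# Hypothetical monthly temperatures (degrees Celsius).
train_temp = [-30.0, -10.0, 10.0, 30.0]
test_temp = [0.0, 20.0]

mu, sigma = fit_standardizer(train_temp)
train_std = standardize(train_temp, mu, sigma)
# The test data reuse the training statistics, so no test information
# enters the preprocessing step.
test_std = standardize(test_temp, mu, sigma)
```
]]></preformat>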

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2501">The architecture of the residual compensated physics-informed KAN-LSTM model with attention.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f04.png"/>

        </fig>

      <p id="d2e2510">In regions dominated by permafrost, snow accumulation and melt typically exhibit strong seasonal periodicity (Andersson et al., 2021; Ernakovich et al., 2014). Discharge patterns are strongly influenced by annual cycles of temperature, snow accumulation, and melt in Arctic hydrological systems (Häkkinen and Mellor, 1992). Accurately capturing such periodic behaviors can help develop robust long-term forecasting models. To include these cyclical patterns and ensure smooth temporal transitions, a trigonometric encoding (TE) of seasonal features is added to the input variables using sine and cosine transformations of the calendar month. Specifically, the timestamp is encoded as two features using the following trigonometric transformations:

            <disp-formula id="Ch1.E24" content-type="numbered"><label>24</label><mml:math id="M111" display="block"><mml:mrow><mml:msub><mml:mtext>Month</mml:mtext><mml:mi>sin⁡</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>sin⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>m</mml:mi><mml:mn mathvariant="normal">12</mml:mn></mml:mfrac></mml:mstyle></mml:mrow></mml:mfenced><mml:mspace linebreak="nobreak" width="0.125em"/><mml:mo>;</mml:mo><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mspace linebreak="nobreak" width="0.25em"/><mml:msub><mml:mtext>Month</mml:mtext><mml:mi>cos⁡</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>cos⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>m</mml:mi><mml:mn mathvariant="normal">12</mml:mn></mml:mfrac></mml:mstyle></mml:mrow></mml:mfenced><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M112" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> refers to the calendar month <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mi>m</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">2</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">…</mml:mi><mml:mo>,</mml:mo><mml:mn mathvariant="normal">12</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula>. These encodings aim at capturing cyclical temporal patterns without introducing artificial discontinuities between December and January. The trigonometric features are concatenated with other input variables, including temperature, precipitation and evapotranspiration, and fed into the residual-compensated physics-informed KAN-LSTM model with attention.</p>
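      <p>The encoding in Eq. (24) can be sketched as below. The function names are hypothetical and the example is only meant to illustrate the property described above: in the encoded space, December and January are as close to each other as any other pair of consecutive months.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def encode_month(m):
    """Return (Month_sin, Month_cos) for calendar month m in 1..12, Eq. (24)."""
    if not 1 <= m <= 12:
        raise ValueError("calendar month must be in 1..12")
    angle = 2.0 * math.pi * m / 12.0
    return math.sin(angle), math.cos(angle)


def encoded_distance(m1, m2):
    """Euclidean distance between two months in the encoded space."""
    s1, c1 = encode_month(m1)
    s2, c2 = encode_month(m2)
    return math.hypot(s1 - s2, c1 - c2)


# The December-January step is no larger than the June-July step,
# whereas the raw month index would jump from 12 to 1.
dec_jan = encoded_distance(12, 1)
jun_jul = encoded_distance(6, 7)
```
]]></preformat>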
      <p id="d2e2607">The hyperparameters and configuration settings used in this study are summarized in Table S1 in the Supplement. The choice of hyperparameters balances model capacity with overfitting risk, given the limited training data available. The LSTM hidden dimension of 64 units and a dropout rate of 0.3 are chosen to limit overfitting while capturing essential temporal patterns. The batch size and the number of epochs are set to 32 and 150, respectively. The physics constraint weight (<inline-formula><mml:math id="M114" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.3</mml:mn></mml:mrow></mml:math></inline-formula>) and the MSE weight (<inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn></mml:mrow></mml:math></inline-formula>) are selected by grid search over <inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>∈</mml:mo><mml:mo mathvariant="italic">{</mml:mo><mml:mn mathvariant="normal">0.1</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.3</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.7</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">0.9</mml:mn><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> (Fig. S1 in the Supplement). With these hyperparameters, the newly proposed model is trained on the training dataset of the Kolyma River, and the tuned models are then applied to the unseen testing dataset to assess predictive performance. The prediction performance is compared with several popular temporal baseline models, including simple RNN, LSTM, and GRU models. 
Simple RNN is a basic recurrent architecture that processes sequential data by maintaining a hidden state updated at each time step, but it often suffers from vanishing gradients when learning long-term dependencies. LSTM addresses this limitation through its gating mechanisms and a separate cell state, which allows information to persist across long sequences (Hochreiter and Schmidhuber, 1997). GRU simplifies the LSTM architecture by combining the input and forget gates into a single update gate and merging the cell and hidden states, thereby reducing the number of parameters while retaining the ability to model long-range dependencies (Cho et al., 2014). These three recurrent architectures are widely used for sequence modeling and provide meaningful baseline references for assessing the proposed RCPIKLA framework. To assess model stability and minimize the effects of stochastic processes in the training procedure, each model configuration is trained 10 times independently on Google Colab. This repeated training protocol allows assessment of performance variability arising from the inherent stochasticity in the optimization process, including random batch shuffling and numerical precision variations.</p>
      <p id="d2e2667">Ablation analysis is commonly used to assess the contribution of individual model components (Zhi et al., 2023; Zhou, 2025). In this study, we compare three ablation variants: the complete RCPIKLA model, which incorporates both physics-informed constraints and residual compensation; RCKLA-no physics-informed, which retains the residual structure but excludes the physics-informed constraints; and PIKLA-no residual, which includes the physics-informed constraints but removes the residual compensation structure. Each variant is trained 10 times independently at each forecasting horizon from 1 to 12 months, yielding 120 evaluations per model.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Performance comparison among various baseline models with various time steps</title>
      <p id="d2e2686">The model performance across different time steps (1–12 months) reveals variations in predictive capabilities among the models tested. Prediction ensemble means and variability across 10 independent training runs at each forecasting horizon are reported in Tables S2–S4. Figure 5 compares the NSE, RMSE and <inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> values for the Kolyma River discharge predictions obtained with several popular baseline models and the newly proposed residual compensated physics-informed KAN-LSTM model with attention. The NSE values demonstrate that the newly proposed RCPIKLA model consistently outperforms all baseline models across all time steps, achieving the highest NSE values ranging from 0.81 to 0.86. This superior performance is particularly evident at the time step of 9 months, where RCPIKLA reaches a peak NSE value of approximately 0.86. The traditional deep learning models, including the simple RNN, GRU, and LSTM models, show similar performance patterns with NSE values ranging between 0.65 and 0.76. These models exhibit a noticeable decline in performance at medium-range time steps (4–8 months), with their lowest NSE values observed around months 5–6, which suggests limitations in capturing seasonal transitions in Arctic river systems. The RMSE analysis corroborates these findings, with RCPIKLA achieving the lowest error values (7.1–8.5 <inline-formula><mml:math id="M118" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>) across all time steps. 
Again, the RCPIKLA model demonstrates lower prediction errors compared to other baseline approaches, which exhibit RMSE values ranging from 9.5 to 11.5 <inline-formula><mml:math id="M119" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>. The higher RMSE values for Simple RNN, GRU, and LSTM at medium-range time steps further highlight their difficulties in accurately predicting discharge during critical seasonal transition periods. The <inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> metric provides additional insights into model performance by decomposing errors into correlation, bias, and variability components. The RCPIKLA model achieves <inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> values ranging from 0.74 to 0.82 across all time steps. Similar to NSE, the RCPIKLA model reaches its peak <inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> performance of approximately 0.82 at the 9-month time step. The baseline models demonstrate modest <inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> performance, with values ranging from 0.64 to 0.73. A notable degradation in <inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> performance is observed at the 12-month time step, where the RCPIKLA value drops to approximately 0.74, falling below the 0.75 threshold. 
This decline likely reflects the challenges of maintaining balanced performance across all three <inline-formula><mml:math id="M125" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> components (correlation, bias, and variability) at very long forecasting horizons. At 12 months, accumulated prediction errors and the increased difficulty in capturing seasonal phase transitions may cause the model's predictions to exhibit greater bias or variability mismatch compared to observations, despite maintaining reasonable correlation.</p>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2785">The means of NSE (left), RMSE (middle) and <inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> (right) values of 10 independent runs over various time steps. The models include the residual-compensated physics-informed KAN-LSTM model with attention (RCPIKLA), simple RNN, LSTM, GRU and KAN-LSTM.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f05.png"/>

        </fig>

      <p id="d2e2805">The optimal performance at the 9-month input sequence length reflects important temporal characteristics of this permafrost-dominated watershed and the model's capacity to capture structured temporal dependencies. In the Kolyma River basin, current discharge is influenced by hydrometeorological conditions that can span multiple seasons, such as snow accumulation, snowmelt dynamics, and subsequent baseflow recession controlled by active layer storage and permafrost-restricted groundwater flows. The optimal 9-month input window captures information on these seasonal dynamics and thereby provides the model with sufficient temporal context. The attention mechanism further refines this by assigning higher importance to specific antecedent months that strongly influence current discharge. Shorter sequences may fail to capture full seasonal cycles and snow accumulation processes, while longer sequences (10–12 months) likely introduce temporal uncertainties.</p>
      <p id="d2e2809">To complement the mean evaluation metrics, Fig. 6 summarizes the distributions of NSE, RMSE, and <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> values across 10 independent runs for each model architecture. The box plots illustrate the variability and stability of model performance and provide insight into model robustness and generalization ability. The RCPIKLA model demonstrates the best overall performance with the highest median NSE, lowest median RMSE, and highest median <inline-formula><mml:math id="M128" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> along with the narrowest interquartile range. This indicates not only high accuracy but also low variability across runs, suggesting a stable learning and prediction process. Regarding NSE and RMSE, outliers are less frequent and less extreme for RCPIKLA, which indicates a consistently reliable model output. LSTM and Simple RNN exhibit wider interquartile ranges in all metrics' distributions. This means higher sensitivity to random initialization and potential overfitting or underfitting in different runs. GRU shows moderately better consistency than LSTM and Simple RNN but still falls short of the stability achieved by RCPIKLA. RCPIKLA's <inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> distribution (0.78–0.82) shows clear separation from the baseline models, with minimal distribution overlap. 
This distinct separation in <inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> performance, combined with greater NSE and lower RMSE, confirms that the newly proposed RCPIKLA model obtains accurate prediction performance, and outperforms other baseline models. These results demonstrate that incorporating physical constraints with the KAN-LSTM model and complementing them with residual learning significantly improve predictive performance for capturing complex patterns in Arctic river discharge.</p>

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2858">The box plot of NSE (left), RMSE (middle), and <inline-formula><mml:math id="M131" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> (right) values of multiple models of various time steps (1–12 months) for 10 runs.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f06.png"/>

        </fig>

      <p id="d2e2878">The evaluation metrics of LSTM, KAN-LSTM (KAN transformation followed by LSTM without attention, physics constraints, or residual compensation), and RCPIKLA are compared and analyzed. Figure 5 presents this comparison alongside other baselines across multiple forecasting horizons (1–12 months), while Fig. 6 shows the distribution of metrics across 10 independent training runs. The comparison between LSTM and KAN-LSTM shows that KAN-based nonlinear feature transformation can produce consistent improvements across all time steps. Averaged across all forecasting horizons, KAN-LSTM achieves NSE of 0.77 (<inline-formula><mml:math id="M132" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.025</mml:mn></mml:mrow></mml:math></inline-formula>), RMSE of 9.4 <inline-formula><mml:math id="M133" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M134" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.68</mml:mn></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math id="M135" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> of 0.75 (<inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.027</mml:mn></mml:mrow></mml:math></inline-formula>), compared to LSTM's NSE of 0.70 (<inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.034</mml:mn></mml:mrow></mml:math></inline-formula>), RMSE of 10.94 <inline-formula><mml:math id="M138" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula> (<inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn 
mathvariant="normal">0.61</mml:mn></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> of 0.67 (<inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.023</mml:mn></mml:mrow></mml:math></inline-formula>). This represents approximately 12 % improvement in NSE attributable specifically to KAN's learnable univariate functions. At the optimal 9-month time step, KAN-LSTM achieves NSE of 0.78 compared to LSTM's 0.70, which demonstrates that KAN provides substantial value for prediction.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Performance comparison among various deep learning models at different value ranges</title>
      <p id="d2e2988">As shown in Fig. 5, the optimal performance of the proposed RCPIKLA model is obtained when the time step is 9 months. In addition to temporal comparisons, the predictive performance across different discharge value ranges is further assessed to evaluate how well each model performs under varying flow conditions, from low to high discharge events. The predicted and observed values of the proposed model and baselines when the time step is 9 months are presented in Figs. 7 and 8. The performance metrics reveal substantial differences in model accuracy. The RCPIKLA model demonstrates more robust performance compared to others across all value ranges with the highest NSE coefficient of 0.856, the lowest RMSE of 7.077 <inline-formula><mml:math id="M142" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula> and the highest <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> of 0.817. This indicates that the proposed hybrid approach, which integrates physics-informed constraints with residual compensation, captures the nonlinear and non-stationary characteristics of the Kolyma River discharge more effectively than other architectures. 
The GRU model achieves an intermediate performance level (<inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:mtext>NSE</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.750</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M145" display="inline"><mml:mrow><mml:mtext>RMSE</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">9.418</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.718</mml:mn></mml:mrow></mml:math></inline-formula>), which outperforms the other recurrent neural networks but falls short of the KAN-based models. Both LSTM and Simple RNN exhibit similar and relatively poorer performance metrics, which demonstrates their limitations in capturing the complex hydrological dynamics of Arctic river systems when used without additional enhancements.</p>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e3055">The predicted and observed values of 10 independent runs when the time step is 9 months, including RCPIKLA, LSTM, GRU and Simple RNN models. The red dashed line angled at 45° represents the line of perfect agreement between observed and predicted values.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f07.png"/>

        </fig>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e3066">Time series comparison of observed and predicted monthly discharge for the Kolyma River during the test period when the time step is 9 months.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f08.png"/>

        </fig>

      <p id="d2e3076">Notably, all models perform reasonably well for low to moderate discharge values (0–30 <inline-formula><mml:math id="M147" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>), but significant differences emerge for high discharge events (<inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M149" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>), which are crucial for flood forecasting. Although the proposed RCPIKLA model maintains better prediction accuracy for these high discharge events, it still tends to underestimate them, which may be attributed to the limited number of high discharge events in the training dataset. This systematic underestimation of peak flows represents a common challenge in data-driven hydrological modeling, particularly for Arctic river systems, where extreme discharge events are relatively rare but carry significant implications for water resource management and hazard mitigation. Kratzert et al. (2019) observed similar patterns in LSTM-based rainfall-runoff modeling across diverse catchments. For Arctic rivers specifically, Gelfan et al. (2017) and Chang et al. (2025) reported that process-based models and machine learning approaches struggle with extreme conditions due to the complex processes and events that are poorly represented in limited observational records. 
In our study, extreme high discharge events (<inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">80</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M151" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>) constitute less than 5 % of the training dataset, creating a class imbalance problem common in hydrological time series (Nearing et al., 2021). The squared error loss function (MSE) used in model training inherently weights all samples equally, which can lead to optimization that favors the more numerous moderate flow events at the expense of rare extremes. Future work could address this limitation through specialized sampling techniques or physics-informed constraints specifically designed to better capture high-magnitude discharge events.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Interpretability analysis of Kolmogorov–Arnold Networks</title>
      <p id="d2e3131">Kolmogorov–Arnold Networks can learn interpretable univariate functions that can be visualized and approximated symbolically (Liu et al., 2024). The learned activation functions from the KAN component for each input feature are derived and presented to examine how each hydroclimatic input is transformed prior to temporal aggregation by the LSTM-attention block. While the overall model remains a sequence model, the KAN component offers mechanistic insight into learned input transformations.</p>
      <p id="d2e3134">Figure 9 presents the learned univariate KAN functions for the primary hydroclimatic predictors and the seasonal encodings, plotted against standardized inputs. The learned mappings show distinct behaviors across variables. Temperature exhibits threshold-dependent behavior and an increasing response for positive standardized values, which is consistent with degree-day snowmelt formulations (Hock, 2003). The minimal response at very low temperatures reflects periods when all precipitation accumulates as snow with no melt contribution to discharge. The strengthening positive trend at high temperatures captures accelerated snowmelt during warmer periods and melt-season activation. The PET function remains relatively constant across most of the range but drops at extremely high PET values. This negative response at high evapotranspiration demand is physically meaningful in permafrost watersheds, where shallow active layers and restricted groundwater storage make baseflow highly sensitive to evaporative losses during warm, dry periods. The transition may represent a threshold where evaporative water losses begin to substantially reduce streamflow, consistent with observations of increased Arctic river sensitivity to evapotranspiration under warming (Nijssen et al., 2001). Precipitation shows minimal direct transformation, with a nearly flat or slightly negative function. This may be caused by winter precipitation accumulating as snow and contributing to discharge only after spring melt, which creates multi-month lags (Gelfan et al., 2017). The learned functions for the temporal encoding variables (Month<sub>sin</sub> and Month<sub>cos</sub>) show how the KAN component represents seasonality. Month<sub>sin</sub> exhibits a clear, smoothly varying nonlinear transformation, whereas Month<sub>cos</sub> remains comparatively flat. The monotonic tendency in the Month<sub>sin</sub> curve suggests an asymmetric seasonal influence: 
the model responds differently to the rising and falling portions of the annual cycle, which is consistent with the sharp melt-season transition and the comparatively gradual recession that often follows peak flow. Importantly, because trigonometric encoding provides a continuous cyclical representation of annual timing, the KAN transformation can capture seasonal structure without introducing an artificial discontinuity at the year boundary.</p>
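      <p>For readers less familiar with degree-day formulations, the threshold-then-increase shape noted for temperature can be compared with a minimal temperature-index melt function. The sketch below assumes the classical form in which melt equals a degree-day factor times the positive part of temperature minus a base temperature. The factor of 3 mm per degree Celsius per time step and the 0 °C base temperature are hypothetical placeholders, and the function is not the SnowpackLayer of the proposed model.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def degree_day_melt(temperature_c, degree_day_factor=3.0, base_temperature_c=0.0):
    """Temperature-index melt: zero below the base temperature, linear above it.

    Both parameter defaults are hypothetical placeholders, not calibrated values.
    """
    return degree_day_factor * max(temperature_c - base_temperature_c, 0.0)


# Melt stays at zero for sub-freezing temperatures and then rises linearly.
for temperature in (-20.0, -5.0, 0.0, 2.0, 10.0):
    print(temperature, degree_day_melt(temperature))
```
]]></preformat>
      <p>Because the KAN inputs are standardized, the kink in a learned curve would not necessarily sit at zero on the plotted axis; only the qualitative shape is comparable.</p>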

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e3184">Learned univariate KAN functions applied to each input variable before the LSTM component. Each panel shows how a single input feature is transformed before being passed to the LSTM layers. Blue lines represent the mean transformation across all KAN output dimensions, with shaded regions indicating <inline-formula><mml:math id="M157" display="inline"><mml:mrow><mml:mo>±</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> standard deviation, reflecting transformation variability.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f09.png"/>

        </fig>

      <p id="d2e3204">As a hybrid architecture, RCPIKLA is primarily interpretable at the KAN stage. Because the KAN module represents input–feature mappings through learnable univariate functions, the learned curves and their symbolic approximations provide a transparent description of how each hydroclimatic predictor is transformed before being passed to the sequence model. However, this interpretability does not extend to a fully closed-form, end-to-end explanation of the final discharge prediction: the downstream LSTM block integrates information across multiple antecedent months and mixes transformed features through recurrent dynamics and temporal weighting. Consequently, the KAN-derived functions should be interpreted as input transformations rather than as a complete mechanistic decomposition of the full temporal prediction process.</p>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>On the role of the physics-informed constraints and residual structure</title>
      <p id="d2e3215">The ablation results comparing RCPIKLA, RCKLA (no physics), and PIKLA (no residual) are presented in Fig. 10. The results reveal that the complete RCPIKLA model achieves a mean NSE of <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.827</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.030</mml:mn></mml:mrow></mml:math></inline-formula> (mean <inline-formula><mml:math id="M159" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> standard deviation) across 120 evaluations, which represents a significant improvement over both the PIKLA model without residual compensation (<inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.790</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.029</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M161" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>) and the RCKLA model without physics-informed constraints (<inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:mn mathvariant="normal">0.812</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.031</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M163" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>). 
Similarly, RCPIKLA obtains the lowest RMSE (<inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:mn mathvariant="normal">8.12</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.75</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M165" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>) compared to PIKLA (<inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:mn mathvariant="normal">8.98</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.52</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M167" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M168" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>) and RCKLA (<inline-formula><mml:math id="M169" display="inline"><mml:mrow><mml:mn mathvariant="normal">8.47</mml:mn><mml:mo>±</mml:mo><mml:mn mathvariant="normal">0.76</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M170" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M171" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>). These comparative results highlight two important aspects of the model architecture: (1) The physics-informed constraints contribute to overall model robustness and performance stability. By incorporating physical principles of snowpack accumulation and melt processes through the specialized SnowpackLayer, the model better captures the underlying hydrological dynamics of the Arctic river system. 
The physics-informed loss function, which mathematically enforces the relationship between melted snow and discharge, helps maintain physical consistency in the predictions. (2) The residual compensation mechanism addresses model inadequacies by learning the systematic errors in the physics-based predictions. This is particularly valuable for handling complex nonlinear processes that are not fully captured by the simplified physical representations. The performance difference between PIKLA and RCPIKLA demonstrates that the residual structure successfully compensates for approximation errors in the physics-informed component. Residuals (Predicted <inline-formula><mml:math id="M172" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> Observed) are evaluated on the test set across all forecast horizons (1 to 12 time steps), using 10 independent runs per horizon. When all residuals are pooled across horizons and runs, RCPIKLA obtains a small positive mean residual (0.08 <inline-formula><mml:math id="M173" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>, corresponding to <inline-formula><mml:math id="M174" display="inline"><mml:mrow><mml:mo>+</mml:mo><mml:mn mathvariant="normal">0.57</mml:mn></mml:mrow></mml:math></inline-formula> % of the mean observed discharge), whereas RCKLA exhibits a negative mean residual (<inline-formula><mml:math id="M175" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.31</mml:mn></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M176" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M177" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2.23</mml:mn></mml:mrow></mml:math></inline-formula> %). These results indicate that the physics-informed constraint does not introduce a systematic bias. 
Instead, it reduces the slight underprediction tendency of the unconstrained model and yields a more centered residual distribution overall.</p>
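      <p>The residual statistics quoted above follow from a simple calculation. The sketch below computes the mean residual (Predicted minus Observed) and expresses it as a percentage of the mean observed discharge. The function name and the toy values are hypothetical and are not data from this study.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def mean_residual_summary(predicted, observed):
    """Return (mean residual, mean residual as % of the mean observed value)."""
    residuals = [pred - obs for pred, obs in zip(predicted, observed)]
    mean_residual = sum(residuals) / len(residuals)
    mean_observed = sum(observed) / len(observed)
    return mean_residual, 100.0 * mean_residual / mean_observed


# Toy values (mm); a negative mean residual indicates net underprediction.
observed = [10.0, 20.0, 30.0]
predicted = [9.0, 19.5, 29.0]
bias_mm, bias_percent = mean_residual_summary(predicted, observed)
print(round(bias_mm, 3), round(bias_percent, 2))
```
]]></preformat>
      <p>A mean residual close to zero, as reported for RCPIKLA, means that over- and underpredictions roughly cancel when pooled; it does not by itself describe the size of individual errors, which is what RMSE captures.</p>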

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e3427">Performance comparison of model variants, including RCPIKLA, PIKLA (no residual) and RCKLA (no physics), across all forecasting horizons (1–12 months) and 10 independent training runs. Box plots show distributions of <bold>(a)</bold> NSE, and <bold>(b)</bold> RMSE. Each box aggregates 120 evaluations (12 time steps <inline-formula><mml:math id="M178" display="inline"><mml:mrow><mml:mo>×</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> runs), with boxes showing median (center line), interquartile range (box edges), and <inline-formula><mml:math id="M179" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.5</mml:mn><mml:mo>×</mml:mo></mml:mrow></mml:math></inline-formula> IQR whiskers. Individual points represent each evaluation. Red diamonds mark the mean values with numerical annotations (mean <inline-formula><mml:math id="M180" display="inline"><mml:mo>±</mml:mo></mml:math></inline-formula> standard deviation).</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f10.png"/>

        </fig>

      <p id="d2e3469">In summary, the ablation comparisons isolate individual component contributions: the residual structure (RCPIKLA vs. PIKLA) improves NSE by 0.038 (4.8 % relative improvement), while the physics-informed constraint (RCPIKLA vs. RCKLA) contributes an NSE improvement of 0.015 (1.8 % relative). Both components provide independent, statistically significant (<inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>) performance gains, confirming their complementary roles in the hybrid architecture. The integration of both components yields a structure that balances data-driven flexibility with physical consistency. This hybrid approach is particularly advantageous in data-limited environments like Arctic rivers, where the physics-informed constraints and the residual compensation help overcome model simplifications and data uncertainty.</p>
</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>The role of seasonal variations and trigonometric encoding</title>
      <p id="d2e3493">To assess the contribution of explicit seasonal representation, model variants are evaluated with and without trigonometric encoding (TE) of monthly seasonality. The comparative analysis is plotted in Fig. 11, which reveals substantial performance differences across the NSE, RMSE, and <inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> metrics. The results are aggregated across all time steps, with each time step evaluated using 10 independent runs. The box plots of the three evaluation metrics indicate that trigonometric encoding substantially improves performance across all model architectures. The proposed RCPIKLA model maintains the highest median NSE (approximately 0.83) with trigonometric encoding, while the removal of TE (denoted by “-no TE”) leads to degraded performance (median NSE around 0.80) and wider value ranges. This pattern is consistent across all architectures, with the GRU, LSTM, and Simple RNN models all exhibiting substantial performance degradation when seasonal encoding is removed. The widths of the boxes, representing interquartile ranges, also decrease substantially with TE, indicating greater consistency and reduced variability across model runs. In particular, the LSTM and Simple RNN models without trigonometric encoding show greater instability, with some runs achieving NSE values below 0.5, which indicates severely compromised predictive capability. Regarding RMSE, the incorporation of TE reduces median errors and decreases variability, particularly for RCPIKLA, where RMSE values exhibit the narrowest range. 
Outliers observed in models without trigonometric encoding suggest that omitting seasonal encodings can lead to occasional severe prediction errors, likely caused by the model's inability to account effectively for seasonal patterns. The <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> metric corroborates these findings, with RCPIKLA achieving a median <inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> of 0.781 with TE and 0.750 without TE (4.1 % improvement). Baseline models show improvements of 2.5 %–4.7 % when adding TE, with LSTM and Simple RNN exhibiting the largest gains (0.638 to 0.668 and 0.641 to 0.671, respectively). This indicates that while trigonometric encoding provides universal benefits across all architectures, the combination of RCPIKLA's physics-informed components with TE yields synergistic improvements, achieving the best overall performance across all three metrics.</p>
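      <p>The percentage improvements quoted above appear to be relative changes with respect to the score obtained without TE. Assuming that definition, the short check below reproduces them from the median KGE′ values reported in the text; the function name is hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def relative_gain_percent(score_with_te, score_without_te):
    """Relative change of a score, in percent of the score without TE."""
    return 100.0 * (score_with_te - score_without_te) / score_without_te


# Median KGE' values quoted in the text: (with TE, without TE).
reported = {
    "RCPIKLA": (0.781, 0.750),
    "LSTM": (0.668, 0.638),
    "Simple RNN": (0.671, 0.641),
}
gains = {name: relative_gain_percent(*pair) for name, pair in reported.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.1f} %")
```
]]></preformat>
      <p>Under this assumption the three gains round to 4.1 %, 4.7 %, and 4.7 %, matching the values stated for RCPIKLA, LSTM, and Simple RNN.</p>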

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e3531">Comparison of models with and without trigonometric encoding of seasonal variations as inputs.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3165/2026/hess-30-3165-2026-f11.png"/>

        </fig>

      <p id="d2e3540">Overall, the performance improvements from trigonometric seasonal encoding observed across all model architectures highlight the importance of explicit temporal feature engineering in hydrological applications. This finding is consistent with recent deep learning studies in environmental and hydrological modeling. For example, Pölz et al. (2024) demonstrated that providing deep learning models with explicit time-aware features such as cyclical time features improved discharge prediction compared to expecting the model to learn seasonal patterns solely from data. Snieder and Khan (2025) also supported cyclical encoding from a methodological perspective, as encoding time with sine–cosine terms provides a continuous, cyclic representation of annual timing. This avoids the artificial discontinuity at the year boundary, which can otherwise introduce spurious jumps and make seasonal relationships harder for data-driven models to learn.</p>
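      <p>The continuity argument above can be checked numerically. The sketch below assumes the common encoding sin(2πm/12) and cos(2πm/12) for calendar month m = 1, ..., 12. The exact form used for Month<sub>sin</sub> and Month<sub>cos</sub> in this study may differ (for example in the month offset), and the function names are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def encode_month(month):
    """Map a calendar month (1-12) to a point on the unit circle."""
    angle = 2.0 * math.pi * month / 12.0
    return math.sin(angle), math.cos(angle)


def encoded_distance(month_a, month_b):
    """Euclidean distance between two months in the (sin, cos) plane."""
    return math.dist(encode_month(month_a), encode_month(month_b))


# December to January is as close as any other pair of adjacent months,
# whereas the raw month index jumps from 12 to 1.
print(encoded_distance(12, 1), encoded_distance(1, 2), abs(12 - 1))
```
]]></preformat>
      <p>With the raw month index, December and January are 11 units apart; in the encoded plane they are separated by the same distance as January and February, which is the property that removes the year-boundary jump.</p>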
      <p id="d2e3544">The finding that the baseline models gain up to 4.7 % in performance from TE, while the physics-informed RCPIKLA gains 4.1 %, suggests that different model components capture seasonal information through complementary mechanisms. The physics-informed snowpack layer already provides implicit seasonal awareness through temperature-dependent snow accumulation and melt, which may explain why RCPIKLA benefits slightly less from explicit TE. Without explicit encoding of the annual cycle, models struggle to establish accurate temporal context for the meteorological inputs, resulting in compromised predictive accuracy.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusion</title>
      <p id="d2e3557">In this study, a novel hybrid model integrating physics-informed constraints with advanced deep learning architectures is proposed to improve discharge prediction accuracy in permafrost-dominated Arctic rivers. The proposed RCPIKLA model obtains robust and accurate prediction performance on the Kolyma River, outperforming conventional deep learning approaches across all forecasting horizons from 1 to 12 months. The key findings are summarized as follows: <list list-type="order"><list-item>
      <p id="d2e3562">The predictive performance of the newly proposed model and the baseline models is evaluated across a range of time steps, from 1 to 12 months. As illustrated in Fig. 5, the newly proposed model consistently outperforms the baselines at all time steps and produces robust predictive performance. It obtains the highest NSE values, ranging from 0.81 to 0.86, the lowest RMSE values, between 7.1 and 8.5 <inline-formula><mml:math id="M185" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>, and the highest <inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:msup><mml:mtext>KGE</mml:mtext><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> values, between 0.74 and 0.82. The hybrid model achieves optimal performance with 9-month input sequences, which suggests that the permafrost-covered Arctic river discharge exhibits multi-seasonal temporal dependencies on preceding hydrometeorological conditions, such as snow accumulation and melting processes and active layer storage dynamics.</p></list-item><list-item>
      <p id="d2e3585">The predictive performance across different discharge value ranges is further assessed to understand how well each model captures the full spectrum of hydrological variability. All models perform reasonably well for low to moderate discharge values (0–30 <inline-formula><mml:math id="M187" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>), but more obvious differences emerge at moderate and high discharge events. Although the proposed RCPIKLA model maintains improved prediction accuracy, challenges remain in accurately predicting extreme high discharge events, with all models showing a tendency to underestimate peak flows. This limitation may be partially attributed to the relatively sparse representation of high discharge events in the dataset, which constrains the model's ability to generalize under extreme hydrological scenarios.</p></list-item><list-item>
      <p id="d2e3597">Both physics-informed constraints and residual compensation contribute distinctly to model performance. The physics-informed component, which incorporates snowpack accumulation and melt processes, provides the proposed model with basic domain knowledge that helps overcome data limitations in the permafrost-dominated Kolyma River basin. The residual compensation mechanism learns systematic errors in the physics-based predictions and helps capture complex nonlinear processes that are not fully represented.</p></list-item><list-item>
      <p id="d2e3601">By transforming month values into sine and cosine components that preserve the cyclical nature of seasonal patterns, the incorporation of trigonometric seasonal encoding can improve the predictive performance. This approach enhances prediction accuracy across all architectures, with improvements of 4 %–6 % in performance metrics, highlighting the importance of representing the pronounced seasonal dynamics of Arctic rivers characterized by frozen winter conditions, spring snowmelt peaks, and moderate summer flows.</p></list-item></list></p>
      <p id="d2e3604">While the RCPIKLA model demonstrates robust performance for Kolyma River prediction under historical and current hydroclimatic conditions, several limitations should be acknowledged. As a data-driven model trained on historical observations, the model's performance may degrade if climate change induces fundamental shifts in watershed behavior that extend beyond the range of training conditions. Such regime changes may include, but are not limited to, transitions from continuous to discontinuous permafrost and significantly altered seasonal patterns. Under such scenarios, the model would need to extrapolate beyond its training data range, which remains a challenge for data-driven approaches. Future applications under changing climate conditions should include regular model retraining and validation as new observations become available.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e3611">The representative example datasets and code are available on GitHub at <uri>https://github.com/Zhou-R/HESS_KAN</uri> (last access: 30 April 2026) and are archived on Zenodo at <ext-link xlink:href="https://doi.org/10.5281/zenodo.19862397" ext-link-type="DOI">10.5281/zenodo.19862397</ext-link> (Zhou, 2026).</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d2e3620">The supplement related to this article is available online at <inline-supplementary-material xlink:href="https://doi.org/10.5194/hess-30-3165-2026-supplement" xlink:title="pdf">https://doi.org/10.5194/hess-30-3165-2026-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e3629">RZ: Writing – original draft, Visualization, Validation, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. SL: Writing – review and editing, Resources, Data curation, Investigation.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e3635">The contact author has declared that neither of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e3641">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e3647">This research has been supported by the National Science Foundation, Directorate for Geosciences (grant no. 2407963).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e3653">This paper was edited by Rohini Kumar and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bib1"><label>1</label><mixed-citation>Alzubaidi, L., Bai, J., Al-Sabaawi, A., Santamaría, J., Albahri, A. S., Al-dabbagh, B. S. N., Fadhel, M. A., Manoufali, M., Zhang, J., Al-Timemy, A. H., Duan, Y., Abdullah, A., Farhan, L., Lu, Y., Gupta, A., Albu, F., Abbosh, A., and Gu, Y.: A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications, J. Big Data, 10, 46, <ext-link xlink:href="https://doi.org/10.1186/s40537-023-00727-2" ext-link-type="DOI">10.1186/s40537-023-00727-2</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib2"><label>2</label><mixed-citation>Andersson, T. R., Hosking, J. S., Pérez-Ortiz, M., Paige, B., Elliott, A., Russell, C., Law, S., Jones, D. C., Wilkinson, J., Phillips, T., Byrne, J., Tietsche, S., Sarojini, B. B., Blanchard-Wrigglesworth, E., Aksenov, Y., Downie, R., and Shuckburgh, E.: Seasonal Arctic sea ice forecasting with probabilistic deep learning, Nat. Commun., 12, 5124, <ext-link xlink:href="https://doi.org/10.1038/s41467-021-25257-4" ext-link-type="DOI">10.1038/s41467-021-25257-4</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib3"><label>3</label><mixed-citation>Bakhshi Ostadkalayeh, F., Moradi, S., Asadi, A., Moghaddam Nia, A., and Taheri, S.: Performance improvement of LSTM-based deep learning model for streamflow forecasting using Kalman filtering, Water Resour. Manage., 37, 3111–3127, <ext-link xlink:href="https://doi.org/10.1007/s11269-023-03492-2" ext-link-type="DOI">10.1007/s11269-023-03492-2</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib4"><label>4</label><mixed-citation>Basu, B., Morrissey, P., and Gill, L. W.: Application of nonlinear time series and machine learning algorithms for forecasting groundwater flooding in a lowland karst area, Water Resour. Res., 58, e2021WR029576,  <ext-link xlink:href="https://doi.org/10.1029/2021WR029576" ext-link-type="DOI">10.1029/2021WR029576</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib5"><label>5</label><mixed-citation>Bring, A., Fedorova, I., Dibike, Y., Hinzman, L., Mård, J., Mernild, S. H., Prowse, T., Semenova, O., Stuefer, S. L., and Woo, M.-K.: Arctic terrestrial hydrology: a synthesis of processes, regional effects, and research challenges, J. Geophys. Res.-Biogeo., 121, 621–649, <ext-link xlink:href="https://doi.org/10.1002/2015JG003131" ext-link-type="DOI">10.1002/2015JG003131</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib6"><label>6</label><mixed-citation>Chang, S. Y., Schwenk, J., and Solander, K. C.: Deep learning advances Arctic river water temperature predictions, Water Resour. Res., 61, e2024WR039053, <ext-link xlink:href="https://doi.org/10.1029/2024WR039053" ext-link-type="DOI">10.1029/2024WR039053</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib7"><label>7</label><mixed-citation>Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/ARXIV.1406.1078" ext-link-type="DOI">10.48550/ARXIV.1406.1078</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib8"><label>8</label><mixed-citation>Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, <ext-link xlink:href="https://doi.org/10.5194/hess-27-2397-2023" ext-link-type="DOI">10.5194/hess-27-2397-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib9"><label>9</label><mixed-citation>DeWalle, D. R. and Rango, A.: Principles of snow hydrology, 1st edn., Cambridge University Press,  <ext-link xlink:href="https://doi.org/10.1017/CBO9780511535673" ext-link-type="DOI">10.1017/CBO9780511535673</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bib10"><label>10</label><mixed-citation>Ernakovich, J. G., Hopping, K. A., Berdanier, A. B., Simpson, R. T., Kachergis, E. J., Steltzer, H., and Wallenstein, M. D.: Predicted responses of arctic and alpine ecosystems to altered seasonality under climate change, Glob. Change Biol., 20, 3256–3269,  <ext-link xlink:href="https://doi.org/10.1111/gcb.12568" ext-link-type="DOI">10.1111/gcb.12568</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bib11"><label>11</label><mixed-citation>Feng, D., Gleason, C. J., Lin, P., Yang, X., Pan, M., and Ishitsuka, Y.: Recent changes to Arctic river discharge, Nat. Commun., 12, 6917, <ext-link xlink:href="https://doi.org/10.1038/s41467-021-27228-1" ext-link-type="DOI">10.1038/s41467-021-27228-1</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib12"><label>12</label><mixed-citation>Gao, S., Huang, Y., Zhang, S., Han, J., Wang, G., Zhang, M., and Lin, Q.: Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation, J. Hydrol., 589, 125188,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2020.125188" ext-link-type="DOI">10.1016/j.jhydrol.2020.125188</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib13"><label>13</label><mixed-citation>Gelfan, A., Gustafsson, D., Motovilov, Y., Arheimer, B., Kalugin, A., Krylenko, I., and Lavrenov, A.: Climate change impact on the water regime of two great Arctic rivers: modeling and uncertainty issues, Climatic Change, 141, 499–515, <ext-link xlink:href="https://doi.org/10.1007/s10584-016-1710-5" ext-link-type="DOI">10.1007/s10584-016-1710-5</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib14"><label>14</label><mixed-citation>Granata, F., Zhu, S., and Di Nunno, F.: Advanced streamflow forecasting for Central European rivers: the cutting-edge Kolmogorov-Arnold networks compared to transformers, J. Hydrol., 645, 132175, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2024.132175" ext-link-type="DOI">10.1016/j.jhydrol.2024.132175</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib15"><label>15</label><mixed-citation>Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling, J. Hydrol., 377, 80–91,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2009.08.003" ext-link-type="DOI">10.1016/j.jhydrol.2009.08.003</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bib16"><label>16</label><mixed-citation>Gusev, E. M., Nasonova, O. N., and Dzhogan, L. Y.: Physically based simulating long-term dynamics of diurnal variations of river runoff and snow water equivalent in the Kolyma River basin, Water Resour., 42, 834–841, <ext-link xlink:href="https://doi.org/10.1134/S0097807815060056" ext-link-type="DOI">10.1134/S0097807815060056</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib17"><label>17</label><mixed-citation>Häkkinen, S. and Mellor, G. L.: Modeling the seasonal variability of a coupled Arctic ice-ocean system, J. Geophys. Res.-Oceans, 97, 20285–20304, <ext-link xlink:href="https://doi.org/10.1029/92JC02037" ext-link-type="DOI">10.1029/92JC02037</ext-link>, 1992.</mixed-citation></ref>
      <ref id="bib1.bib18"><label>18</label><mixed-citation>Harpold, A. A., Kaplan, M. L., Klos, P. Z., Link, T., McNamara, J. P., Rajagopal, S., Schumer, R., and Steele, C. M.: Rain or snow: hydrologic processes, observations, prediction, and research needs, Hydrol. Earth Syst. Sci., 21, 1–22, <ext-link xlink:href="https://doi.org/10.5194/hess-21-1-2017" ext-link-type="DOI">10.5194/hess-21-1-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib19"><label>19</label><mixed-citation>Harris, I., Osborn, T. J., Jones, P., and Lister, D.: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset, Sci. Data, 7, 109,  <ext-link xlink:href="https://doi.org/10.1038/s41597-020-0453-3" ext-link-type="DOI">10.1038/s41597-020-0453-3</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib20"><label>20</label><mixed-citation>Hochreiter, S. and Schmidhuber, J.: Long short-term memory, Neural Comput., 9, 1735–1780,  <ext-link xlink:href="https://doi.org/10.1162/neco.1997.9.8.1735" ext-link-type="DOI">10.1162/neco.1997.9.8.1735</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bib21"><label>21</label><mixed-citation>Hock, R.: Temperature index melt modelling in mountain areas, J. Hydrol., 282, 104–115,  <ext-link xlink:href="https://doi.org/10.1016/S0022-1694(03)00257-9" ext-link-type="DOI">10.1016/S0022-1694(03)00257-9</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib22"><label>22</label><mixed-citation>Holmes, R. M., McClelland, J. W., Peterson, B. J., Tank, S. E., Bulygina, E., Eglinton, T. I., Gordeev, V. V., Gurtovaya, T. Y., Raymond, P. A., Repeta, D. J., Staples, R., Striegl, R. G., Zhulidov, A. V., and Zimov, S. A.: Seasonal and annual fluxes of nutrients and organic matter from large rivers to the Arctic Ocean and surrounding seas, Estuar. Coast., 35, 369–382,  <ext-link xlink:href="https://doi.org/10.1007/s12237-011-9386-6" ext-link-type="DOI">10.1007/s12237-011-9386-6</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib23"><label>23</label><mixed-citation>Jin, A., Wang, Q., Zhan, H., and Zhou, R.: Comparative performance assessment of physical-based and data-driven machine-learning models for simulating streamflow: a case study in three catchments across the US, J. Hydrol. Eng., 29, 5024004,  <ext-link xlink:href="https://doi.org/10.1061/JHYEFF.HEENG-6118" ext-link-type="DOI">10.1061/JHYEFF.HEENG-6118</ext-link>, 2024a.</mixed-citation></ref>
      <ref id="bib1.bib24"><label>24</label><mixed-citation>Jin, A., Wang, Q., Zhou, R., Shi, W., and Qiao, X.: Hybrid multivariate machine learning models for streamflow forecasting: a two-stage decomposition–reconstruction framework, J. Hydrol. Eng., 29, 4024026,  <ext-link xlink:href="https://doi.org/10.1061/JHYEFF.HEENG-6254" ext-link-type="DOI">10.1061/JHYEFF.HEENG-6254</ext-link>, 2024b.</mixed-citation></ref>
      <ref id="bib1.bib25"><label>25</label><mixed-citation>Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L.: Physics-informed machine learning, Nat. Rev. Phys., 3, 422–440,  <ext-link xlink:href="https://doi.org/10.1038/s42254-021-00314-5" ext-link-type="DOI">10.1038/s42254-021-00314-5</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib26"><label>26</label><mixed-citation>Kling, H., Fuchs, M., and Paulin, M.: Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios, J. Hydrol., 424, 264–277,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2012.01.011" ext-link-type="DOI">10.1016/j.jhydrol.2012.01.011</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bib27"><label>27</label><mixed-citation>Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23, 4323–4331, <ext-link xlink:href="https://doi.org/10.5194/hess-23-4323-2019" ext-link-type="DOI">10.5194/hess-23-4323-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib28"><label>28</label><mixed-citation>Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <ext-link xlink:href="https://doi.org/10.5194/hess-22-6005-2018" ext-link-type="DOI">10.5194/hess-22-6005-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bib29"><label>29</label><mixed-citation>Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <ext-link xlink:href="https://doi.org/10.5194/hess-23-5089-2019" ext-link-type="DOI">10.5194/hess-23-5089-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib30"><label>30</label><mixed-citation>Krogh, S. A., Pomeroy, J. W., and Marsh, P.: Diagnosis of the hydrology of a small Arctic basin at the tundra–taiga transition using a physically based hydrological model, J. Hydrol., 550, 685–703,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2017.05.042" ext-link-type="DOI">10.1016/j.jhydrol.2017.05.042</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib31"><label>31</label><mixed-citation>Kůrková, V.: Kolmogorov's theorem and multilayer neural networks, Neural Networks, 5, 501–506, <ext-link xlink:href="https://doi.org/10.1016/0893-6080(92)90012-8" ext-link-type="DOI">10.1016/0893-6080(92)90012-8</ext-link>, 1992.</mixed-citation></ref>
      <ref id="bib1.bib32"><label>32</label><mixed-citation>LeCun, Y., Bottou, L., Orr, G. B., and Müller, K.-R.: Efficient BackProp, in: Neural networks: tricks of the trade, vol. 1524, edited by: Orr, G. B. and Müller, K.-R., Springer Berlin Heidelberg, Berlin, Heidelberg, 9–50, <ext-link xlink:href="https://doi.org/10.1007/3-540-49430-8_2" ext-link-type="DOI">10.1007/3-540-49430-8_2</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bib33"><label>33</label><mixed-citation>Liu, S., Wang, P., Yu, J., Zhou, R., Bai, B., Gabysheva, O. I., Frolova, N. L., and Pozdniakov, S. P.: Changes in hydrological regime regulate POC export across permafrost-dominated Arctic river basins, Geosci. Front., 102208,  <ext-link xlink:href="https://doi.org/10.1016/j.gsf.2025.102208" ext-link-type="DOI">10.1016/j.gsf.2025.102208</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib34"><label>34</label><mixed-citation>Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T. Y., and Tegmark, M.: KAN: Kolmogorov–Arnold Networks, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/ARXIV.2404.19756" ext-link-type="DOI">10.48550/ARXIV.2404.19756</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib35"><label>35</label><mixed-citation>McClelland, J. W., Holmes, R. M., Peterson, B. J., and Stieglitz, M.: Increasing river discharge in the Eurasian Arctic: consideration of dams, permafrost thaw, and fires as potential agents of change, J. Geophys. Res.-Atmos., 109, 2004JD004583,  <ext-link xlink:href="https://doi.org/10.1029/2004JD004583" ext-link-type="DOI">10.1029/2004JD004583</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bib36"><label>36</label><mixed-citation>Moriasi, D. N., Arnold, J. G., Van Liew, M. W., Bingner, R. L., Harmel, R. D., and Veith, T. L.: Model evaluation guidelines for systematic quantification of accuracy in watershed simulations, T. ASABE, 50, 885–900,  <ext-link xlink:href="https://doi.org/10.13031/2013.23153" ext-link-type="DOI">10.13031/2013.23153</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bib37"><label>37</label><mixed-citation>Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C., and Gupta, H. V.: What role does hydrological science play in the age of machine learning?, Water Resour. Res., 57, e2020WR028091,  <ext-link xlink:href="https://doi.org/10.1029/2020WR028091" ext-link-type="DOI">10.1029/2020WR028091</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib38"><label>38</label><mixed-citation>Nijssen, B., O'Donnell, G. M., Hamlet, A. F., and Lettenmaier, D. P.: Hydrologic sensitivity of global rivers to climate change, Climatic Change, 50, 143–175,  <ext-link xlink:href="https://doi.org/10.1023/A:1010616428763" ext-link-type="DOI">10.1023/A:1010616428763</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bib39"><label>39</label><mixed-citation>Peterson, B. J., Holmes, R. M., McClelland, J. W., Vörösmarty, C. J., Lammers, R. B., Shiklomanov, A. I., Shiklomanov, I. A., and Rahmstorf, S.: Increasing river discharge to the Arctic Ocean, Science, 298, 2171–2173, <ext-link xlink:href="https://doi.org/10.1126/science.1077445" ext-link-type="DOI">10.1126/science.1077445</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib40"><label>40</label><mixed-citation>Pölz, A., Blaschke, A. P., Komma, J., Farnleitner, A. H., and Derx, J.: Transformer versus LSTM: a comparison of deep learning models for karst spring discharge forecasting, Water Resour. Res., 60, e2022WR032602, <ext-link xlink:href="https://doi.org/10.1029/2022WR032602" ext-link-type="DOI">10.1029/2022WR032602</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib41"><label>41</label><mixed-citation>Prowse, T., Alfredsen, K., Beltaos, S., Bonsal, B. R., Bowden, W. B., Duguay, C. R., Korhola, A., McNamara, J., Vincent, W. F., Vuglinsky, V., Walter Anthony, K. M., and Weyhenmeyer, G. A.: Effects of changes in Arctic lake and river ice, Ambio, 40, 63–74,  <ext-link xlink:href="https://doi.org/10.1007/s13280-011-0217-6" ext-link-type="DOI">10.1007/s13280-011-0217-6</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bib42"><label>42</label><mixed-citation>Rawlins, M. A. and Karmalkar, A. V.: Regime shifts in Arctic terrestrial hydrology manifested from impacts of climate warming, The Cryosphere, 18, 1033–1052, <ext-link xlink:href="https://doi.org/10.5194/tc-18-1033-2024" ext-link-type="DOI">10.5194/tc-18-1033-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib43"><label>43</label><mixed-citation>Schneider, U., Hänsel, S., Finger, P., Rustemeier, E., and Ziese, M.: GPCC full data monthly version 2022 at 2.5°: monthly land–surface precipitation from rain-gauges built on GTS-based and historic data: globally gridded monthly totals (2022), <ext-link xlink:href="https://doi.org/10.5676/DWD_GPCC/FD_M_V2022_250" ext-link-type="DOI">10.5676/DWD_GPCC/FD_M_V2022_250</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bib44"><label>44</label><mixed-citation>Sergeev, A., Baglaeva, E., and Subbotina, I.: Hybrid model combining LSTM with discrete wavelet transformation to predict surface methane concentration in the Arctic island Belyy, Atmos. Environ., 317, 120210,  <ext-link xlink:href="https://doi.org/10.1016/j.atmosenv.2023.120210" ext-link-type="DOI">10.1016/j.atmosenv.2023.120210</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib45"><label>45</label><mixed-citation>Singh, A., Kalke, H., Loewen, M., and Ray, N.: River ice segmentation with deep learning, IEEE T. Geosci. Remote, 58, 7570–7579,  <ext-link xlink:href="https://doi.org/10.1109/TGRS.2020.2981082" ext-link-type="DOI">10.1109/TGRS.2020.2981082</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib46"><label>46</label><mixed-citation>Snieder, E. and Khan, U. T.: A diversity-centric strategy for the selection of spatio-temporal training data for LSTM-based streamflow forecasting, Hydrol. Earth Syst. Sci., 29, 785–798, <ext-link xlink:href="https://doi.org/10.5194/hess-29-785-2025" ext-link-type="DOI">10.5194/hess-29-785-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib47"><label>47</label><mixed-citation>Spencer, R. G. M., Mann, P. J., Dittmar, T., Eglinton, T. I., McIntyre, C., Holmes, R. M., Zimov, N., and Stubbins, A.: Detecting the signature of permafrost thaw in Arctic rivers, Geophys. Res. Lett., 42, 2830–2835,  <ext-link xlink:href="https://doi.org/10.1002/2015GL063498" ext-link-type="DOI">10.1002/2015GL063498</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib48"><label>48</label><mixed-citation>Suzuki, K., Liston, G. E., and Matsuo, K.: Estimation of continental-basin-scale sublimation in the Lena River basin, Siberia, Adv. Meteorol., 2015, 1–14,  <ext-link xlink:href="https://doi.org/10.1155/2015/286206" ext-link-type="DOI">10.1155/2015/286206</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bib49"><label>49</label><mixed-citation>Tank, S. E., McClelland, J. W., Spencer, R. G. M., Shiklomanov, A. I., Suslova, A., Moatar, F., Amon, R. M. W., Cooper, L. W., Elias, G., Gordeev, V. V., Guay, C., Gurtovaya, T. Yu., Kosmenko, L. S., Mutter, E. A., Peterson, B. J., Peucker-Ehrenbrink, B., Raymond, P. A., Schuster, P. F., Scott, L., Staples, R., Striegl, R. G., Tretiakov, M., Zhulidov, A. V., Zimov, N., Zimov, S., and Holmes, R. M.: Recent trends in the chemistry of major northern rivers signal widespread Arctic change, Nat. Geosci., 16, 789–796,  <ext-link xlink:href="https://doi.org/10.1038/s41561-023-01247-7" ext-link-type="DOI">10.1038/s41561-023-01247-7</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib50"><label>50</label><mixed-citation>Towner, J., Cloke, H. L., Zsoter, E., Flamig, Z., Hoch, J. M., Bazo, J., Coughlan de Perez, E., and Stephens, E. M.: Assessing the performance of global hydrological models for capturing peak river flows in the Amazon basin, Hydrol. Earth Syst. Sci., 23, 3057–3080, <ext-link xlink:href="https://doi.org/10.5194/hess-23-3057-2019" ext-link-type="DOI">10.5194/hess-23-3057-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bib51"><label>51</label><mixed-citation>Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.: Attention is all you need, in: Advances in Neural Information Processing Systems, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1706.03762" ext-link-type="DOI">10.48550/arXiv.1706.03762</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bib52"><label>52</label><mixed-citation>Vonk, J. E., Fritz, M., Speetjens, N. J., Babin, M., Bartsch, A., Basso, L. S., Bröder, L., Göckede, M., Gustafsson, Ö., Hugelius, G., Irrgang, A. M., Juhls, B., Kuhn, M. A., Lantuit, H., Manizza, M., Martens, J., O'Regan, M., Suslova, A., Tank, S. E., Terhaar, J., and Zolkos, S.: The land–ocean Arctic carbon cycle, Nat. Rev. Earth Environ., 6, 86–105, <ext-link xlink:href="https://doi.org/10.1038/s43017-024-00627-w" ext-link-type="DOI">10.1038/s43017-024-00627-w</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib53"><label>53</label><mixed-citation>Walvoord, M. A. and Kurylyk, B. L.: Hydrologic impacts of thawing permafrost – a review, Vadose Zone J., 15, 1–20,  <ext-link xlink:href="https://doi.org/10.2136/vzj2016.01.0010" ext-link-type="DOI">10.2136/vzj2016.01.0010</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bib54"><label>54</label><mixed-citation>Wang, P., Huang, Q., Pozdniakov, S. P., Liu, S., Ma, N., Wang, T., Zhang, Y., Yu, J., Xie, J., Fu, G., Frolova, N. L., and Liu, C.: Potential role of permafrost thaw on increasing Siberian river discharge, Environ. Res. Lett., 16, 34046,  <ext-link xlink:href="https://doi.org/10.1088/1748-9326/abe326" ext-link-type="DOI">10.1088/1748-9326/abe326</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib55"><label>55</label><mixed-citation>Woo, M.-K., Kane, D. L., Carey, S. K., and Yang, D.: Progress in permafrost hydrology in the new millennium, Permafrost Periglac., 19, 237–254,  <ext-link xlink:href="https://doi.org/10.1002/ppp.613" ext-link-type="DOI">10.1002/ppp.613</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bib56"><label>56</label><mixed-citation>Xie, K., Liu, P., Zhang, J., Han, D., Wang, G., and Shen, C.: Physics-guided deep learning for rainfall–runoff modeling by considering extreme events and monotonic relationships, J. Hydrol., 603, 127043,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2021.127043" ext-link-type="DOI">10.1016/j.jhydrol.2021.127043</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bib57"><label>57</label><mixed-citation>Yang, D., Kane, D. L., Hinzman, L. D., Zhang, X., Zhang, T., and Ye, H.: Siberian Lena River hydrologic regime and recent change, J. Geophys. Res.-Atmos., 107,  <ext-link xlink:href="https://doi.org/10.1029/2002JD002542" ext-link-type="DOI">10.1029/2002JD002542</ext-link>, 2002.</mixed-citation></ref>
      <ref id="bib1.bib58"><label>58</label><mixed-citation>Yang, S., Yang, D., Chen, J., Santisirisomboon, J., Lu, W., and Zhao, B.: A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data, J. Hydrol., 590, 125206,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2020.125206" ext-link-type="DOI">10.1016/j.jhydrol.2020.125206</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bib59"><label>59</label><mixed-citation>Ye, B., Yang, D., and Kane, D. L.: Changes in Lena River streamflow hydrology: human impacts versus natural variations, Water Resour. Res., 39, 2003WR001991,  <ext-link xlink:href="https://doi.org/10.1029/2003WR001991" ext-link-type="DOI">10.1029/2003WR001991</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bib60"><label>60</label><mixed-citation>Zhang, S., Gan, T. Y., Bush, A. B. G., and Zhang, G.: Evaluation of the impact of climate change on the streamflow of major pan-Arctic river basins through machine learning models, J. Hydrol., 619, 129295,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2023.129295" ext-link-type="DOI">10.1016/j.jhydrol.2023.129295</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib61"><label>61</label><mixed-citation>Zhi, W., Ouyang, W., Shen, C., and Li, L.: Temperature outweighs light and flow as the predominant driver of dissolved oxygen in US rivers, Nat. Water, 1, 249–260,  <ext-link xlink:href="https://doi.org/10.1038/s44221-023-00038-z" ext-link-type="DOI">10.1038/s44221-023-00038-z</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bib62"><label>62</label><mixed-citation>Zhong, L., Lei, H., and Yang, J.: Development of a distributed physics-informed deep learning hydrological model for data-scarce regions, Water Resour. Res., 60, e2023WR036333,  <ext-link xlink:href="https://doi.org/10.1029/2023WR036333" ext-link-type="DOI">10.1029/2023WR036333</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bib63"><label>63</label><mixed-citation>Zhou, R.: Multi-scale dynamic spatiotemporal graph attention network for forecasting karst spring discharge, J. Hydrol., 133289,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2025.133289" ext-link-type="DOI">10.1016/j.jhydrol.2025.133289</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bib64"><label>64</label><mixed-citation>Zhou, R.: Zhou-R/HESS_KAN: v0.01 (v0.0.1), Zenodo [code and data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.19862397" ext-link-type="DOI">10.5281/zenodo.19862397</ext-link>, 2026.</mixed-citation></ref>
      <ref id="bib1.bib65"><label>65</label><mixed-citation>Zhou, R. and Zhang, Y.: On the role of the architecture for spring discharge prediction with deep learning approaches, Hydrol. Process., 36,  <ext-link xlink:href="https://doi.org/10.1002/hyp.14737" ext-link-type="DOI">10.1002/hyp.14737</ext-link>, 2022a.</mixed-citation></ref>
      <ref id="bib1.bib66"><label>66</label><mixed-citation>Zhou, R. and Zhang, Y.: Reconstruction of missing spring discharge by using deep learning models with ensemble empirical mode decomposition of precipitation, Environ. Sci. Pollut. R.,  <ext-link xlink:href="https://doi.org/10.1007/s11356-022-21597-w" ext-link-type="DOI">10.1007/s11356-022-21597-w</ext-link>, 2022b.</mixed-citation></ref>
      <ref id="bib1.bib67"><label>67</label><mixed-citation>Zhou, R. and Zhang, Y.: Linear and nonlinear ensemble deep learning models for karst spring discharge forecasting, J. Hydrol., 627, 130394,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2023.130394" ext-link-type="DOI">10.1016/j.jhydrol.2023.130394</ext-link>, 2023a.</mixed-citation></ref>
      <ref id="bib1.bib68"><label>68</label><mixed-citation>Zhou, R. and Zhang, Y.: Predicting and explaining karst spring dissolved oxygen using interpretable deep learning approach, Hydrol. Process., 37, e14948,  <ext-link xlink:href="https://doi.org/10.1002/hyp.14948" ext-link-type="DOI">10.1002/hyp.14948</ext-link>, 2023b.</mixed-citation></ref>
      <ref id="bib1.bib69"><label>69</label><mixed-citation>Zhou, R., Zhang, Y., Wang, Q., Jin, A., and Shi, W.: A hybrid self-adaptive DWT-WaveNet-LSTM deep learning architecture for karst spring forecasting, J. Hydrol., 634, 131128,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2024.131128" ext-link-type="DOI">10.1016/j.jhydrol.2024.131128</ext-link>, 2024a.</mixed-citation></ref>
      <ref id="bib1.bib70"><label>70</label><mixed-citation>Zhou, R., Wang, Q., Jin, A., Shi, W., and Liu, S.: Interpretable multi-step hybrid deep learning model for karst spring discharge prediction: integrating temporal fusion transformers with ensemble empirical mode decomposition, J. Hydrol., 132235,  <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2024.132235" ext-link-type="DOI">10.1016/j.jhydrol.2024.132235</ext-link>, 2024b.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>A hybrid Kolmogorov–Arnold Networks-based model with attention for predicting Arctic river streamflow</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>1</label><mixed-citation>
       Alzubaidi, L., Bai, J., Al-Sabaawi, A., Santamaría, J., Albahri, A. S., Al-dabbagh, B. S. N., Fadhel, M. A., Manoufali, M., Zhang, J., Al-Timemy, A. H., Duan, Y., Abdullah, A., Farhan, L., Lu, Y., Gupta, A., Albu, F., Abbosh, A., and Gu, Y.: A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications, J. Big Data, 10, 46, <a href="https://doi.org/10.1186/s40537-023-00727-2" target="_blank">https://doi.org/10.1186/s40537-023-00727-2</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>2</label><mixed-citation>
       Andersson, T. R., Hosking, J. S., Pérez-Ortiz, M., Paige, B., Elliott, A., Russell, C., Law, S., Jones, D. C., Wilkinson, J., Phillips, T., Byrne, J., Tietsche, S., Sarojini, B. B., Blanchard-Wrigglesworth, E., Aksenov, Y., Downie, R., and Shuckburgh, E.: Seasonal arctic sea ice forecasting with probabilistic deep learning, Nat. Commun., 12, 5124, <a href="https://doi.org/10.1038/s41467-021-25257-4" target="_blank">https://doi.org/10.1038/s41467-021-25257-4</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>3</label><mixed-citation>
       Bakhshi Ostadkalayeh, F., Moradi, S., Asadi, A., Moghaddam Nia, A., and Taheri, S.: Performance improvement of LSTM-based deep learning model for streamflow forecasting using kalman filtering, Water Resour. Manage., 37, 3111–3127,  <a href="https://doi.org/10.1007/s11269-023-03492-2" target="_blank">https://doi.org/10.1007/s11269-023-03492-2</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>4</label><mixed-citation>
       Basu, B., Morrissey, P., and Gill, L. W.: Application of nonlinear time series and machine learning algorithms for forecasting groundwater flooding in a lowland karst area, Water Resour. Res., 58, e2021WR029576,  <a href="https://doi.org/10.1029/2021WR029576" target="_blank">https://doi.org/10.1029/2021WR029576</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>5</label><mixed-citation>
       Bring, A., Fedorova, I., Dibike, Y., Hinzman, L., Mård, J., Mernild, S. H., Prowse, T., Semenova, O., Stuefer, S. L., and Woo, M.-K.: Arctic terrestrial hydrology: a synthesis of processes, regional effects, and research challenges, J. Geophys. Res.-Biogeo., 121, 621–649, <a href="https://doi.org/10.1002/2015JG003131" target="_blank">https://doi.org/10.1002/2015JG003131</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>6</label><mixed-citation>
       Chang, S. Y., Schwenk, J., and Solander, K. C.: Deep learning advances arctic river water temperature predictions, Water Resour. Res., 61, e2024WR039053,  <a href="https://doi.org/10.1029/2024WR039053" target="_blank">https://doi.org/10.1029/2024WR039053</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>7</label><mixed-citation>
       Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., and Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv [preprint], <a href="https://doi.org/10.48550/ARXIV.1406.1078" target="_blank">https://doi.org/10.48550/ARXIV.1406.1078</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>8</label><mixed-citation>
       Cinkus, G., Mazzilli, N., Jourde, H., Wunsch, A., Liesch, T., Ravbar, N., Chen, Z., and Goldscheider, N.: When best is the enemy of good – critical evaluation of performance criteria in hydrological models, Hydrol. Earth Syst. Sci., 27, 2397–2411, <a href="https://doi.org/10.5194/hess-27-2397-2023" target="_blank">https://doi.org/10.5194/hess-27-2397-2023</a>, 2023. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>9</label><mixed-citation>
       DeWalle, D. R. and Rango, A.: Principles of snow hydrology, 1st edn., Cambridge University Press,  <a href="https://doi.org/10.1017/CBO9780511535673" target="_blank">https://doi.org/10.1017/CBO9780511535673</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>10</label><mixed-citation>
       Ernakovich, J. G., Hopping, K. A., Berdanier, A. B., Simpson, R. T., Kachergis, E. J., Steltzer, H., and Wallenstein, M. D.: Predicted responses of arctic and alpine ecosystems to altered seasonality under climate change, Glob. Change Biol., 20, 3256–3269,  <a href="https://doi.org/10.1111/gcb.12568" target="_blank">https://doi.org/10.1111/gcb.12568</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>11</label><mixed-citation>
       Feng, D., Gleason, C. J., Lin, P., Yang, X., Pan, M., and Ishitsuka, Y.: Recent changes to arctic river discharge, Nat. Commun., 12, 6917,  <a href="https://doi.org/10.1038/s41467-021-27228-1" target="_blank">https://doi.org/10.1038/s41467-021-27228-1</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>12</label><mixed-citation>
       Gao, S., Huang, Y., Zhang, S., Han, J., Wang, G., Zhang, M., and Lin, Q.: Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation, J. Hydrol., 589, 125188,  <a href="https://doi.org/10.1016/j.jhydrol.2020.125188" target="_blank">https://doi.org/10.1016/j.jhydrol.2020.125188</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>13</label><mixed-citation>
       Gelfan, A., Gustafsson, D., Motovilov, Y., Arheimer, B., Kalugin, A., Krylenko, I., and Lavrenov, A.: Climate change impact on the water regime of two great arctic rivers: modeling and uncertainty issues, Climatic Change, 141, 499–515,  <a href="https://doi.org/10.1007/s10584-016-1710-5" target="_blank">https://doi.org/10.1007/s10584-016-1710-5</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>14</label><mixed-citation>
       Granata, F., Zhu, S., and Di Nunno, F.: Advanced streamflow forecasting for central european rivers: the cutting-edge kolmogorov-arnold networks compared to transformers, J. Hydrol., 645, 132175,  <a href="https://doi.org/10.1016/j.jhydrol.2024.132175" target="_blank">https://doi.org/10.1016/j.jhydrol.2024.132175</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>15</label><mixed-citation>
       Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: implications for improving hydrological modelling, J. Hydrol., 377, 80–91,  <a href="https://doi.org/10.1016/j.jhydrol.2009.08.003" target="_blank">https://doi.org/10.1016/j.jhydrol.2009.08.003</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>16</label><mixed-citation>
       Gusev, E. M., Nasonova, O. N., and Dzhogan, L. Y.: Physically based simulating long-term dynamics of diurnal variations of river runoff and snow water equivalent in the kolyma river basin, Water Resour., 42, 834–841,  <a href="https://doi.org/10.1134/S0097807815060056" target="_blank">https://doi.org/10.1134/S0097807815060056</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>17</label><mixed-citation>
       Häkkinen, S. and Mellor, G. L.: Modeling the seasonal variability of a coupled arctic ice-ocean system, J. Geophys. Res.-Oceans, 97, 20285–20304, <a href="https://doi.org/10.1029/92JC02037" target="_blank">https://doi.org/10.1029/92JC02037</a>, 1992.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>18</label><mixed-citation>
       Harpold, A. A., Kaplan, M. L., Klos, P. Z., Link, T., McNamara, J. P., Rajagopal, S., Schumer, R., and Steele, C. M.: Rain or snow: hydrologic processes, observations, prediction, and research needs, Hydrol. Earth Syst. Sci., 21, 1–22, <a href="https://doi.org/10.5194/hess-21-1-2017" target="_blank">https://doi.org/10.5194/hess-21-1-2017</a>, 2017. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>19</label><mixed-citation>
       Harris, I., Osborn, T. J., Jones, P., and Lister, D.: Version 4 of the CRU TS monthly high-resolution gridded multivariate climate dataset, Sci. Data, 7, 109,  <a href="https://doi.org/10.1038/s41597-020-0453-3" target="_blank">https://doi.org/10.1038/s41597-020-0453-3</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>20</label><mixed-citation>
       Hochreiter, S. and Schmidhuber, J.: Long short-term memory, Neural Comput., 9, 1735–1780,  <a href="https://doi.org/10.1162/neco.1997.9.8.1735" target="_blank">https://doi.org/10.1162/neco.1997.9.8.1735</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>21</label><mixed-citation>
       Hock, R.: Temperature index melt modelling in mountain areas, J. Hydrol., 282, 104–115,  <a href="https://doi.org/10.1016/S0022-1694(03)00257-9" target="_blank">https://doi.org/10.1016/S0022-1694(03)00257-9</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>22</label><mixed-citation>
       Holmes, R. M., McClelland, J. W., Peterson, B. J., Tank, S. E., Bulygina, E., Eglinton, T. I., Gordeev, V. V., Gurtovaya, T. Y., Raymond, P. A., Repeta, D. J., Staples, R., Striegl, R. G., Zhulidov, A. V., and Zimov, S. A.: Seasonal and annual fluxes of nutrients and organic matter from large rivers to the Arctic Ocean and surrounding seas, Estuar. Coast., 35, 369–382,  <a href="https://doi.org/10.1007/s12237-011-9386-6" target="_blank">https://doi.org/10.1007/s12237-011-9386-6</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>23</label><mixed-citation>
       Jin, A., Wang, Q., Zhan, H., and Zhou, R.: Comparative performance assessment of physical-based and data-driven machine-learning models for simulating streamflow: a case study in three catchments across the US, J. Hydrol. Eng., 29, 5024004,  <a href="https://doi.org/10.1061/JHYEFF.HEENG-6118" target="_blank">https://doi.org/10.1061/JHYEFF.HEENG-6118</a>, 2024a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>24</label><mixed-citation>
       Jin, A., Wang, Q., Zhou, R., Shi, W., and Qiao, X.: Hybrid multivariate machine learning models for streamflow forecasting: a two-stage decomposition–reconstruction framework, J. Hydrol. Eng., 29, 4024026,  <a href="https://doi.org/10.1061/JHYEFF.HEENG-6254" target="_blank">https://doi.org/10.1061/JHYEFF.HEENG-6254</a>, 2024b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>25</label><mixed-citation>
       Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., and Yang, L.: Physics-informed machine learning, Nat. Rev. Phys., 3, 422–440,  <a href="https://doi.org/10.1038/s42254-021-00314-5" target="_blank">https://doi.org/10.1038/s42254-021-00314-5</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>26</label><mixed-citation>
       Kling, H., Fuchs, M., and Paulin, M.: Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios, J. Hydrol., 424, 264–277,  <a href="https://doi.org/10.1016/j.jhydrol.2012.01.011" target="_blank">https://doi.org/10.1016/j.jhydrol.2012.01.011</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>27</label><mixed-citation>
       Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23, 4323–4331, <a href="https://doi.org/10.5194/hess-23-4323-2019" target="_blank">https://doi.org/10.5194/hess-23-4323-2019</a>, 2019. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>28</label><mixed-citation>
       Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <a href="https://doi.org/10.5194/hess-22-6005-2018" target="_blank">https://doi.org/10.5194/hess-22-6005-2018</a>, 2018. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>29</label><mixed-citation>
       Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <a href="https://doi.org/10.5194/hess-23-5089-2019" target="_blank">https://doi.org/10.5194/hess-23-5089-2019</a>, 2019. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>30</label><mixed-citation>
       Krogh, S. A., Pomeroy, J. W., and Marsh, P.: Diagnosis of the hydrology of a small Arctic basin at the tundra–taiga transition using a physically based hydrological model, J. Hydrol., 550, 685–703,  <a href="https://doi.org/10.1016/j.jhydrol.2017.05.042" target="_blank">https://doi.org/10.1016/j.jhydrol.2017.05.042</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>31</label><mixed-citation>
       Kůrková, V.: Kolmogorov's theorem and multilayer neural networks, Neural Networks, 5, 501–506, <a href="https://doi.org/10.1016/0893-6080(92)90012-8" target="_blank">https://doi.org/10.1016/0893-6080(92)90012-8</a>, 1992.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>32</label><mixed-citation>
       LeCun, Y., Bottou, L., Orr, G. B., and Müller, K.-R.: Efficient BackProp, in: Neural networks: tricks of the trade, vol. 1524, edited by: Orr, G. B. and Müller, K.-R., Springer Berlin Heidelberg, Berlin, Heidelberg, 9–50, <a href="https://doi.org/10.1007/3-540-49430-8_2" target="_blank">https://doi.org/10.1007/3-540-49430-8_2</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>33</label><mixed-citation>
       Liu, S., Wang, P., Yu, J., Zhou, R., Bai, B., Gabysheva, O. I., Frolova, N. L., and Pozdniakov, S. P.: Changes in hydrological regime regulate POC export across permafrost-dominated Arctic river basins, Geosci. Front., 102208,  <a href="https://doi.org/10.1016/j.gsf.2025.102208" target="_blank">https://doi.org/10.1016/j.gsf.2025.102208</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>34</label><mixed-citation>
       Liu, Z., Wang, Y., Vaidya, S., Ruehle, F., Halverson, J., Soljačić, M., Hou, T. Y., and Tegmark, M.: KAN: Kolmogorov–Arnold Networks, arXiv [preprint], <a href="https://doi.org/10.48550/ARXIV.2404.19756" target="_blank">https://doi.org/10.48550/ARXIV.2404.19756</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>35</label><mixed-citation>
       McClelland, J. W., Holmes, R. M., Peterson, B. J., and Stieglitz, M.: Increasing river discharge in the Eurasian Arctic: consideration of dams, permafrost thaw, and fires as potential agents of change, J. Geophys. Res.-Atmos., 109, 2004JD004583,  <a href="https://doi.org/10.1029/2004JD004583" target="_blank">https://doi.org/10.1029/2004JD004583</a>, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>36</label><mixed-citation>
       Moriasi, D. N., Arnold, J. G., Van Liew, M. W., Bingner, R. L., Harmel, R. D., and Veith, T. L.: Model evaluation guidelines for systematic quantification of accuracy in watershed simulations, T. ASABE, 50, 885–900,  <a href="https://doi.org/10.13031/2013.23153" target="_blank">https://doi.org/10.13031/2013.23153</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>37</label><mixed-citation>
       Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C., and Gupta, H. V.: What role does hydrological science play in the age of machine learning?, Water Resour. Res., 57, e2020WR028091,  <a href="https://doi.org/10.1029/2020WR028091" target="_blank">https://doi.org/10.1029/2020WR028091</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>38</label><mixed-citation>
       Nijssen, B., O'Donnell, G. M., Hamlet, A. F., and Lettenmaier, D. P.: Hydrologic sensitivity of global rivers to climate change, Climatic Change, 50, 143–175,  <a href="https://doi.org/10.1023/A:1010616428763" target="_blank">https://doi.org/10.1023/A:1010616428763</a>, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>39</label><mixed-citation>
       Peterson, B. J., Holmes, R. M., McClelland, J. W., Vörösmarty, C. J., Lammers, R. B., Shiklomanov, A. I., Shiklomanov, I. A., and Rahmstorf, S.: Increasing river discharge to the Arctic Ocean, Science, 298, 2171–2173, <a href="https://doi.org/10.1126/science.1077445" target="_blank">https://doi.org/10.1126/science.1077445</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>40</label><mixed-citation>
       Pölz, A., Blaschke, A. P., Komma, J., Farnleitner, A. H., and Derx, J.: Transformer versus LSTM: a comparison of deep learning models for karst spring discharge forecasting, Water Resour. Res., 60, e2022WR032602, <a href="https://doi.org/10.1029/2022WR032602" target="_blank">https://doi.org/10.1029/2022WR032602</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>41</label><mixed-citation>
       Prowse, T., Alfredsen, K., Beltaos, S., Bonsal, B. R., Bowden, W. B., Duguay, C. R., Korhola, A., McNamara, J., Vincent, W. F., Vuglinsky, V., Walter Anthony, K. M., and Weyhenmeyer, G. A.: Effects of changes in Arctic lake and river ice, Ambio, 40, 63–74,  <a href="https://doi.org/10.1007/s13280-011-0217-6" target="_blank">https://doi.org/10.1007/s13280-011-0217-6</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>42</label><mixed-citation>
       Rawlins, M. A. and Karmalkar, A. V.: Regime shifts in Arctic terrestrial hydrology manifested from impacts of climate warming, The Cryosphere, 18, 1033–1052, <a href="https://doi.org/10.5194/tc-18-1033-2024" target="_blank">https://doi.org/10.5194/tc-18-1033-2024</a>, 2024. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>43</label><mixed-citation>
       Schneider, U., Hänsel, S., Finger, P., Rustemeier, E., and Ziese, M.: GPCC full data monthly version 2022 at 2.5°: monthly land–surface precipitation from rain-gauges built on GTS-based and historic data: globally gridded monthly totals (2022), <a href="https://doi.org/10.5676/DWD_GPCC/FD_M_V2022_250" target="_blank">https://doi.org/10.5676/DWD_GPCC/FD_M_V2022_250</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>44</label><mixed-citation>
       Sergeev, A., Baglaeva, E., and Subbotina, I.: Hybrid model combining LSTM with discrete wavelet transformation to predict surface methane concentration in the Arctic island Belyy, Atmos. Environ., 317, 120210,  <a href="https://doi.org/10.1016/j.atmosenv.2023.120210" target="_blank">https://doi.org/10.1016/j.atmosenv.2023.120210</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>45</label><mixed-citation>
       Singh, A., Kalke, H., Loewen, M., and Ray, N.: River ice segmentation with deep learning, IEEE T. Geosci. Remote, 58, 7570–7579,  <a href="https://doi.org/10.1109/TGRS.2020.2981082" target="_blank">https://doi.org/10.1109/TGRS.2020.2981082</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>46</label><mixed-citation>
       Snieder, E. and Khan, U. T.: A diversity-centric strategy for the selection of spatio-temporal training data for LSTM-based streamflow forecasting, Hydrol. Earth Syst. Sci., 29, 785–798, <a href="https://doi.org/10.5194/hess-29-785-2025" target="_blank">https://doi.org/10.5194/hess-29-785-2025</a>, 2025. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>47</label><mixed-citation>
       Spencer, R. G. M., Mann, P. J., Dittmar, T., Eglinton, T. I., McIntyre, C., Holmes, R. M., Zimov, N., and Stubbins, A.: Detecting the signature of permafrost thaw in Arctic rivers, Geophys. Res. Lett., 42, 2830–2835,  <a href="https://doi.org/10.1002/2015GL063498" target="_blank">https://doi.org/10.1002/2015GL063498</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>48</label><mixed-citation>
       Suzuki, K., Liston, G. E., and Matsuo, K.: Estimation of continental-basin-scale sublimation in the Lena River basin, Siberia, Adv. Meteorol., 2015, 1–14,  <a href="https://doi.org/10.1155/2015/286206" target="_blank">https://doi.org/10.1155/2015/286206</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>49</label><mixed-citation>
       Tank, S. E., McClelland, J. W., Spencer, R. G. M., Shiklomanov, A. I., Suslova, A., Moatar, F., Amon, R. M. W., Cooper, L. W., Elias, G., Gordeev, V. V., Guay, C., Gurtovaya, T. Yu., Kosmenko, L. S., Mutter, E. A., Peterson, B. J., Peucker-Ehrenbrink, B., Raymond, P. A., Schuster, P. F., Scott, L., Staples, R., Striegl, R. G., Tretiakov, M., Zhulidov, A. V., Zimov, N., Zimov, S., and Holmes, R. M.: Recent trends in the chemistry of major northern rivers signal widespread Arctic change, Nat. Geosci., 16, 789–796,  <a href="https://doi.org/10.1038/s41561-023-01247-7" target="_blank">https://doi.org/10.1038/s41561-023-01247-7</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>50</label><mixed-citation>
       Towner, J., Cloke, H. L., Zsoter, E., Flamig, Z., Hoch, J. M., Bazo, J., Coughlan de Perez, E., and Stephens, E. M.: Assessing the performance of global hydrological models for capturing peak river flows in the Amazon basin, Hydrol. Earth Syst. Sci., 23, 3057–3080, <a href="https://doi.org/10.5194/hess-23-3057-2019" target="_blank">https://doi.org/10.5194/hess-23-3057-2019</a>, 2019. 
    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>51</label><mixed-citation>
       Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I.: Attention is all you need, in: Advances in Neural Information Processing Systems, arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.1706.03762" target="_blank">https://doi.org/10.48550/arXiv.1706.03762</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>52</label><mixed-citation>
       Vonk, J. E., Fritz, M., Speetjens, N. J., Babin, M., Bartsch, A., Basso, L. S., Bröder, L., Göckede, M., Gustafsson, Ö., Hugelius, G., Irrgang, A. M., Juhls, B., Kuhn, M. A., Lantuit, H., Manizza, M., Martens, J., O'Regan, M., Suslova, A., Tank, S. E., Terhaar, J., and Zolkos, S.: The land–ocean Arctic carbon cycle, Nat. Rev. Earth Environ., 6, 86–105, <a href="https://doi.org/10.1038/s43017-024-00627-w" target="_blank">https://doi.org/10.1038/s43017-024-00627-w</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>53</label><mixed-citation>
       Walvoord, M. A. and Kurylyk, B. L.: Hydrologic impacts of thawing permafrost – a review, Vadose Zone J., 15, 1–20,  <a href="https://doi.org/10.2136/vzj2016.01.0010" target="_blank">https://doi.org/10.2136/vzj2016.01.0010</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>54</label><mixed-citation>
       Wang, P., Huang, Q., Pozdniakov, S. P., Liu, S., Ma, N., Wang, T., Zhang, Y., Yu, J., Xie, J., Fu, G., Frolova, N. L., and Liu, C.: Potential role of permafrost thaw on increasing Siberian river discharge, Environ. Res. Lett., 16, 034046,  <a href="https://doi.org/10.1088/1748-9326/abe326" target="_blank">https://doi.org/10.1088/1748-9326/abe326</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>55</label><mixed-citation>
       Woo, M.-K., Kane, D. L., Carey, S. K., and Yang, D.: Progress in permafrost hydrology in the new millennium, Permafrost Periglac., 19, 237–254,  <a href="https://doi.org/10.1002/ppp.613" target="_blank">https://doi.org/10.1002/ppp.613</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>56</label><mixed-citation>
       Xie, K., Liu, P., Zhang, J., Han, D., Wang, G., and Shen, C.: Physics-guided deep learning for rainfall–runoff modeling by considering extreme events and monotonic relationships, J. Hydrol., 603, 127043,  <a href="https://doi.org/10.1016/j.jhydrol.2021.127043" target="_blank">https://doi.org/10.1016/j.jhydrol.2021.127043</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>57</label><mixed-citation>
       Yang, D., Kane, D. L., Hinzman, L. D., Zhang, X., Zhang, T., and Ye, H.: Siberian Lena River hydrologic regime and recent change, J. Geophys. Res.-Atmos., 107,  <a href="https://doi.org/10.1029/2002JD002542" target="_blank">https://doi.org/10.1029/2002JD002542</a>, 2002.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>58</label><mixed-citation>
       Yang, S., Yang, D., Chen, J., Santisirisomboon, J., Lu, W., and Zhao, B.: A physical process and machine learning combined hydrological model for daily streamflow simulations of large watersheds with limited observation data, J. Hydrol., 590, 125206,  <a href="https://doi.org/10.1016/j.jhydrol.2020.125206" target="_blank">https://doi.org/10.1016/j.jhydrol.2020.125206</a>, 2020.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>59</label><mixed-citation>
       Ye, B., Yang, D., and Kane, D. L.: Changes in Lena River streamflow hydrology: human impacts versus natural variations, Water Resour. Res., 39, 2003WR001991,  <a href="https://doi.org/10.1029/2003WR001991" target="_blank">https://doi.org/10.1029/2003WR001991</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>60</label><mixed-citation>
       Zhang, S., Gan, T. Y., Bush, A. B. G., and Zhang, G.: Evaluation of the impact of climate change on the streamflow of major pan-Arctic river basins through machine learning models, J. Hydrol., 619, 129295,  <a href="https://doi.org/10.1016/j.jhydrol.2023.129295" target="_blank">https://doi.org/10.1016/j.jhydrol.2023.129295</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>61</label><mixed-citation>
       Zhi, W., Ouyang, W., Shen, C., and Li, L.: Temperature outweighs light and flow as the predominant driver of dissolved oxygen in US rivers, Nat. Water, 1, 249–260,  <a href="https://doi.org/10.1038/s44221-023-00038-z" target="_blank">https://doi.org/10.1038/s44221-023-00038-z</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>62</label><mixed-citation>
       Zhong, L., Lei, H., and Yang, J.: Development of a distributed physics-informed deep learning hydrological model for data-scarce regions, Water Resour. Res., 60, e2023WR036333,  <a href="https://doi.org/10.1029/2023WR036333" target="_blank">https://doi.org/10.1029/2023WR036333</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>63</label><mixed-citation>
       Zhou, R.: Multi-scale dynamic spatiotemporal graph attention network for forecasting karst spring discharge, J. Hydrol., 133289,  <a href="https://doi.org/10.1016/j.jhydrol.2025.133289" target="_blank">https://doi.org/10.1016/j.jhydrol.2025.133289</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>64</label><mixed-citation>
       Zhou, R.: Zhou-R/HESS_KAN: v0.01 (v0.0.1), Zenodo [data set] and [code], <a href="https://doi.org/10.5281/zenodo.19862397" target="_blank">https://doi.org/10.5281/zenodo.19862397</a>, 2026.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>65</label><mixed-citation>
       Zhou, R. and Zhang, Y.: On the role of the architecture for spring discharge prediction with deep learning approaches, Hydrol. Process., 36,  <a href="https://doi.org/10.1002/hyp.14737" target="_blank">https://doi.org/10.1002/hyp.14737</a>, 2022a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>66</label><mixed-citation>
       Zhou, R. and Zhang, Y.: Reconstruction of missing spring discharge by using deep learning models with ensemble empirical mode decomposition of precipitation, Environ. Sci. Pollut. R.,  <a href="https://doi.org/10.1007/s11356-022-21597-w" target="_blank">https://doi.org/10.1007/s11356-022-21597-w</a>, 2022b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>67</label><mixed-citation>
       Zhou, R. and Zhang, Y.: Linear and nonlinear ensemble deep learning models for karst spring discharge forecasting, J. Hydrol., 627, 130394,  <a href="https://doi.org/10.1016/j.jhydrol.2023.130394" target="_blank">https://doi.org/10.1016/j.jhydrol.2023.130394</a>, 2023a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>68</label><mixed-citation>
       Zhou, R. and Zhang, Y.: Predicting and explaining karst spring dissolved oxygen using interpretable deep learning approach, Hydrol. Process., 37, e14948,  <a href="https://doi.org/10.1002/hyp.14948" target="_blank">https://doi.org/10.1002/hyp.14948</a>, 2023b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>69</label><mixed-citation>
       Zhou, R., Zhang, Y., Wang, Q., Jin, A., and Shi, W.: A hybrid self-adaptive DWT-WaveNet-LSTM deep learning architecture for karst spring forecasting, J. Hydrol., 634, 131128,  <a href="https://doi.org/10.1016/j.jhydrol.2024.131128" target="_blank">https://doi.org/10.1016/j.jhydrol.2024.131128</a>, 2024a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>70</label><mixed-citation>
       Zhou, R., Wang, Q., Jin, A., Shi, W., and Liu, S.: Interpretable multi-step hybrid deep learning model for karst spring discharge prediction: integrating temporal fusion transformers with ensemble empirical mode decomposition, J. Hydrol., 132235,  <a href="https://doi.org/10.1016/j.jhydrol.2024.132235" target="_blank">https://doi.org/10.1016/j.jhydrol.2024.132235</a>, 2024b.

    </mixed-citation></ref-html>--></article>
