<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-30-629-2026</article-id><title-group><article-title>When physics gets in the way: an entropy-based evaluation of conceptual constraints in hybrid hydrological models</article-title><alt-title>An entropy-based evaluation of conceptual constraints in hybrid hydrological models</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Álvarez Chaves</surname><given-names>Manuel</given-names></name>
          <email>manuel.alvarez-chaves@simtech.uni-stuttgart.de</email>
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0009-0002-8990-3785">https://orcid.org/0009-0002-8990-3785</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Acuña Espinoza</surname><given-names>Eduardo</given-names></name>
          
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0001-5218-9800">https://orcid.org/0000-0001-5218-9800</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Ehret</surname><given-names>Uwe</given-names></name>
          
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0003-3454-8755">https://orcid.org/0000-0003-3454-8755</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Guthke</surname><given-names>Anneli</given-names></name>
          
        <ext-link ext-link-type="uri" xlink:href="https://orcid.org/0000-0003-2901-1603">https://orcid.org/0000-0003-2901-1603</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Stuttgart Center for Simulation Science, Cluster of Excellence EXC 2075, University of Stuttgart, 70569 Stuttgart, Germany</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Institute of Water and Environment, Karlsruhe Institute of Technology (KIT), 76131 Karlsruhe, Germany</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Correspondence: Manuel Álvarez Chaves (manuel.alvarez-chaves@simtech.uni-stuttgart.de)</corresp></author-notes><pub-date><day>4</day><month>February</month><year>2026</year></pub-date>
      
      <volume>30</volume>
      <issue>3</issue>
      <fpage>629</fpage><lpage>658</lpage>
      <history>
        <date date-type="received"><day>9</day><month>April</month><year>2025</year></date>
        <date date-type="rev-request"><day>5</day><month>May</month><year>2025</year></date>
        <date date-type="rev-recd"><day>13</day><month>November</month><year>2025</year></date>
        <date date-type="accepted"><day>3</day><month>December</month><year>2025</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Manuel Álvarez Chaves et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026.html">This article is available from https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e113">Merging physics-based with data-driven approaches in hybrid hydrological modeling offers new opportunities to enhance predictive accuracy while addressing challenges of model interpretability and fidelity. Traditional hydrological models, developed from physical principles, are easily interpretable but often limited by their rigidity and assumptions. In contrast, machine learning (ML) methods, such as Long Short-Term Memory (LSTM) networks, offer exceptional predictive performance but are often criticized for their black-box nature. Hybrid models aim to reconcile these approaches by imposing physics to constrain, and thereby help interpret, what the ML part of the model does. This study introduces a quantitative metric based on Information Theory to evaluate the relative contributions of physics-based and data-driven components in hybrid models. Through synthetic examples and a large-sample case study, we examine the role of physics-based conceptual constraints: can we actually call the hybrid model “physics-constrained”, or does the data-driven component overwrite these constraints for the sake of performance? We test this on the arguably most constrained form of hybrid models, i.e., we prescribe the structures of typical conceptual hydrological models and allow an LSTM to modify only their parameters over time, as learned during training against observed discharge data. Our findings indicate that performance predominantly relies on the data-driven component, with the physics constraint often adding minimal value or even making the prediction problem harder. This observation challenges the assumption that integrating physics should enhance model performance by informing the LSTM. Even more alarmingly, the data-driven component is able to avoid (parts of) the conceptual constraint by driving certain parameters to insensitive constants or to value sequences that effectively cancel out certain storage behavior.
Our proposed approach helps to analyze such conditions in depth, which provides valuable insights into model functioning, case study specifics, and the power or problems of prior knowledge prescribed in the form of conceptual constraints. Notably, our results also show that hybrid modeling may offer hints towards parsimonious model representations that capture dominant physical processes but avoid illegitimate constraints. Overall, our framework can (1) uncover the true role of constraints in presumably “physics-constrained” machine learning, and (2) guide the development of more accurate representations of hydrological systems through careful evaluation of the utility of expert knowledge for the prediction problem at hand.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Deutsche Forschungsgemeinschaft</funding-source>
<award-id>EXC 2075–390740016</award-id>
<award-id>507884992</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e125">Hydrological models are essential tools for the management of water resources as well as for scientific research. Given their wide range of applications, the motivations behind choosing one model over another are often unclear, and the question of whether a chosen model is adequate for its task is not typically addressed <xref ref-type="bibr" rid="bib1.bibx45" id="paren.1"/>. Worryingly, the choice of a model is often based on past experience rather than on adequacy <xref ref-type="bibr" rid="bib1.bibx4" id="paren.2"/>.</p>
      <p id="d2e134">Some authors have argued for the creation of a Community Hydrology Model capable of representing different processes at different scales, which would make it suitable for a wide range of applications; however, open challenges need to be addressed before such a model can be developed <xref ref-type="bibr" rid="bib1.bibx80" id="paren.3"/>. In contrast, other authors support the concept of flexible modeling frameworks that enable users to combine different representations of processes and model constructs <xref ref-type="bibr" rid="bib1.bibx34 bib1.bibx21" id="paren.4"/>. Using this approach, a unique model can be developed for each specific application, and the issue of model adequacy is addressed by testing multiple models as different working hypotheses <xref ref-type="bibr" rid="bib1.bibx22" id="paren.5"/>.</p>
<sec id="Ch1.S1.SS1">
  <label>1.1</label><title>Conceptual rainfall-runoff models</title>
      <p id="d2e153">So far, the traditional modeling approach has relied on simplified physical concepts: different compartments of the hydrological cycle are represented by interconnected storage units, and the resulting models obey physical principles. Understanding of the physical system is thus translated into the model and vice versa, making these models easily interpretable.</p>
      <p id="d2e156">Typically, catchment-scale processes in a rainfall-runoff model are represented by a reservoir element that can be described by ordinary differential equations (ODEs):

                <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M1" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E1"><mml:mtd><mml:mtext>1</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>S</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi>S</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi>u</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E2"><mml:mtd><mml:mtext>2</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi>Q</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:mi>S</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>,</mml:mo><mml:mi>u</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo><mml:mo>|</mml:mo><mml:mi mathvariant="italic">θ</mml:mi></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

          where <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> represents the conceptual storage of a reservoir element at time <inline-formula><mml:math id="M3" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M4" display="inline"><mml:mrow><mml:mi>u</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is time-dependent forcing data, <inline-formula><mml:math id="M5" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the response of the reservoir element to the forcing and <inline-formula><mml:math id="M6" display="inline"><mml:mi mathvariant="italic">θ</mml:mi></mml:math></inline-formula> are the model parameters. Furthermore, <inline-formula><mml:math id="M7" display="inline"><mml:mi>f</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M8" display="inline"><mml:mi>g</mml:mi></mml:math></inline-formula> are functions that describe the evolution of storage and output with time <xref ref-type="bibr" rid="bib1.bibx34" id="paren.6"/>. These types of models are physically-based because the main driving principle of a model is conservation of mass through Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>).</p>
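<p>To make these equations concrete, consider their simplest instance, a single linear reservoir with f(S,u) = P(t) - S(t)/k and g(S,u) = S(t)/k. The following sketch (our illustration, not code from any cited framework; the storage constant k and the forcing values are hypothetical) integrates it with an explicit Euler scheme:</p>

```python
import numpy as np

def linear_reservoir(precip, k=10.0, dt=1.0, s0=0.0):
    """Explicit-Euler integration of dS/dt = P(t) - S/k with Q = S/k.

    Returns the discharge series and the final storage, so that the
    mass balance sum(P) = sum(Q) + S_end can be verified.
    """
    storage = s0
    discharge = np.empty_like(precip)
    for t, p in enumerate(precip):
        q = storage / k          # g: outflow proportional to storage
        storage += dt * (p - q)  # f: conservation-of-mass update
        discharge[t] = q
    return discharge, storage

# Short synthetic forcing: a rainfall pulse followed by recession
q, s_end = linear_reservoir(np.array([5.0, 5.0, 0.0, 0.0, 0.0, 0.0]))
```

<p>Because the storage update mirrors Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), the scheme conserves mass up to discretization error: total rainfall equals total discharge plus the water remaining in storage.</p>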
      <p id="d2e336">These very simple principles for conceptual rainfall-runoff models have been adapted into modular modeling frameworks such as FUSE <xref ref-type="bibr" rid="bib1.bibx21" id="paren.7"/>, Superflex <xref ref-type="bibr" rid="bib1.bibx34 bib1.bibx27" id="paren.8"/> and RAVEN <xref ref-type="bibr" rid="bib1.bibx26" id="paren.9"/>. These frameworks enable researchers to develop a virtually unlimited range of modular structures for rainfall-runoff models. In practice, researchers apply these frameworks in model comparison studies using one of two typical approaches: top-down or bottom-up development. The top-down approach begins with a complex model and reduces its components, while the bottom-up approach starts with a simple model and gradually increases its complexity <xref ref-type="bibr" rid="bib1.bibx46" id="paren.10"/>.</p>
      <p id="d2e351">Other studies evaluate an existing set of standard model structures within these frameworks. Although the choice of components is often arbitrary and informed by prior experience, a paradigm for automatic model structure identification has been proposed which systematically tests and identifies the most adequate model structures for a rainfall-runoff model while acknowledging the challenge of equifinality <xref ref-type="bibr" rid="bib1.bibx74" id="paren.11"/>.</p>
</sec>
<sec id="Ch1.S1.SS2">
  <label>1.2</label><title>LSTMs</title>
      <p id="d2e366">Unlike the previous approach, machine learning (ML) and other purely data-driven approaches assume no prior knowledge and learn the required relationships between variables from the provided data alone. In particular, Long Short-Term Memory (LSTM) networks have been shown to provide very accurate predictions of streamflow, establishing a number of benchmarks across different data sets <xref ref-type="bibr" rid="bib1.bibx52 bib1.bibx54 bib1.bibx55 bib1.bibx58 bib1.bibx61" id="paren.12"/>. The performance of these models can be partly attributed to the flexibility of LSTM networks (LSTMs hereafter), which are free of the constraints that physically based models carry.</p>
      <p id="d2e372">LSTMs <xref ref-type="bibr" rid="bib1.bibx44" id="paren.13"/> are a type of recurrent neural network (RNN) that has been widely adopted in hydrology for rainfall-runoff modeling and streamflow prediction <xref ref-type="bibr" rid="bib1.bibx56" id="paren.14"/>. More generally, RNNs and LSTMs have found applications in modeling dynamical systems <xref ref-type="bibr" rid="bib1.bibx37" id="paren.15"/>. Indeed, they have been successful because this type of neural network adds both memory (that is, states) and feedback, allowing the current output to depend on past outputs and states <xref ref-type="bibr" rid="bib1.bibx41" id="paren.16"/>. As mentioned previously, because a catchment can be represented by a set of ODEs, i.e., as a dynamical system <xref ref-type="bibr" rid="bib1.bibx50" id="paren.17"/>, the use of LSTMs for rainfall-runoff modeling arises naturally. Ultimately, the two approaches, conceptual and data-driven, are complementary, and direct mappings between them have been identified <xref ref-type="bibr" rid="bib1.bibx78" id="paren.18"/>.</p>
      <p id="d2e394">The issue of lacking mass conservation in LSTMs has been addressed by models that include an additional term accounting for unobserved sinks, pointing towards deficiencies in data products <xref ref-type="bibr" rid="bib1.bibx36" id="paren.19"/>, and this issue of closure is often a point of discussion and controversy <xref ref-type="bibr" rid="bib1.bibx15 bib1.bibx66" id="paren.20"/>. Nevertheless, the main criticism of these models concerns their “black-box” nature, which makes their internal processes difficult to understand. Current methods for interpreting neural networks typically require the use of a secondary model to analyze the primary one <xref ref-type="bibr" rid="bib1.bibx63" id="paren.21"/>. For example, while researchers have proposed techniques to correlate LSTM hidden states with real-world variables <xref ref-type="bibr" rid="bib1.bibx59" id="paren.22"/>, this interpretation process remains complex and requires the implementation of an additional model, known as a probe. In some cases, even the LSTM cell states themselves have been successfully correlated with the main drivers of the hydrological cycle <xref ref-type="bibr" rid="bib1.bibx53" id="paren.23"/>. Although interpreting LSTM states is feasible, researchers also address this challenge by selecting model architectures that are inherently more interpretable, although these approaches still often require supplementary models for comprehensive explainability <xref ref-type="bibr" rid="bib1.bibx28" id="paren.24"/>.</p>
</sec>
<sec id="Ch1.S1.SS3">
  <label>1.3</label><title>Hybrid models</title>
      <p id="d2e426">Recently, hybrid modeling approaches <xref ref-type="bibr" rid="bib1.bibx71" id="paren.25"/> have been proposed as end-to-end modeling systems that combine data-driven approaches with traditional physics-based models. Differentiable models <xref ref-type="bibr" rid="bib1.bibx72" id="paren.26"/> represent a particular subset of hybrid models that leverage deep neural networks and differentiable programming paradigms <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx18" id="paren.27"/> to calculate gradients with respect to model variables or parameters, enabling the discovery of unknown relationships. In the broader context of scientific machine learning, these models also belong to the framework of Universal Differential Equations (UDEs), which combine differential equations with neural networks to represent system dynamics <xref ref-type="bibr" rid="bib1.bibx67" id="paren.28"/>. The process of solving UDEs allows researchers to identify unknown functions and system dynamics from data while preserving the underlying mathematical structure of the equations.</p>
      <p id="d2e441">One of the first successful applications of the differentiable framework in hydrology was the work of <xref ref-type="bibr" rid="bib1.bibx76" id="text.29"/>, who used an early differentiable modeling approach named deep-parameter learning to regionally calibrate the HBV model <xref ref-type="bibr" rid="bib1.bibx14" id="paren.30"/> and identify spatial patterns in the calibrated model parameters within a large-scale case study. Their work demonstrated how large datasets could advance the understanding of hydrological processes through differentiable models by finding continuous spatial patterns for the parameters of a hydrological model. In terms of interpretability, this represents a major shift, as models calibrated using local optimization techniques often yield parameter estimates that vary greatly in space.</p>
      <p id="d2e450">This was followed by <xref ref-type="bibr" rid="bib1.bibx30" id="text.31"/>, who used differentiable models to achieve state-of-the-art performance in streamflow prediction on the CAMELS-US dataset <xref ref-type="bibr" rid="bib1.bibx5" id="paren.32"/>. Beyond prediction, the proposed models obtained accurate correlations with independent data products for evapotranspiration and baseflow index (BFI). This opens up opportunities for increased interpretability, by possibly constraining the hybrid model further with non-target variables and achieving “a process granularity that enables providing a narrative to stakeholders” <xref ref-type="bibr" rid="bib1.bibx30" id="paren.33"/>. Similar to the attempts to make LSTMs interpretable, comprehensive explainability has arguably not been reached yet, and it seems to come at the price of reduced accuracy.</p>
      <p id="d2e462">Subsequent work showed the suitability of differentiable models in ungauged settings <xref ref-type="bibr" rid="bib1.bibx31" id="paren.34"/> and on a global scale <xref ref-type="bibr" rid="bib1.bibx32" id="paren.35"/>. The pattern of correlation with external data continued at the global scale, where evapotranspiration calculated by differentiable models, an untrained variable, correlated with Moderate Resolution Imaging Spectroradiometer (MODIS) satellite observations. Differentiable models have also been used to address the numerical challenges of time-stepping models <xref ref-type="bibr" rid="bib1.bibx73" id="paren.36"/>. Beyond streamflow prediction, differentiable models have been successfully applied to stream temperature modeling <xref ref-type="bibr" rid="bib1.bibx68" id="paren.37"/> and photosynthesis simulations <xref ref-type="bibr" rid="bib1.bibx1" id="paren.38"/>.</p>
      <p id="d2e481">Other approaches to hybrid modeling include using dense neural networks embedded directly into hydrological models to improve process descriptions within the model itself <xref ref-type="bibr" rid="bib1.bibx60" id="paren.39"/>. Furthermore, the suggested deep parameter learning approach has been successfully applied and extended independently using the EXP-HYDRO model <xref ref-type="bibr" rid="bib1.bibx83" id="paren.40"/>, with the final hybrid model also obtaining good correlations with unobserved variables from the external ERA5-Land dataset <xref ref-type="bibr" rid="bib1.bibx64" id="paren.41"/>.</p>
      <p id="d2e493">Importantly, current hybrid model applications primarily take advantage of the ability of their data-driven components to exploit information from large datasets, leaving their effectiveness with smaller datasets as an open question. The data requirements for different hydrological modeling methods remain an active area of research <xref ref-type="bibr" rid="bib1.bibx56 bib1.bibx75" id="paren.42"/>.</p>
</sec>
<sec id="Ch1.S1.SS4">
  <label>1.4</label><title>Key idea</title>
      <p id="d2e507">Recent developments show an increasing integration of physics-based and data-driven approaches in hydrological modeling. This trend is evident in streamflow prediction, where researchers have successfully implemented both neural ODE-based methods, such as NeuralODEs <xref ref-type="bibr" rid="bib1.bibx47" id="paren.43"/>, and traditional statistical approaches <xref ref-type="bibr" rid="bib1.bibx20" id="paren.44"/>. These hybrid solutions increasingly blur the distinction between purely physics-based and purely data-driven modeling paradigms.</p>
      <p id="d2e516">Although this integration is gaining widespread adoption in hydrology, recent work by <xref ref-type="bibr" rid="bib1.bibx3" id="text.45"/> raises important questions that need to be addressed. They demonstrate that incorporating physics-based components or prior knowledge does not yield an improvement in model performance over a purely data-driven approach. Furthermore, hybrid models can perform well even when the incorporated physical principles oversimplify or misrepresent the underlying system, primarily because their data-driven components can compensate for these imposed limitations. Moreover, their results question the validity of using correlation with unobserved variables to justify this approach, as even models whose physics-based component misrepresents the hydrological system can achieve good correlations with unobserved variables from external data products.</p>
      <p id="d2e522">This observation raises fundamental questions about the value and meaning of incorporating physics-based components into data-driven hydrological models. While purely data-driven methods often achieve high performance, we lack systematic ways to evaluate when and how the addition of physical principles genuinely enhances model performance and improves the representation of underlying physical processes. This study addresses this knowledge gap through the following contributions:</p>
      <p id="d2e525"><list list-type="order">
            <list-item>

      <p id="d2e531">We introduce a quantitative metric to assess whether a hybrid model's performance is dominated by its data-driven or physics-based components in comparison to a purely data-driven benchmark;</p>
            </list-item>
            <list-item>

      <p id="d2e537">We demonstrate the characteristics of this metric under synthetic conditions, i.e., we guide the modeler's intuition about what to expect if the prescribed constraint is physically meaningful or not;</p>
            </list-item>
            <list-item>

      <p id="d2e543">We suggest a diagnostic evaluation routine to better understand the hybrid model's effective structure, as opposed to the structure presumably prescribed by the imposed conceptual model;</p>
            </list-item>
            <list-item>

      <p id="d2e549">We derive insights about the relative contribution of physics-based and data-driven components from applying this metric to a large-sample case study, illustrating how “physics may get in the way” under imperfectly known model settings.</p>
            </list-item>
          </list></p>
      <p id="d2e555">In particular, we measure the entropy of both the LSTM-predicted time-variable parameters and the LSTM hidden states to quantify how much the data-driven component of our hybrid model counteracts the conceptual model's prescribed constraints. Our hypothesis is that low entropy indicates the LSTM needs minimal parameter variation, suggesting the conceptual constraints accurately describe the natural system. Conversely, high entropy suggests inappropriate constraints (e.g., oversimplified or enforcing mass balance despite imperfect inputs). High entropy points to an imbalance where the data-driven component compensates for inadequacies in the conceptual model by manipulating its parameters. Subsequent evaluation of LSTM-learned parameters helps determine whether this is actually the case. If so, we hope to still identify physical principles within the hybrid model; otherwise, the term “physics-informed” would be proven misleading and attempts of interpretation lack foundation. Our proposed approach helps analyze such conditions in-depth, which provides valuable insights into model functioning, case study specifics, and the strength or limitations of prior knowledge prescribed in the form of conceptual constraints.</p>
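<p>To illustrate the core idea of this hypothesis (a minimal, self-contained 1-D sketch with hypothetical sample sequences; the parameter and hidden-state spaces of the actual study require multivariate estimators), a Kozachenko-Leonenko nearest-neighbour estimate of differential entropy, in nats, distinguishes a near-constant learned parameter from a strongly varying one:</p>

```python
import numpy as np

def kl_entropy_1d(samples):
    """1-D Kozachenko-Leonenko nearest-neighbour estimate of
    differential entropy, in nats."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    gaps = x[1:] - x[:-1]
    # distance to the nearest neighbour of each (sorted) sample
    eps = np.empty(n)
    eps[0], eps[-1] = gaps[0], gaps[-1]
    eps[1:-1] = np.minimum(gaps[:-1], gaps[1:])
    euler_gamma = 0.5772156649015329
    # H = mean(ln(2 * eps)) + ln(n - 1) + gamma   (d = 1, unit-ball volume 2)
    return np.log(2.0 * eps).mean() + np.log(n - 1) + euler_gamma

rng = np.random.default_rng(42)
h_const = kl_entropy_1d(rng.normal(0.5, 1e-4, size=5000))  # near-constant parameter
h_vary = kl_entropy_1d(rng.normal(0.5, 1.0, size=5000))    # strongly varying parameter
```

<p>Under our hypothesis, the first (low-entropy) case would suggest that the conceptual constraint is adequate, while the second (high-entropy) case would point to the LSTM manipulating the parameter to compensate for an inadequate constraint.</p>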
      <p id="d2e558">Note that we focus on a typical single-task prediction (here: streamflow) to evaluate the value of adding prior process knowledge (here: rainfall-runoff) in the form of conceptual models to an LSTM network. Yet, we recognize the potential of hybrid models for multi-task learning, where models are evaluated on multiple objectives including multiple target variables, and anticipate that our proposed method can be readily extended to such evaluations in future work.</p>
      <p id="d2e561">We demonstrate our approach through two case studies. The first uses synthetic data with a known “true” model that accurately represents the system, allowing us to test our hypothesis and develop practical insights about our proposed metric. This example builds initial intuition for evaluating hybrid models by measuring entropy in both the conceptual model parameter space and LSTM hidden state space, demonstrating how performance can be attributed to either the data-driven or physics-based components.</p>
      <p id="d2e564">Our second case study applies these insights to a real-world dataset where no “true” model is known, further demonstrating the practical application of our metric. For this case study, we also examine LSTM models that receive the states and fluxes of a previously calibrated conceptual model as inputs. We analyze the entropy of the LSTM hidden states to explore how our proposed metric can help understand how a conceptual model may inform predictions made by a purely data-driven approach. Through these two real-world applications, we show that entropy can be used to analyze both data-driven models attempting to incorporate physical principles and physics-based conceptual models incorporating data-driven components.</p>
      <p id="d2e567">The remainder of the manuscript is structured as follows. Section <xref ref-type="sec" rid="Ch1.S2"/> details the types of models employed in this study, the data for the case studies, and specific aspects of calculating differential entropy in higher dimensions. Sections <xref ref-type="sec" rid="Ch1.S3"/> and <xref ref-type="sec" rid="Ch1.S4"/> cover the described case studies. Finally, Sect. <xref ref-type="sec" rid="Ch1.S5"/> summarizes our main findings and discusses avenues for future research.</p>
</sec>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data and methods</title>
      <p id="d2e587">In this section, we outline the basic elements of our study, including the dataset employed in both case studies, the models used, and the general methodological framework for training and evaluation. While this section provides a high-level overview of our methods, the subsequent case-specific sections discuss further details, including hyperparameter configurations, architectural adaptations, data selection criteria, and other considerations unique to each experimental scenario.</p>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>CAMELS-GB</title>
      <p id="d2e597">CAMELS-GB is a large sample catchment hydrology dataset for Great Britain <xref ref-type="bibr" rid="bib1.bibx24" id="paren.46"/>. As with similar large-sample datasets <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx61" id="paren.47"/>, it collects data for streamflow, catchment attributes, and meteorological time-series data for 671 river basins across England, Scotland and Wales.</p>
      <p id="d2e606">As in <xref ref-type="bibr" rid="bib1.bibx3" id="text.48"/>, we based our experimental setup on the approach of <xref ref-type="bibr" rid="bib1.bibx58" id="text.49"/>. We provide a brief description here and refer readers to these studies as well as Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/> in this article for further details.</p>
      <p id="d2e617">As forcing data, we used the time series of catchment-average values of precipitation, potential evapotranspiration and temperature in the dataset. In addition, as input for the LSTMs, we used 23 of the static attributes that describe the catchments in the dataset. Of these, 3 were related to topography, 6 to soil, 4 to land cover, 1 to human influence and 8 to climate characteristics. These are detailed in Table <xref ref-type="table" rid="TA3"/>. As part of the experimental setup, the data was divided into training, validation, and testing sets. The training set spans from 1 October 1980 to 31 December 1997; the validation set from 1 October 1975 to 30 September 1980; and the testing set from 1 January 1998 to 31 December 2008.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Models</title>
<sec id="Ch1.S2.SS2.SSS1">
  <label>2.2.1</label><title>LSTMs</title>
      <p id="d2e637">An LSTM is a type of recurrent neural network that effectively addresses the vanishing gradient problem through specialized memory cells with input, forget, and output gates. This architecture enables LSTMs to capture long-term dependencies in sequential data, making them valuable for time series prediction. Their capacity to learn temporal patterns without explicit physical parameterizations has proven particularly effective for modeling streamflow. For a more in-depth description of the applications of LSTMs in hydrology, we refer to the work of <xref ref-type="bibr" rid="bib1.bibx52" id="text.50"/>.</p>
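<p>For reference, the gating structure described above can be written out compactly. The following is a generic textbook LSTM cell in NumPy, our own sketch rather than any specific library implementation; the stacked weight shapes are assumptions for illustration:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W (4H x D), U (4H x H) and b (4H,) stack the
    input, forget, cell-candidate and output gate parameters."""
    H = h.size
    z = W @ x + U @ h + b
    i = sigmoid(z[0:H])        # input gate: admit new information
    f = sigmoid(z[H:2*H])      # forget gate: decay the memory cell
    g = np.tanh(z[2*H:3*H])    # candidate cell update
    o = sigmoid(z[3*H:4*H])    # output gate: expose the cell state
    c_new = f * c + i * g      # additive update of the memory cell
    h_new = o * np.tanh(c_new) # hidden state passed to the next step
    return h_new, c_new
```

<p>The additive form of the cell update, c_new = f * c + i * g, is what lets gradients flow over long horizons and thus mitigates the vanishing gradient problem mentioned above.</p>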
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <label>2.2.2</label><title>Hybrid models</title>
      <p id="d2e651">The hybrid models used in our study follow the paradigm of <xref ref-type="bibr" rid="bib1.bibx72" id="text.51"/> and combine an LSTM network with a conceptual physics-based representation of the hydrological system. More specifically, our models resemble the proposed <inline-formula><mml:math id="M9" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>HBV model <xref ref-type="bibr" rid="bib1.bibx30" id="paren.52"/>.</p>
      <p id="d2e667">Figure <xref ref-type="fig" rid="F1"/>a shows a “pure” LSTM network that serves as our baseline. Then, for each model in Fig. <xref ref-type="fig" rid="F1"/>b through <xref ref-type="fig" rid="F1"/>d, there is a coupling between an LSTM and a conceptual hydrological model. The model in Fig. <xref ref-type="fig" rid="F1"/>d uses the Simple Hydrological Model or SHM <xref ref-type="bibr" rid="bib1.bibx29" id="paren.53"/> as the conceptual component, which is a simplified version of the HBV model. As an alternative, the model in Fig. <xref ref-type="fig" rid="F1"/>b uses a “Bucket” model, i.e., a simple conceptual model that represents the catchment water balance using a single storage. Finally, the model in Fig. <xref ref-type="fig" rid="F1"/>c uses a “Nonsense” model, a conceptual model that deliberately represents processes counter to common intuition: rainfall is immediately captured and stored as baseflow storage, then moves up a soil column to the unsaturated zone before being transformed into output streamflow.</p>
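A single-storage component of the “Bucket” kind can be sketched in a few lines. The linear outflow and the ET scaling used here are assumptions for illustration, not the paper's exact Bucket formulation:

```python
# Minimal sketch of a single-storage "Bucket" step (fluxes in mm/day); the
# linear outflow and the ET scaling parameter are illustrative assumptions,
# not the paper's exact Bucket formulation.
def bucket_step(storage, precip, pet, alpha, k):
    et = alpha * pet                           # actual ET as scaled potential ET
    storage = max(storage + precip - et, 0.0)  # water balance, no negative storage
    q = storage / k                            # linear storage-outflow relationship
    storage -= q
    return storage, q

S, hydrograph = 10.0, []
for p, e in [(5.0, 1.0), (0.0, 1.2), (12.0, 0.8)]:
    S, q = bucket_step(S, p, e, alpha=0.8, k=24.0)
    hydrograph.append(q)
```

The entire catchment water balance is carried by the single state `S`, which is what distinguishes the Bucket model from the multi-storage SHM.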

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e688">Sketch of the hybrid models used in this study. The parameters in each model are encircled and highlighted in green. <bold>(a)</bold> LSTM, <bold>(b)</bold> Hybrid Bucket, <bold>(c)</bold> Hybrid Nonsense, <bold>(d)</bold> Hybrid SHM.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f01.png"/>

          </fig>

      <p id="d2e710">While SHM represents a model typically used in hydrological practice, the Bucket and Nonsense models serve as alternative hypotheses to test the limits of the hybrid modeling approach. These models were built using principles from modular frameworks that still find applications in hybrid modeling <xref ref-type="bibr" rid="bib1.bibx21 bib1.bibx34" id="paren.54"/>.</p>
      <p id="d2e716">In simple terms, the approach to hybrid modeling used here can be conceptualized as a hydrological model with dynamic parameters. In rainfall-runoff modeling, the use of dynamic parameters originated with data-based mechanistic modeling <xref ref-type="bibr" rid="bib1.bibx82" id="paren.55"/>, which established methods for identifying time-invariant parameters in relation to their time-variant counterparts. More recent approaches generate time-dependent parameters by introducing stochastic processes that represent deviations from calibrated static parameters <xref ref-type="bibr" rid="bib1.bibx69" id="paren.56"/>. In these methods, both static parameters and their variable components are jointly calibrated via Bayesian updating using Markov chain Monte Carlo. While theoretically convincing, the practical application of stochastic, time-dependent parameters has been very limited due to identifiability problems and the computational burden of propagating time-dependent parameters in a rigorous Bayesian framework <xref ref-type="bibr" rid="bib1.bibx70" id="paren.57"/>. With the recent gain in popularity of differentiable models, the idea of dynamic parameters (albeit in a deterministic setting) has experienced a significant revival in hydrological modeling.</p>
      <p id="d2e728">At runtime, the LSTM runs for the entire length of a sequence of inputs and predicts the conceptual model's parameters at every time step. These predictions are made in “sequence-to-sequence” mode. After this initial run, the model operates like a traditional hydrological model, with the distinction that it reads a new set of parameters along with its inputs at every time step; the parameters of the model therefore vary in time. Due to the initial run of the LSTM and the warm-up period of the hydrological model, all hybrid models in this paper use a sequence length of <inline-formula><mml:math id="M10" display="inline"><mml:mn mathvariant="normal">730</mml:mn></mml:math></inline-formula> d (2 years) with only the second half of the predictions (<inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mi mathvariant="normal">sim</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>) evaluated in the loss function. 
Furthermore, instead of evaluating the model at every possible selection of <inline-formula><mml:math id="M12" display="inline"><mml:mn mathvariant="normal">365</mml:mn></mml:math></inline-formula> time steps, we limit the number of evaluations to <inline-formula><mml:math id="M13" display="inline"><mml:mn mathvariant="normal">450</mml:mn></mml:math></inline-formula> randomly chosen sequences, meaning that the loss function is calculated using <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:mn mathvariant="normal">365</mml:mn><mml:mo>⋅</mml:mo><mml:mn mathvariant="normal">450</mml:mn><mml:mo>=</mml:mo><mml:mn mathvariant="normal">164</mml:mn><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mn mathvariant="normal">250</mml:mn></mml:mrow></mml:math></inline-formula> values of <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mi mathvariant="normal">sim</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>. For further details on the evaluation process, please refer to <xref ref-type="bibr" rid="bib1.bibx3" id="text.58"/>.</p>
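The runtime scheme described above, per-time-step parameters driving a conceptual model with only the second half of a 730 d sequence scored, can be sketched as follows. The random arrays stand in for LSTM outputs and forcings and are purely illustrative:

```python
import numpy as np

# Schematic of the hybrid runtime: a parameter time-series (random numbers
# standing in for per-step LSTM predictions) drives a toy conceptual model
# step by step; only the second half of the 730 d sequence would be scored.
rng = np.random.default_rng(0)
seq_len = 730
precip = rng.gamma(2.0, 2.0, seq_len)        # stand-in forcing
pet = rng.uniform(0.5, 2.0, seq_len)
theta = rng.uniform(0.5, 1.5, (seq_len, 2))  # stand-in for LSTM parameter outputs

S, q_sim = 5.0, np.empty(seq_len)
for t in range(seq_len):
    alpha, k = theta[t]                      # a fresh parameter set is read each step
    S = max(S + precip[t] - alpha * pet[t], 0.0)
    q_sim[t] = S / (10.0 * k)                # simple linear outflow for illustration
    S -= q_sim[t]

warmup = seq_len // 2                        # the first year only warms up the stores
q_eval = q_sim[warmup:]                      # only these y_sim values enter the loss
```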
</sec>
<sec id="Ch1.S2.SS2.SSS3">
  <label>2.2.3</label><title>Performance evaluation</title>
      <p id="d2e816">Hybrid models are typically trained to make deterministic predictions. Therefore, depending on the case, we use either the mean-squared error (MSE) or the basin average Nash-Sutcliffe efficiency (NSE<sup>*</sup>) <xref ref-type="bibr" rid="bib1.bibx55" id="paren.59"/> as the loss function to compare the simulations of our model <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">sim</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> against the observed data <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.

                  <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M20" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E3"><mml:mtd><mml:mtext>3</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">MSE</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">sim</mml:mi></mml:msubsup><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E4"><mml:mtd><mml:mtext>4</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msup><mml:mi mathvariant="normal">NSE</mml:mi><mml:mo>*</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>B</mml:mi></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>B</mml:mi></mml:munderover><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mo>⋅</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mstyle 
displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mi>y</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mi>i</mml:mi><mml:mi mathvariant="normal">sim</mml:mi></mml:msubsup></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
      <p id="d2e1005">In Eq. (<xref ref-type="disp-formula" rid="Ch1.E3"/>), <inline-formula><mml:math id="M21" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> identifies the predictions and observations on a specific day and <inline-formula><mml:math id="M22" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> the number of days over which to calculate the loss function. In the case of NSE<sup>*</sup>, <inline-formula><mml:math id="M24" display="inline"><mml:mi>B</mml:mi></mml:math></inline-formula> is the number of mini-batches in a training batch (typically <inline-formula><mml:math id="M25" display="inline"><mml:mn mathvariant="normal">256</mml:mn></mml:math></inline-formula>) and the additional term <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> in Eq. (<xref ref-type="disp-formula" rid="Ch1.E4"/>) represents the standard deviation of the streamflow time-series of the basin <inline-formula><mml:math id="M27" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M28" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> is a numerical stabilizer added for cases where <inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is low.</p>
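A minimal numpy reading of Eqs. (3)-(4): per-basin squared errors scaled by that basin's streamflow standard deviation plus the stabilizer. Array shapes, the example values, and the default value of the stabilizer `eps` are assumptions for illustration:

```python
import numpy as np

# Numpy sketch of the loss functions in Eqs. (3)-(4); shapes are
# illustrative (B basins by N days), and eps = 0.1 is an assumed default.
def mse(y_obs, y_sim):
    return np.mean((y_obs - y_sim) ** 2)

def nse_star(y_obs, y_sim, s_b, eps=0.1):
    # y_obs, y_sim: (B, N); s_b: (B,) std of each basin's observed streamflow
    sq_err = (y_obs - y_sim) ** 2
    scaled = sq_err.mean(axis=1) / (s_b + eps) ** 2  # per-basin normalized MSE
    return scaled.mean()

y_obs = np.array([[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]])
s_b = y_obs.std(axis=1)   # the zero-variance second basin is handled by eps
loss = nse_star(y_obs, 0.9 * y_obs, s_b)
```

The division by `(s_b + eps) ** 2` is what prevents basins with large flow variance from dominating the gradient, which is the motivation for NSE<sup>*</sup> over plain MSE in multi-basin training.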
</sec>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Entropy-based measure of LSTM-induced parameter variability</title>
      <p id="d2e1095">For our evaluation, we aim to measure how much the LSTM makes the conceptual model's parameters vary over time to achieve an optimized performance during training. The underlying premise is that, when using a perfect model, constant true parameter values can be found during optimization, and the “LSTM-induced variability” will be zero. If the conceptual constraint is sufficiently honored by the LSTM, we expect mild or null variability in the predicted time-series of parameter values. In contrast, if a severely wrong representation of the true system is used as the conceptual model, the LSTM will compensate through highly time-dependent parameter values, and the variability in parameters will be high.</p>
      <p id="d2e1098">This analysis can also be extended to the hidden states of the LSTM network itself. As examples, in Sects. <xref ref-type="sec" rid="Ch1.S3.SS3"/> and <xref ref-type="sec" rid="Ch1.S4.SS3"/>, we examine cases where this extension is necessary to compare models with different numbers of parameters. In Sect. <xref ref-type="sec" rid="Ch1.S4.SS5"/> we also look at models with different numbers of inputs.</p>
      <p id="d2e1107">Although there are several measures of variability, we choose to quantify this variability through entropy, as entropy does not require any assumptions about the type or shape of the statistical distribution of the analyzed data. For analyzing the entropy of time-series data, we have to evaluate continuous (differential) entropy <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx62" id="paren.60"/> as shown in Eq. (<xref ref-type="disp-formula" rid="Ch1.E5"/>) with <inline-formula><mml:math id="M30" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> denoting probability density functions (PDFs) of a random variable <inline-formula><mml:math id="M31" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> with support <inline-formula><mml:math id="M32" display="inline"><mml:mi mathvariant="script">X</mml:mi></mml:math></inline-formula>.

            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M33" display="block"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mo>-</mml:mo><mml:munder><mml:mo movablelimits="false">∫</mml:mo><mml:mi mathvariant="script">X</mml:mi></mml:munder><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mi>log⁡</mml:mi><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo><mml:mi mathvariant="normal">d</mml:mi><mml:mi>x</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula></p>
      <p id="d2e1184"><xref ref-type="bibr" rid="bib1.bibx13" id="text.61"/> provide a comprehensive overview of different common approaches to estimating differential entropy from data. In this study, we will use the method proposed by Kozachenko and Leonenko (KL) based on nearest neighbor distances <xref ref-type="bibr" rid="bib1.bibx51" id="paren.62"/> shown in Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>):

            <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M34" display="block"><mml:mrow><mml:mover accent="true"><mml:mi>H</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi mathvariant="italic">ψ</mml:mi><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi mathvariant="italic">ψ</mml:mi><mml:mo>(</mml:mo><mml:mi>k</mml:mi><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mi>log⁡</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>d</mml:mi><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mi>log⁡</mml:mi><mml:mo>(</mml:mo><mml:msubsup><mml:mi mathvariant="italic">ρ</mml:mi><mml:mi>k</mml:mi><mml:mi>d</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M35" display="inline"><mml:mi mathvariant="italic">ψ</mml:mi></mml:math></inline-formula> is the digamma function <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mi>d</mml:mi><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>z</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mi>log⁡</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="normal">Γ</mml:mi><mml:mo>(</mml:mo><mml:mi>z</mml:mi><mml:mo>)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M37" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of points in a sample, <inline-formula><mml:math id="M38" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> is a hyperparameter specifying the number of nearest neighbors used in the estimate, <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>(</mml:mo><mml:mi>d</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the volume of a <inline-formula><mml:math id="M40" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula>-dimensional unit ball, <inline-formula><mml:math id="M41" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> is the number of dimensions of the data and <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">ρ</mml:mi><mml:mi>k</mml:mi><mml:mi>d</mml:mi></mml:msubsup><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the distance between <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and its <inline-formula><mml:math id="M44" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>th nearest neighbor. 
The KL estimator for entropy has been shown to be accurate even for data in higher dimensions <xref ref-type="bibr" rid="bib1.bibx84" id="paren.63"/> and is implemented in the <uri>https://github.com/manuel-alvarez-chaves/unite_toolbox</uri> (last access: 20 January 2026), a suite of tools we have developed for practical applications of information theory in model evaluation; see also the code availability section of this article.</p>
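For illustration, Eq. (6) can be implemented in a few lines with scipy. The paper's reference implementation is the unite_toolbox, so the standalone sketch below is only a direct reading of the formula:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

# Standalone sketch of the Kozachenko-Leonenko (KL) estimator in Eq. (6);
# the authors' reference implementation lives in the unite_toolbox.
def kl_entropy(x: np.ndarray, k: int = 3) -> float:
    n, d = x.shape
    # distance to the k-th nearest neighbor; k + 1 because the closest
    # "neighbor" returned for each query point is the point itself
    rho = cKDTree(x).query(x, k=k + 1)[0][:, -1]
    # log c1(d): log-volume of the d-dimensional unit ball
    log_c1 = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(n) - digamma(k) + log_c1 + (d / n) * np.sum(np.log(rho))

# The theoretical entropy of a 2-D standard normal is log(2*pi*e) ≈ 2.84 nats;
# the estimate below should land close to that value.
x = np.random.default_rng(7).normal(size=(2000, 2))
h_hat = kl_entropy(x)
```

Because the estimator depends on the data only through nearest-neighbor distances, rescaling the sample by a factor c shifts the estimate by exactly d·log(c), mirroring the behavior of true differential entropy.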
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Diagnostic routine to evaluate hybrid model structure</title>
      <p id="d2e1429">Analyzing the variability in parameter or hidden-state space highlights cases where the prescribed conceptual constraint fails to accurately reflect the underlying system dynamics. Such discrepancies fall into two categories: cases where the physics are appropriate but other reasons make the model struggle (e.g., biased or highly uncertain input data), and cases where the physics constraint itself is problematic (e.g., due to neglected or misrepresented processes). To distinguish between these cases and gain insights into system understanding and model development, we propose a tailored diagnostic evaluation routine that scrutinizes the joint behavior of the LSTM-learned parameters. We demonstrate the effectiveness and diagnostic capabilities of this approach through didactic examples in Sect. <xref ref-type="sec" rid="Ch1.S3"/>.</p>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Didactic examples illustrating the proposed workflow</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Motivation</title>
      <p id="d2e1450">The synthetic examples in this section serve to build intuition about the role of the data-driven component in hybrid hydrological models. Specifically, we demonstrate the role of LSTMs in predicting time-variant parameters of conceptual hydrological models. We aim to answer the following questions: <list list-type="order"><list-item>
      <p id="d2e1455">How does the data-driven component behave in presence of a perfect conceptual constraint (i.e., the physics of the data-generating process are fully reflected in the conceptual model)?</p></list-item><list-item>
      <p id="d2e1459">How much variability in the LSTM-predicted parameter values will be detectable if the conceptual constraint is reasonable, but not a complete representation of the data-generating process?</p></list-item><list-item>
      <p id="d2e1463">How will the data-driven component react if the conceptual constraint is not reflecting the data-generating process at all?</p></list-item></list></p>
      <p id="d2e1466">Specific details for the experimental setup of these didactic examples are described in Appendix <xref ref-type="sec" rid="App1.Ch1.S1.SS1"/>. The main point to highlight here is that all models were trained using mean-squared error (MSE) as the loss function (Eq. <xref ref-type="disp-formula" rid="Ch1.E3"/>). The reason is that the data used for <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">obs</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> were created synthetically by running an initial “true” model; therefore, there was no need to account for differences in the magnitude of the streamflow signals between basins.</p>
      <p id="d2e1484">As stated in Sect. <xref ref-type="sec" rid="Ch1.S1.SS1"/>, the main principle driving conceptual hydrological models is the conservation of mass in different reservoirs or storages within a model. Following Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), in a simple case of one storage, conservation of mass can be written as <inline-formula><mml:math id="M46" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo>-</mml:mo><mml:mi>Q</mml:mi><mml:mo>-</mml:mo><mml:mi mathvariant="normal">ET</mml:mi></mml:mrow></mml:math></inline-formula>, with <inline-formula><mml:math id="M47" display="inline"><mml:mi>P</mml:mi></mml:math></inline-formula> being the input (precipitation) and ET and <inline-formula><mml:math id="M48" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> being two outputs (evapotranspiration and output streamflow, respectively). 
Let us assume a model that represents a typical power reservoir in which the storage-outflow relationship is described by the power function <inline-formula><mml:math id="M49" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:msup><mml:mi>S</mml:mi><mml:mi mathvariant="italic">β</mml:mi></mml:msup></mml:mrow><mml:mi>k</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula>, where <inline-formula><mml:math id="M50" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M51" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> are model parameters with an additional parameter <inline-formula><mml:math id="M52" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> being used as a correction factor for the output flux of ET. This model and its governing equations are shown in Fig. <xref ref-type="fig" rid="F2"/>a. Using the precipitation and evapotranspiration time-series of a subset of basins in CAMELS-GB (cf. Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/>) and parameter values <inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.2</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">24.0</mml:mn></mml:mrow></mml:math></inline-formula>, we create a synthetic “observed” streamflow time-series that is shown across all plots in Fig. <xref ref-type="fig" rid="F2"/> (subplots b, d, f, i, l and o). 
This initial model is considered our “synthetic truth” because it was used to generate the target data (“observed” streamflow) for the competing formulations of hybrid models described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>.</p>
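The “synthetic truth” generation described above can be sketched as an explicit-Euler water balance. The non-negativity clamps and the random stand-in forcing are assumptions for illustration; the paper drives this model with CAMELS-GB precipitation and PET series:

```python
import numpy as np

# Sketch of the "synthetic truth": one power reservoir with dS/dt = P - Q - ET,
# Q = S**beta / k and ET = alpha * PET, using the stated parameter values
# alpha = 0.8, beta = 1.2, k = 24.0. The clamps and random forcing are
# illustrative assumptions.
def run_true_model(precip, pet, alpha=0.8, beta=1.2, k=24.0, s0=5.0):
    S, q = s0, np.empty(len(precip))
    for t in range(len(precip)):
        S = max(S + precip[t] - alpha * pet[t], 0.0)  # explicit-Euler balance
        q[t] = min(S ** beta / k, S)                  # power-law outflow, capped at S
        S -= q[t]
    return q

rng = np.random.default_rng(1)
q_obs = run_true_model(rng.gamma(2.0, 2.0, 365), rng.uniform(0.5, 2.0, 365))
```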

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1625">Didactic examples, demonstrating the evaluation of hybrid hydrological models by measuring the entropy of the model parameters and the LSTM hidden-state space. Left column: Schematic illustration of hybrid model structures, with Model 1 representing the “true” conceptual physically constrained model coupled with the LSTM as a reference. Center column: Segment of observed/predicted discharge time-series. Right column: Time-series of LSTM-predicted parameters and their univariate distributions.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f02.png"/>

        </fig>

      <p id="d2e1634">We will analyze the resulting time-varying parameter values of the alternative hybrid models, as predicted by their respective LSTM-component, and interpret these results given our knowledge of the true model structure in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS1"/>. Then, we will explain how we measure variability as the entropy of the resulting parameter distributions in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS2"/>, and why we move to measuring the “activity” of the LSTM in its hidden state space in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS3"/>. We summarize the key points of our proposed approach, as illustrated on these didactic examples, in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>.</p>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Hybrid models</title>
      <p id="d2e1653">To investigate the three research questions posed above, we set up an LSTM model as our data-driven benchmark and four alternative hybrid models to predict the time-series of observed discharge, illustrated in the left column of Fig. <xref ref-type="fig" rid="F2"/>: <list list-type="order"><list-item>
      <p id="d2e1660">We use an LSTM to directly predict streamflow from the inputs of precipitation and evapotranspiration (Model 0);</p></list-item><list-item>
      <p id="d2e1664">We couple the “true” model defined above with an LSTM network to predict its parameters <inline-formula><mml:math id="M56" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M57" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M58" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>, as described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2.SSS2"/> (Model 1);</p></list-item><list-item>
      <p id="d2e1691">We substitute the power-reservoir with a linear reservoir that follows the storage-outflow relationship <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mi>S</mml:mi><mml:mi>k</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:math></inline-formula>, and add a threshold parameter <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">max</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> such that any excess storage directly becomes streamflow <inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mo>-</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">max</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> if <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:mi>S</mml:mi><mml:mo>≥</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mi mathvariant="normal">max</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (Model 2);</p></list-item><list-item>
      <p id="d2e1760">We add an additional reservoir to Model 1 which receives the outflow of the previous reservoir <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow><mml:mrow><mml:mi mathvariant="normal">d</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and both reservoirs have a linear storage-outflow relationship (Model 3);</p></list-item><list-item>
      <p id="d2e1799">We extend the storage-outflow relationship of Model 1 with an additional threshold parameter <inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> that reflects the minimum storage required to generate streamflow (Model 4).</p></list-item></list></p>
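The storage-outflow relations distinguishing these variants can be written compactly. How the overflow combines with the linear outflow in Model 2, and where the threshold S<sub>0</sub> enters in Model 4, are plausible readings of the text rather than the authors' definitive equations:

```python
# Compact sketch of the storage-outflow relations behind Models 1, 2 and 4;
# the additive overflow (Model 2) and the placement of S0 (Model 4) are
# assumed forms for illustration.
def q_power(S, beta, k):
    # Model 1: true power reservoir, Q = S**beta / k
    return S ** beta / k

def q_linear_threshold(S, k, S_max):
    # Model 2: linear outflow plus direct overflow above S_max (assumed additive)
    return S / k + max(S - S_max, 0.0)

def q_power_offset(S, beta, k, S0):
    # Model 4: no outflow until storage exceeds the minimum S0 (assumed form)
    return max(S - S0, 0.0) ** beta / k
```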
      <p id="d2e1813">In Fig. <xref ref-type="fig" rid="F2"/>, Model 0 represents a case where we only have an LSTM that predicts streamflow, i.e., a purely data-driven model. Then, based on the distinction between structures and processes, we have categorized each hybrid model according to its architectural design and process representation. Model 2 maintains the correct one-reservoir architecture of the true model but implements an incorrect process representation by substituting the true power-law outflow relationship with a simple linear relationship. Model 3 deviates from the true model in both aspects: it uses the same incorrect linear outflow relationship while also incorporating an additional storage reservoir that does not exist in the true model. Model 4 preserves the correct architecture of the true model but becomes overparameterized in its process representation by introducing an extra parameter, <inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>. Interestingly, when <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is set to zero, Model 4's process representation perfectly aligns with the outflow relationship in the true model. We explore these relationships further in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3.SSS1"/>. Additional exemplary model architectures are presented in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>.</p>
      <p id="d2e1844">The LSTM architecture of the baseline model and the hybrid models consists of ten hidden states. For our entropy analysis of the hidden states to be meaningful and fair, it is important to compare models with the same architecture. The choice of ten hidden states was determined by the minimum required for both the baseline model and hybrid models to achieve equal performance. To aid in this process, the models were trained on a subset of five randomly selected basins (76005, 83004, 46008, 50008, and 96001) from the CAMELS-GB dataset. In general, using multiple basins improved the training process for all models, particularly for the pure LSTM (Model 0), validating current standard practices <xref ref-type="bibr" rid="bib1.bibx56" id="paren.64"/>. However, the purpose of this analysis was not to achieve maximum performance for a given task but to compare hybrid approaches on equal grounds. We base our entropy analysis on equal performance to ensure fair statements about the role of the conceptual component in hybrid models. Additionally, to allow for extensive repetitions and alterations, we deliberately kept the training effort low (unlike the real-world case study; see Sect. <xref ref-type="sec" rid="Ch1.S4"/>).</p>
      <p id="d2e1852">The selection of these specific basins for this example is not critical. In our true model, we have defined a data-generating process that does not consider basin-specific characteristics, meaning that the models could be trained on any set of basins. The only requirement is that basins have sufficiently long time series of precipitation and evapotranspiration data, which is satisfied by all CAMELS-GB basins. We used only the precipitation and PET time series from each basin and created our own synthetic “observed” streamflow as described in Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/> for model training. The train/test split followed the approach detailed in Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/>. Parameter variation ranges for the conceptual model components are shown in Table <xref ref-type="table" rid="TA1"/>. Since the target is itself the output of a model, there was no need to adjust the loss function for specific data characteristics; therefore, we chose MSE (Eq. <xref ref-type="disp-formula" rid="Ch1.E3"/>) as the loss function. Each model was trained for a specific number of epochs using model-specific learning rates. We refer readers to the synthetic example logs for detailed specifications of each model. The reported testing metrics are averaged over five realizations of each model obtained from random initializations using different seeds.</p>
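The exact data-generating model is specified in Sect. 3.1 and not repeated here. Purely to make the setup concrete, the sketch below runs a hypothetical single nonlinear reservoir driven by precipitation and PET; the function `simulate_reservoir`, the partitioning role of `alpha`, and the outflow law are our illustrative assumptions, with only the parameter values (`alpha = 0.8`, `beta = 1.2`, `k = 24.0`) taken from the discussion of Model 1:

```python
import numpy as np

def simulate_reservoir(p, pet, alpha=0.8, beta=1.2, k=24.0, s_init=5.0):
    """Hypothetical single-reservoir model with a nonlinear outflow law.

    Illustrative only: the true model is defined in Sect. 3.1. Here alpha
    partitions precipitation, beta is an outflow exponent, and k acts as a
    recession constant -- assumed roles, not the paper's exact equations.
    """
    s = s_init
    q = np.empty_like(p, dtype=float)
    for t in range(len(p)):
        # storage update from forcing; storage cannot become negative
        s = max(s + alpha * p[t] - pet[t], 0.0)
        # nonlinear outflow, capped so we never release more than is stored
        q[t] = min(s**beta / k, s)
        s -= q[t]
    return q

rng = np.random.default_rng(42)
p = rng.exponential(2.0, size=365)   # synthetic daily precipitation [mm]
pet = np.full(365, 1.0)              # constant daily PET demand [mm]
q_obs = simulate_reservoir(p, pet)   # synthetic "observed" streamflow
```

Any forcing series of sufficient length works here, which is why the choice of CAMELS-GB basins is not critical for the synthetic experiments.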
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Analysis and discussion of results</title>
<sec id="Ch1.S3.SS3.SSS1">
  <label>3.3.1</label><title>Visualization of time-varying parameters</title>
      <p id="d2e1878">For illustration purposes, we show a short five-month period (November 2004 to April 2005, covering the Cumbria and Carlisle floods of 2005 in the UK <xref ref-type="bibr" rid="bib1.bibx43" id="paren.65"/>) in Fig. <xref ref-type="fig" rid="F2"/> to demonstrate the ability of all models to perfectly fit the data during both high- and low-flow conditions. The center column shows the predicted streamflow by an exemplary run of the hybrid model, and the right column shows the corresponding parameter trajectories. The reported numerical values of NSE and entropy are for the whole testing period of 1 January 1998 to 31 December 2008, averaged over the five runs based on different random seeds for initialization. The density plots in the right column of Fig. <xref ref-type="fig" rid="F2"/> were created using a kernel-based density estimate <xref ref-type="bibr" rid="bib1.bibx79" id="paren.66"/>, but the reported individual entropies of each parameter and the joint entropy of the model parameters were calculated using the KL estimator described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3"/>.</p>
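The KL estimator referenced here is, at its core, a nearest-neighbor estimate of differential entropy. The following numpy-only sketch implements the classical Kozachenko-Leonenko 1-NN form with a brute-force neighbor search; the implementation used for the reported results may differ in neighbor order, metric, or numerical details:

```python
import numpy as np
from math import lgamma, log, pi

EULER_GAMMA = 0.5772156649015329

def kl_entropy(x):
    """Kozachenko-Leonenko 1-NN estimate of differential entropy (in nats).

    Minimal sketch (brute-force neighbor search, fine for small samples);
    x is an (n, d) array of samples.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # pairwise Euclidean distances; diagonal excluded from the minimum
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)
    eps = dist.min(axis=1)                            # 1-NN distances
    log_vd = (d / 2) * log(pi) - lgamma(d / 2 + 1)    # log volume of unit d-ball
    return d * np.mean(np.log(eps)) + log_vd + EULER_GAMMA + log(n - 1)
```

For well-behaved samples the estimate converges to the true differential entropy, e.g. roughly 1.42 nats for a one-dimensional standard normal.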
      <p id="d2e1893"><italic>Pure LSTM (Model 0).</italic> For this case we see that the pure LSTM is able to make accurate predictions, perfectly fitting the observed data. As such, this model serves as our baseline and any additional knowledge should make prediction easier (reduce entropy) or more difficult (increase entropy).</p>
      <p id="d2e1898"><italic>Perfect physics constraint (Model 1).</italic> In the case of Model 1, where the LSTM is coupled to the true conceptual model, we hope to see that the data-driven component does nothing, i.e., it does not interfere with the perfect representation of the natural system that is provided by the conceptual constraint. Indeed, we find that the network predicts practically static parameters as shown in Fig. <xref ref-type="fig" rid="F2"/>g, with almost negligible deviations resulting only from the sequential nature of the LSTM. Reassuringly, the LSTM is able to recover the true parameter values of <inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.8</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1.2</mml:mn></mml:mrow></mml:math></inline-formula>, and <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">24.0</mml:mn></mml:mrow></mml:math></inline-formula>. As a logical consequence, this hybrid model is able to perfectly mimic the observations with an NSE of 1.0, as they were created with the same conceptual model and parameter values.</p>
      <p id="d2e1941"><italic>Imperfect physics constraint (Models 2 and 3).</italic> The behavior of the time-varying parameters is expected to differ when the LSTM is coupled to a conceptual model that does not adequately represent the true system. Subplots (j) and (m) of Fig. <xref ref-type="fig" rid="F2"/> illustrate the behavior of the parameters when the conceptual component of the hybrid model has been incorrectly specified. In these cases, we can see how the LSTM varies the parameters in order to achieve good predictions despite an imperfect conceptual model (i.e., the LSTM compensates for model structural error). This behavior is apparent in the variation of the recession constants for Model 2 (<inline-formula><mml:math id="M70" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>) and Model 3 (<inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M72" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>). In situations of low flow, the recession constants increase, whereas for situations of high flow, the reverse is true.</p>
      <p id="d2e1978"><italic>Over-parameterized constraint (Model 4).</italic> In the case of an over-parameterized conceptual model, the role of the data-driven component is somewhat unclear. All parameters might be tweaked simultaneously, in a manner that changes over time, to achieve a best-possible fit with the observed data. Such a case would presumably spoil any attempt to interpret the inner functioning of the hybrid model. However, in this case, we observe that the parameters of Model 4 (Fig. <xref ref-type="fig" rid="F2"/>p) are optimized to have almost constant values. In fact, the LSTM is able to correctly identify that the threshold parameter <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> is not meaningful for predicting the output variable, so it is efficiently driven to a value of <inline-formula><mml:math id="M74" display="inline"><mml:mn mathvariant="normal">0.0</mml:mn></mml:math></inline-formula>. By doing so, the LSTM transforms the prescribed constraint in the form of the over-parameterized conceptual model into an architecture that is equivalent to the true one. This allows the LSTM to identify the true values of the other three parameters.</p>
      <p id="d2e2003">In Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>, we present four additional hybrid model versions that cover one under-parameterized case (Model 5 lacks the parameter <inline-formula><mml:math id="M75" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>) and three over-parameterized cases of different types (concerning model structure and parameters). The insights from these scenarios match what we have reported for the three broad classes above: the under-parameterized model struggles, with its parameters varying heavily over time, while the LSTM in over-parameterized models produces almost static parameters in a combination that best counteracts the over-parameterization.</p>
</sec>
<sec id="Ch1.S3.SS3.SSS2">
  <label>3.3.2</label><title>Measuring entropy of conceptual model parameter space</title>
      <p id="d2e2023">To quantify the variability of LSTM-predicted parameter values over time, we aggregate all individual values into a sample. These samples are shown as distributions in subplots (g), (j), (m) and (p) of Fig. <xref ref-type="fig" rid="F2"/>. Wide distributions result for cases where parameters vary significantly over time, and very narrow distributions for cases of almost static behavior. We can quantify the entropy of the joint distribution of the parameters by using Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) as described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3"/>, with entropy being larger for wide distributions and lower for narrow distributions. Note that we are calculating the entropy of the parameters predicted by the neural network, which occupy a range of values from <inline-formula><mml:math id="M76" display="inline"><mml:mn mathvariant="normal">0</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M77" display="inline"><mml:mn mathvariant="normal">1</mml:mn></mml:math></inline-formula> as shown on the right-hand side of the right-most subplots in Fig. <xref ref-type="fig" rid="F2"/>, such that the measurements of entropy are not affected by the scale of the parameters. Hence, these values span a range of width 1, and considering that the maximum entropy (that of a uniform distribution) over this range is <inline-formula><mml:math id="M78" display="inline"><mml:mn mathvariant="normal">0.0</mml:mn></mml:math></inline-formula>, the calculated entropies are negative (with more negative values indicating smaller entropy, i.e., smaller variability).</p>
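The normalization to the unit interval is what makes zero the entropy ceiling: the differential entropy of a uniform distribution over an interval of width w is log(w), i.e., 0 for a parameter exploring its full normalized range and negative for anything narrower. In closed form:

```python
import numpy as np

# Differential entropy (in nats) of a uniform distribution of width w is
# log(w): 0 for a parameter spanning its full normalized [0, 1] range, and
# increasingly negative the more static the parameter trajectory becomes.
def uniform_entropy(width):
    return float(np.log(width))

h_full = uniform_entropy(1.0)     # maximally varying parameter -> 0.0
h_static = uniform_entropy(0.01)  # almost-constant parameter -> ~ -4.6
```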
      <p id="d2e2056"><italic>Perfect physics constraint.</italic> Comparing the entropies obtained for Models 1, 2 and 3, we can confirm that Model 1 (LSTM coupled to the true conceptual model) shows the lowest entropy. In a theoretical ideal case, the LSTM would have been able to perfectly recover the true values of <inline-formula><mml:math id="M79" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M80" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M81" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula> without any variation in time at all, which would lead to a theoretical entropy of <inline-formula><mml:math id="M82" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> (an unrealistic expectation, given the difficulty of the task required of the LSTM and numerical imprecision). Nevertheless, the variations of the parameters are very small, and thus the calculated entropy is also significantly smaller than for Models 2 and 3.</p>
      <p id="d2e2092"><italic>Imperfect physics constraint.</italic> We find that Model 2 generates less entropy than Model 3, which means that the conceptual model in Model 2 better represents the true model underlying the observed data (while definitely being further from the truth than Model 1). In this sense, the proposed entropy measure can be considered to represent “closeness” of a model's representation of the true system.</p>
      <p id="d2e2097"><italic>Over-parameterized constraint.</italic> Measuring the entropy of the parameters for Model 4 distorts this result. As Model 4 permits a parameter configuration that makes the model equal to Model 1, the predicted parameters of the LSTM are again almost constant, and the calculated joint entropy is even lower than for Model 1. Note that this is a special case of an over-parameterized model. In Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>, and particularly in Fig. <xref ref-type="fig" rid="FB1"/>g, we show an example of an over-parameterized model in which the true model cannot be recovered.</p>
      <p id="d2e2107"><italic>Comparing Conceptual Constraints on the Entropy Axis.</italic> To gain more intuition about how our hybrid models are ranked based on entropy, we place them all (including the ones presented in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>) on the same entropy axis (Fig. <xref ref-type="fig" rid="F3"/>).</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e2118">Benchmarking axis based on the entropy of the time-varying parameters in the different hybrid models (didactic examples).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f03.png"/>

          </fig>

      <p id="d2e2127">We would expect to find Model 1 furthest to the left in Fig. <xref ref-type="fig" rid="F3"/>, because the LSTM has nothing to adjust, so parameters are practically constant over time and their joint entropy is minimal. However, we see that this is not the case and, among the models discussed in this section, it is Model 4 which creates the lowest entropy.</p>
      <p id="d2e2132">This is an artifact of comparing entropies in different dimensions. As an example, consider <inline-formula><mml:math id="M83" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> to be a random variable that follows a multivariate Gaussian distribution, i.e. <inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mi>X</mml:mi><mml:mo>∼</mml:mo><mml:mi mathvariant="script">N</mml:mi><mml:mo>(</mml:mo><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">Σ</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:mi mathvariant="italic">μ</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mi>d</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mi mathvariant="normal">Σ</mml:mi><mml:mo>∈</mml:mo><mml:msup><mml:mi mathvariant="double-struck">R</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mo>×</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. The entropy of <inline-formula><mml:math id="M87" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> is then given as:

              <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M88" display="block"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mi>d</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mi>log⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">π</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mfenced><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mi>log⁡</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:mi mathvariant="normal">det</mml:mi><mml:mfenced open="(" close=")"><mml:mi mathvariant="normal">Σ</mml:mi></mml:mfenced></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula></p>
      <p id="d2e2256">We can see that the entropy of <inline-formula><mml:math id="M89" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> grows with the logarithm of the determinant of <inline-formula><mml:math id="M90" display="inline"><mml:mi mathvariant="normal">Σ</mml:mi></mml:math></inline-formula>. If we add a single dimension to <inline-formula><mml:math id="M91" display="inline"><mml:mi mathvariant="normal">Σ</mml:mi></mml:math></inline-formula> with a very low value on the main diagonal (a time series of almost-constant values has close-to-zero variance) and all off-diagonal entries practically zero, the entropy tends to decrease: multiplying the original determinant by a value much smaller than one decreases the second term by more than the added dimension increases the first term.</p>
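This dimensionality artifact can be reproduced directly from the closed-form expression in Eq. (7). In the sketch below the diagonal covariances are hypothetical stand-ins: three moderately varying parameters, then the same three plus one almost-constant extra parameter (analogous to S0 in Model 4):

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a multivariate Gaussian, Eq. (7)."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    return 0.5 * d * np.log(2 * np.pi * np.e) + 0.5 * np.log(np.linalg.det(cov))

# three parameters with small but non-negligible variance (hypothetical values)
base = np.diag([1e-2, 1e-2, 1e-2])
# the same three plus one almost-constant extra parameter
extended = np.diag([1e-2, 1e-2, 1e-2, 1e-8])

h3 = gaussian_entropy(base)      # entropy in 3 dimensions
h4 = gaussian_entropy(extended)  # entropy in 4 dimensions: lower, despite
                                 # the extra (1/2) log(2*pi*e) term
```

Although the four-dimensional model is "larger", its joint entropy is lower, mirroring the counter-intuitive ranking of Model 4 versus Model 1.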
      <p id="d2e2281">There are two additional cases which show lower entropy due to the number of parameters in their conceptual models (Models 6 and 8) and one further example which has entropy close to Model 4 because it shares a similar inflow-outflow relationship (Model 7). The issues with these results are explained further in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>. While explainable through theory, this ranking is counter-intuitive and does not meet our expectations for a metric that unifies the evaluation of arbitrary hybrid models. We have illustrated these results here to allow the reader to follow our argument and move with us deeper into the hybrid models, i.e., into the LSTM hidden-state space.</p>
</sec>
<sec id="Ch1.S3.SS3.SSS3">
  <label>3.3.3</label><title>Measuring entropy of LSTM hidden state space</title>
      <p id="d2e2294">To overcome the challenge of appropriately comparing the “activity” of the LSTM for models with differing numbers of parameters, we propose that the entropy of the coupled system should not be measured in the space of the parameters but in the space of the hidden states of the LSTM instead. Because all of the networks in this example have the same number of hidden states (10), which move in the same range of values (<inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> to <inline-formula><mml:math id="M93" display="inline"><mml:mn mathvariant="normal">1</mml:mn></mml:math></inline-formula>, due to the tanh activation in the network), the calculated entropies are directly comparable.</p>
      <p id="d2e2314">The entropy values obtained for the hidden state spaces of all four models are reported in the left column of Fig. <xref ref-type="fig" rid="F2"/>. The hidden states of the LSTM in Model 1 have smaller variations than in the rest of the models, and thus the entropy of this network is the lowest among all candidates. This measure of variability has an even more intuitive interpretation: how strongly the LSTM has to compensate for a misspecified conceptual constraint.</p>
      <p id="d2e2319"><italic>Comparing Conceptual Constraints on the Entropy Axis.</italic> When placing the models on our universal entropy axis in Fig. <xref ref-type="fig" rid="F4"/>, Model 1 now appears furthest to the left, which meets our expectation that the true constraint should coincide with minimal “activity” of the LSTM. We also see the same ranking between Model 2 and Model 3, which again makes intuitive sense, as using a one-reservoir model better matches the true system. Finally, rearranging by the entropies of the LSTMs, Model 4 is now to the right of Model 1, which identifies it as a misspecified conceptual model but honors that the resulting hybrid configuration is very close to the true system, as opposed to the configurations proposed in Models 2 and 3. Hence, measuring the entropy of the LSTM hidden states prevents us from drawing misleading conclusions from unfair comparisons between models with different numbers of parameters.</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2329">Benchmarking axis based on the entropy of the trajectories of the LSTM hidden states in the different hybrid models and the pure LSTM (didactic examples). The division of the green and red backgrounds serves to identify the addition of “good” and “bad” constraints, respectively.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f04.png"/>

          </fig>

      <p id="d2e2338"><italic>Pure LSTM as a Reference.</italic> One more advantage of measuring the entropy directly in the hidden states of the LSTM is that any hybrid model can now be compared to a single “pure” LSTM, i.e., an LSTM with a simple linear head layer instead of the conceptual model. The addition of a conceptual head layer should make the prediction task of the LSTM easier – at least this is the prevailing idea when promoting “physics-informed” ML. In our setting, adding useful information through the conceptual constraint should reduce the required activity of the LSTM, and hence, entropy. If, by contrast, the conceptual constraint made the task even more difficult, it would add entropy. Marking the pure LSTM as Model 0, we can create a divide on our axis between models that add “good” (helpful) physics (here: Models 1 and 4), and models which add “bad” (misleading) physics (here: Models 2 and 3). In addition, Models 5, 6, 7 and 8 are discussed in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>, where it is shown that they also fall consistently in these categories of “good” and “bad”.</p>
      <p id="d2e2346"><italic>On the Complexity of the Prediction Task.</italic> The LSTM by itself can be seen as a baseline of the required complexity for accomplishing the prediction task. The proposed measure of entropy can be related to the overall complexity of the network, as the entropy of the trajectories of the states in dynamical systems has been related to their Kolmogorov complexity <xref ref-type="bibr" rid="bib1.bibx38" id="paren.67"/>. In theory, if the true model is specified as the conceptual head layer, the entropy of the LSTM is reduced to the theoretical minimum (<inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>) and the required entropy or complexity to accomplish the specific modeling task is completely contributed by the conceptual head layer. Hence, the entropy of the conceptual head layer in Model 1 should be exactly the same as the entropy of the pure LSTM (Model 0), but measuring the entropy of the conceptual head layer by itself is not straightforward and remains an open challenge.</p>
</sec>
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Summary of the proposed approach</title>
      <p id="d2e2373">Let us distill our proposed approach as a diagnostic framework that discerns the adequacy of conceptual constraints in hybrid models. When the prescribed conceptual model accurately represents the natural system, the LSTM will exhibit minimal intervention, effectively endorsing the conceptual model. Conversely, when the conceptual constraint fundamentally misrepresents the system dynamics, the LSTM will demonstrate high activity, working extensively to overcome the inherent limitations of the prescribed conceptual model. This difference in LSTM activity serves as a clear signal for assessing the fidelity of our initial conceptual model.</p>
      <p id="d2e2376">These situations can be detected by the following proposed workflow: <list list-type="order"><list-item>
      <p id="d2e2381">Visualize the time-sequence of LSTM-predicted parameters to gain insight about how heavily the data-driven component acts against the physics constraint; draw conclusions about compensation mechanisms and judge whether the physics constraint is sufficiently honored or massively altered in the hybrid model.</p></list-item><list-item>
      <p id="d2e2385">Quantify the joint entropy of the LSTM hidden state space trajectories; compare against a pure LSTM for the prediction task for reference, and, ideally, against alternative formulations of conceptual constraints by placing all resulting entropies on the universal model evaluation axis.</p></list-item><list-item>
      <p id="d2e2389">Interpret the results: are the conceptual components of the hybrid models an advantage or a burden in solving the prediction task? Which configurations are more helpful than others? Try to understand why from step 1. Over-parameterization will tend to be helpful but with some parameters driven to “unphysical” values; under-parameterization will make the task unnecessarily difficult.</p></list-item></list></p>
      <p id="d2e2392">From the analysis of the didactic examples, we specifically want to highlight that the constraint-morphing capability of the data-driven component is both an opportunity and a risk: it is very promising to see that the flexibility of the LSTM is not abused, but rather it points us towards parsimonious model structures (as in Model 4). At the same time, this constraint-morphing happens under the hood (e.g., resulting NSE is practically the same for all our analyzed model versions!) – it is not safe to say that a hybrid model naturally satisfies the constraint we have prescribed. As such, we should be careful with stating that a model is “physics-constrained” before investigating in detail what the final version of the LSTM is doing. This is where our proposed diagnostic routine helps.</p>
      <p id="d2e2395">Even though in this section we focused on cases of equal performance, in Sect. <xref ref-type="sec" rid="Ch1.S4"/> and more specifically Sect. <xref ref-type="sec" rid="Ch1.S4.SS3.SSS3"/> we analyze a case study with real data where no true model exists and our proposed hypotheses for hybrid models yield different results in terms of both predictive performance and entropy. It is also important to note that the issue of uncertainty is not addressed by these synthetic examples because the output of the true model, and therefore our observed data, were unaffected by noise. Our measurement of entropy could certainly be part of a larger and more comprehensive framework that includes both epistemic and aleatory uncertainty <xref ref-type="bibr" rid="bib1.bibx40" id="paren.68"/> and probabilistic model representations, but such a framework is beyond the scope of this article. Nevertheless, any measurement of entropy will always contain a fraction attributable to the intrinsic chaos of data, which becomes particularly relevant when transitioning from synthetic to real-world applications. Interestingly, equifinality did not pose an issue with synthetic data in our experiments, as all models achieved perfect predictive performance and the model was always identifiable under the right conditions. This matches the experience of <xref ref-type="bibr" rid="bib1.bibx74" id="text.69"/>. However, in real-world applications, equifinality is likely to be more pronounced due to measurement errors, incomplete observations of the system under study, and other sources of uncertainty. This issue is discussed further in Sect. <xref ref-type="sec" rid="Ch1.S4.SS3.SSS2"/>.</p>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Case study: CAMELS-GB</title>
      <p id="d2e2419">Following the intuition developed with the didactic examples, we apply our developed metric to a case study in large-sample hydrology using the CAMELS-GB dataset.</p>
      <p id="d2e2422">Both the pure LSTM and the LSTMs coupled with the conceptual models have 64 hidden states each, which makes them directly comparable. All models were trained using the Adam optimizer <xref ref-type="bibr" rid="bib1.bibx49" id="paren.70"/> with a learning rate of <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and a model-specific number of epochs, always between 28 and 32. The ranges allowed for the parameters of the conceptual models are listed in Table <xref ref-type="table" rid="TA2"/>; the static attributes used as input to the LSTM in all models are listed in Table <xref ref-type="table" rid="TA3"/>.</p>
      <p id="d2e2450">Further details about the study setup are presented in Appendix <xref ref-type="sec" rid="App1.Ch1.S1.SS2"/> but this analysis follows the results from <xref ref-type="bibr" rid="bib1.bibx3" id="text.71"/>, so we first summarize their main findings to put these new results into context. The meticulous reader will notice some differences in the results between the previous study and these current results. These differences are discussed in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/> and do not impact the main findings in either study.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Motivation or: why do we want hybrid models?</title>
      <p id="d2e2469">Hybrid models have demonstrated significant improvements in hydrological predictions across multiple applications. Examples include enhanced accuracy in daily streamflow prediction <xref ref-type="bibr" rid="bib1.bibx48 bib1.bibx30" id="paren.72"/>, better predictions in large basins <xref ref-type="bibr" rid="bib1.bibx16" id="paren.73"/>, and more precise estimates of variables like volumetric water content <xref ref-type="bibr" rid="bib1.bibx10" id="paren.74"/> and stream water temperature <xref ref-type="bibr" rid="bib1.bibx68" id="paren.75"/>. In each case, the hybrid approach outperformed traditional physics-based conceptual models, including the EXP-Hydro and HBV models, the Muskingum-Cunge river routing method, and a partial-differential-equation-based description of the physical process, respectively. However, while these improvements are notable, and leaving aside questions of interpretability, the central question of “to bucket or not to bucket” remains: given the remarkable success of purely data-driven approaches, is the additional effort of combining them with conceptual models actually worth it?</p>
      <p id="d2e2484"><xref ref-type="bibr" rid="bib1.bibx3" id="text.76"/> conducted a model comparison study that evaluated four different approaches: a purely data-driven LSTM and three conceptual hydrological models, each later transformed into a hybrid through the process described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS2.SSS2"/>. The three conceptual models, SHM (adapted from <xref ref-type="bibr" rid="bib1.bibx29" id="text.77"/>), Bucket, and Nonsense, represent different hypotheses of the hydrological system. Among these, SHM is a conventional hydrological model suitable for practical applications, while Bucket and Nonsense serve as contrasting cases: Bucket being an oversimplified representation and Nonsense incorporating physically implausible assumptions.</p>
      <p id="d2e2494">We evaluate the streamflow prediction performance of these seven models using the cumulative distribution function (CDF) to aggregate model performance across all 671 basins in the dataset. Figure <xref ref-type="fig" rid="F5"/> presents these results, while Table <xref ref-type="table" rid="T1"/> provides key metrics derived from the CDF analysis. The two considered metrics are the median NSE, which corresponds to the CDF's 0.5 quantile (the higher the better), and the “area under the curve” (AUC). The AUC serves as a summary metric where lower values indicate better performance, because the AUC becomes minimal if NSE only takes on maximum values <xref ref-type="bibr" rid="bib1.bibx39" id="paren.78"/>.</p>
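As a reference for how such summary metrics can be derived from per-basin scores, the sketch below computes both the median NSE and the AUC from a sample of NSE values; the clipping range [0, 1] and the integration grid are our assumptions, not necessarily the exact procedure behind Table 1:

```python
import numpy as np

def cdf_metrics(nse_values, lower=0.0, upper=1.0):
    """Median NSE and area under the empirical NSE CDF (AUC).

    Sketch only: NSE values are clipped to [lower, upper] and the empirical
    CDF is integrated over that range with the trapezoid rule. An AUC of 0
    would mean every basin reaches the maximum NSE of 1.
    """
    nse = np.clip(np.asarray(nse_values, dtype=float), lower, upper)
    xs = np.sort(nse)
    grid = np.linspace(lower, upper, 1001)
    # empirical CDF on the grid: fraction of basins with NSE <= grid value
    cdf = np.searchsorted(xs, grid, side="right") / len(xs)
    auc = float(np.sum((cdf[:-1] + cdf[1:]) / 2.0 * np.diff(grid)))
    return float(np.median(nse)), auc
```

A model whose basins all reach NSE = 1 yields an AUC near 0, while uniformly mediocre performance shifts the CDF left and inflates the AUC.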

      <fig id="F5"><label>Figure 5</label><caption><p id="d2e2507">Comparison of model performance between conceptual models with static parameters (dashed lines), hybrid models with dynamic parameters (solid lines), and the pure LSTM for all CAMELS-GB basins.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f05.png"/>

        </fig>

<table-wrap id="T1"><label>Table 1</label><caption><p id="d2e2519">Comparison of model performance quantified by area under the NSE curve (AUC) and median NSE.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry colname="col2">AUC</oasis:entry>
         <oasis:entry colname="col3">Median NSE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">LSTM</oasis:entry>
         <oasis:entry colname="col2">0.123</oasis:entry>
         <oasis:entry colname="col3">0.865</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SHM</oasis:entry>
         <oasis:entry colname="col2">0.267</oasis:entry>
         <oasis:entry colname="col3">0.747</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Hybrid SHM</oasis:entry>
         <oasis:entry colname="col2">0.216</oasis:entry>
         <oasis:entry colname="col3">0.839</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Bucket</oasis:entry>
         <oasis:entry colname="col2">0.395</oasis:entry>
         <oasis:entry colname="col3">0.582</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Hybrid Bucket</oasis:entry>
         <oasis:entry colname="col2">0.147</oasis:entry>
         <oasis:entry colname="col3">0.852</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Nonsense</oasis:entry>
         <oasis:entry colname="col2">0.477</oasis:entry>
         <oasis:entry colname="col3">0.511</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Hybrid Nonsense</oasis:entry>
         <oasis:entry colname="col2">0.265</oasis:entry>
         <oasis:entry colname="col3">0.801</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2636">Figure <xref ref-type="fig" rid="F5"/> demonstrates the effect of combining conceptual models with LSTM networks. The effect is visible as a drastic shift to the right from the dashed lines (purely conceptual models) to the solid lines (their hybrid counterparts). This improvement is further quantified in Table <xref ref-type="table" rid="T1"/>, where the metrics consistently show improved performance for hybrid versions compared to their original counterparts.</p>
      <p id="d2e2643">Despite these improvements, our results show that incorporating conceptual models did not exceed the performance of a pure LSTM approach (in Fig. <xref ref-type="fig" rid="F5"/>, the LSTM appears farthest to the right). Interestingly, performance improves most when hybridizing the oversimplified Bucket model, and this hybrid model matches the LSTM performance most closely. Intuitively, one might have expected the LSTM's flexibility to help the Nonsense model most, followed by the Bucket model, and finally the SHM model. Furthermore, one might have expected that, after hybridization, the Hybrid SHM would perform best and exceed the pure LSTM. Instead, what we observe suggests that the SHM constraint actually limits hybrid performance, that adding a Bucket-type constraint is more successful, and that none of these constraints improve prediction skill over the LSTM baseline.</p>
      <p id="d2e2648">These findings raise several urgent questions: <list list-type="bullet"><list-item>
      <p id="d2e2653">Why do apparently “bad” physics allow for better hybrid performance than “good” physics?</p></list-item><list-item>
      <p id="d2e2657">What can we conclude from hybrid performance after all if it does not reflect process fidelity?</p></list-item><list-item>
      <p id="d2e2661">Does physics get in the way of successful data-driven modeling?</p></list-item></list></p>
      <p id="d2e2665">We note that one advantage of the hybrid approach remains untouched by these questions: its ability to directly derive unobserved variables, such as snow water equivalent (SWE), without requiring secondary models. Hence, we wish to provide modelers with tools to obtain satisfying answers to these questions and to better inform and justify hybrid modeling in future research and practice.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Performance on individual basins</title>
      <p id="d2e2676">To better understand the mechanisms of these hybrid models and their impact on model performance, we will investigate the prediction task for five individual basins in detail. These five basins were carefully chosen to facilitate discussion in this section, as they demonstrate cases in which all hybrid models achieve similar performance (as in Sect. <xref ref-type="sec" rid="Ch1.S3"/>) while having different rankings based on our proposed entropy metric. In Sect. <xref ref-type="sec" rid="Ch1.S4.SS3.SSS3"/> we draw statistical conclusions about the prevailing behaviors for all basins.</p>
      <p id="d2e2683">Figure <xref ref-type="fig" rid="F6"/> shows five example basins where the performance gap between hybrid and non-hybrid versions is again very clear. However, these basins all share the characteristic that all models, including the deliberately implausible Nonsense model, reach very similar performance when hybridized. This seems counterintuitive in several respects and again supports the research questions formulated above, as we would have expected to see differences in performance among the hybrid models depending on the constraints imposed. Does the LSTM truly not care what the conceptual constraint is, because it can effectively transform any constraint into the same end product?</p>

      <fig id="F6"><label>Figure 6</label><caption><p id="d2e2690">Comparison of model performance between conceptual models with static parameters, hybrid models with dynamic parameters, and the pure LSTM for individual CAMELS-GB basins.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f06.png"/>

        </fig>

      <p id="d2e2700">Furthermore, we would have expected (hoped?) that at least the physics-plausible constraint of SHM would have helped solve the prediction task, yet this is only marginally true for basins 5003 and 41025, which show slightly higher performance for the Hybrid SHM model. Confusingly, in the specific case of basin 5003, all constraints (physics-plausible or not) seem to help. Overall, Fig. <xref ref-type="fig" rid="F6"/> highlights the urgent need for diagnostic analysis tools that help us understand what it actually means to constrain a data-driven model with a conceptual hydrological model and how much physics remains inside.</p>
      <p id="d2e2705">Since we are now in a real-data setup, there is no “true” model or constraint that we could use as a reference for minimal entropy on our evaluation axis. We will therefore look for the hybrid model whose LSTM component produces the least entropy. Our main anchor will be the pure LSTM, whose entropy divides meaningful added knowledge from misguided assumptions that require compensation by the LSTM.</p>
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Analysis and interpretation of entropy diagnostics</title>
<sec id="Ch1.S4.SS3.SSS1">
  <label>4.3.1</label><title>Measuring entropy of LSTM hidden state space</title>
      <p id="d2e2724">Following the intuition developed in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, we address the questions in the previous section through an entropy analysis of the LSTM's hidden states for the prediction of the five individual basins introduced above.</p>
      <p id="d2e2729">Figure <xref ref-type="fig" rid="F7"/> shows the calculated entropy during the testing period for both the pure LSTM and the hybrid models. This is equivalent to the entropy axis we introduced in our didactic examples, with the pure LSTM marking the divide between “good” and “bad” constraints. Overall, we find that the ranking varies per basin: in some cases (basins 23008, 18014, 41025), the pure LSTM shows by far the lowest entropy, and hence none of the constraints can be considered useful for predicting streamflow at these basins; for the other basins, at least some conceptual constraints proved helpful, and for basin 5003 even all of them.</p>
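For readers who want to reproduce this kind of diagnostic, the sketch below shows one common way to estimate the differential entropy (in nats) of a hidden-state trajectory: a Kozachenko–Leonenko k-nearest-neighbor estimator. We assume this estimator family purely for illustration; the study's own estimator is specified in its methods section and may differ, and the brute-force implementation and function names here are our own.

```python
import math
import numpy as np

def _digamma_int(n):
    # psi(n) for integer n >= 1: -euler_gamma + sum_{i=1}^{n-1} 1/i
    return -0.5772156649015329 + sum(1.0 / i for i in range(1, n))

def knn_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN estimator of differential entropy (nats).

    x: (n_samples, n_dims) array, e.g. the trajectory of LSTM hidden
    states over the testing period (one row per time step). Uses
    brute-force pairwise distances, fine for a few thousand samples.
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    # Pairwise Euclidean distances; exclude self-distances
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(dist, np.inf)
    eps = np.sort(dist, axis=1)[:, k - 1]  # distance to k-th neighbor
    # log volume of the d-dimensional unit ball
    log_vd = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    return _digamma_int(n) - _digamma_int(k) + log_vd + d * float(np.mean(np.log(eps)))
```

A trajectory confined to a small region of the hidden-state space yields a low (possibly strongly negative) entropy, whereas a widely wandering trajectory yields a high one, which is the reading used throughout this section.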

      <fig id="F7"><label>Figure 7</label><caption><p id="d2e2736">Entropy of the trajectories of the LSTM hidden states in the different hybrid models and the pure LSTM for individual CAMELS-GB basins. The division of green and red backgrounds matches that of Fig. <xref ref-type="fig" rid="F4"/>.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f07.png"/>

          </fig>

      <p id="d2e2748">Focusing on basin 5003, the observed ranking aligns with our expectation that SHM is the only plausible and hence most useful constraint. This suggests that SHM's imposed structure reasonably reflects the natural system, effectively transferring part of the entropy to the physics-based component. However, any conceptual hydrological model reduces the network's entropy compared to the pure LSTM, even the Nonsense model, which contradicts our expectation that this constraint should not be useful. Notably, Hybrid Nonsense shows even lower entropy than Hybrid Bucket, indicating that some complexity is necessary and that an overly simple conceptual model offers little benefit.</p>
      <p id="d2e2751">Basin 73014 presents a counterexample where Hybrid Nonsense performs best (produces the least entropy) and Hybrid SHM is located to the right of the LSTM, suggesting that a plausible hydrological model can even make predictions more difficult. This finding highlights the unpredictability of hybrid models and the need for deeper investigations to achieve interpretability; simply imposing a constraint does not do the job.</p>
</sec>
<sec id="Ch1.S4.SS3.SSS2">
  <label>4.3.2</label><title>Visualization of time-varying parameters</title>
      <p id="d2e2762">In this section, we demonstrate the power of visually analyzing the time series of the LSTM-predicted parameters by exploring why the Nonsense model creates the least entropy for basin 73014. This analysis illustrates how our entropy-based metric contributes to a broader evaluation framework in which models are assessed not only by quantitative metrics but also by a qualitative evaluation of their behavior <xref ref-type="bibr" rid="bib1.bibx42" id="paren.79"/>.</p>
      <p id="d2e2768">Figure <xref ref-type="fig" rid="F8"/> (top panel) compares observed and simulated streamflow values for the non-hybrid and hybrid versions of the Nonsense model in this basin. The Hybrid Nonsense model shows drastically improved predictions, represented by the solid line in the streamflow plot, compared to its non-hybrid counterpart (dashed line). This means that allowing the parameters to vary over time greatly improves the ability of this model to make accurate predictions.</p>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e2775">Differences between simulated streamflow, states and model parameters of the Nonsense and Hybrid Nonsense models for basin 73014. Both the states and model parameters are shown on a scale relative to their mean.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f08.png"/>

          </fig>

      <p id="d2e2785">To better understand the adjustments made by the LSTM, let us first look at the original structure of the Nonsense model, which is considered physically implausible due to the arrangement of its hydrological storage units (see the schematic illustration in Fig. <xref ref-type="fig" rid="F1"/>c). Counter-intuitively, water from direct precipitation or snowmelt initially enters the model through the baseflow storage, typically considered the unit with the longest retention time. The model then routes water through an unusual sequence: it moves next into the interflow storage, which again has a long residence time, then passes into the unsaturated zone, loses some mass through evapotranspiration, and is finally transformed into the streamflow output. Ignoring the physical interpretation for a moment, the Nonsense model is essentially a series of three sequentially connected storages, a cascade-like arrangement that acts as a dampening function, delaying the input signal.</p>
      <p id="d2e2790">The adjustments made to this implausible model structure by the LSTM component of Hybrid Nonsense become apparent when examining its states and parameters (bottom left and right panels in Fig. <xref ref-type="fig" rid="F8"/>). To simplify interpretation, all plots have been normalized by the mean value of the corresponding state or parameter over the analyzed period. This normalization sets the mean value to 1.0 on the plot, with the lines indicating deviations from the mean. However, this does not mean that the parameters of Nonsense and Hybrid Nonsense take similar, or even close, values. For example, the value of <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for Nonsense is <inline-formula><mml:math id="M97" display="inline"><mml:mn mathvariant="normal">398.6</mml:mn></mml:math></inline-formula> mm, while the mean value for Hybrid Nonsense is <inline-formula><mml:math id="M98" display="inline"><mml:mn mathvariant="normal">83.5</mml:mn></mml:math></inline-formula> mm. This reflects a typical behavior of these hybrid models, where the parameters of the non-hybrid and hybrid model can occupy drastically different ranges.</p>
      <p id="d2e2825">Analyzing the static parameters in the Nonsense model, the dampening behavior of the storages becomes evident: the dashed lines for the baseflow storage <italic>s</italic><sub>b</sub> and the interflow storage <italic>s</italic><sub>i</sub> closely resemble the output hydrograph, but the signal is dampened too strongly in the unsaturated zone storage <italic>s</italic><sub>u</sub>. However, this behavior changes significantly when the model becomes Hybrid Nonsense. Specifically, the line for <italic>s</italic><sub>b</sub> becomes horizontal, indicating that the parameter <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is being used to effectively “skip” this storage. In fact, the Hybrid Nonsense model modifies <italic>s</italic><sub>u</sub> and <italic>s</italic><sub>i</sub> to behave as time lags for the input rainfall to become outflow, which is ultimately managed mostly by the interactions between <italic>s</italic><sub>b</sub> and <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. In Fig. <xref ref-type="fig" rid="F8"/> we see that, for the high-flow peak in 2005-01, <inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is increased disproportionately just so that the model can match the peak based on the volume available in <italic>s</italic><sub>b</sub>.</p>
      <p id="d2e2863">The solid lines for <italic>s</italic><sub>u</sub> and <inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> reveal a distinct pattern in which <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> closely tracks the value of <italic>s</italic><sub>u</sub>. This behavior is tied to the conditional property of the storage: if <inline-formula><mml:math id="M104" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>u</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, any excess runoff added to <italic>s</italic><sub>u</sub> is immediately output. Because <inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> consistently mirrors <italic>s</italic><sub>u</sub>, any additional runoff into this storage is immediately converted to simulated streamflow, once again effectively bypassing this storage. In addition, the outflow of <italic>s</italic><sub>u</sub> is also managed by <inline-formula><mml:math id="M106" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>, which appears to be anticorrelated with <italic>s</italic><sub>i</sub>/<inline-formula><mml:math id="M107" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to match the shape of the observed hydrograph. As a result, the Hybrid Nonsense model essentially functions as a single-storage system with added lagging behavior. This lag is introduced by the sequential transfer of mass between the storages, which occurs one at a time during each time step.</p>
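To make the bypass mechanism concrete, here is a toy single-storage update step (our own illustration, not the study's model code): when the storage exceeds its capacity, the excess is output immediately, so a capacity that is kept tracking the storage level turns the unit into a pass-through.

```python
def bucket_step(s, p, s_max, k):
    """One time step of a toy overflow bucket: add precipitation p,
    spill any excess above capacity s_max immediately (the conditional
    property discussed above), then drain linearly at rate k.
    Returns the updated storage and the total outflow for the step."""
    s = s + p
    spill = max(0.0, s - s_max)  # excess above capacity leaves at once
    s -= spill
    drain = k * s                # linear-reservoir outflow
    s -= drain
    return s, spill + drain

# If the LSTM keeps the predicted capacity equal to the current storage
# (s_max == s before the new input), all new input spills straight
# through, and the storage is effectively bypassed.
```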
      <p id="d2e2956">The imposed structure of the original Nonsense model was effectively modified by the LSTM, transforming the overcomplicated but physically implausible model into something that more closely resembles the Bucket model, with some additional flexibility guided by the characteristics of the training data. Since we did not impose specific constraints on the storage behavior, apart from limits to the parameters, the LSTM discovered an optimized architecture that, in combination with the data-driven component, works just as well as any of the other constraints. It seems that the modified Nonsense structure is significantly more suitable than the oversimplified Bucket model, presumably because it allows for just the right amount of additional freedom. Interestingly, morphing the structure of the Nonsense constraint costs the LSTM less effort (entropy) than fighting against (arguably) more adequate but too rigid constraints such as the SHM or Bucket conceptual models. This is important to keep in mind when interpreting the results of our entropy analysis: high entropy clearly indicates struggling caused by the imposed constraint; low entropy paired with unaffected parameters indicates a plausible constraint; and low entropy paired with suspicious time-varying patterns that alter the qualitative behavior of the states indicates the overwriting of constraints in favor of something more efficient. The result can still be meaningful, as we have uncovered here and as the over-parameterization cases in our didactic examples have shown (so, there is hope).</p>
      <p id="d2e2959">Figure <xref ref-type="fig" rid="F9"/> presents the same analysis period for the SHM and Hybrid SHM models in basin 73014. Because of SHM's larger structure, interpretation becomes more challenging, but we observe some of the same behaviors identified in the analysis of Hybrid Nonsense. The LSTM determines that some of the additional storage compartments in SHM are unnecessary: it does not utilize <italic>s</italic><sub>b</sub> and instead regulates outflow through the fast-flow (<italic>s</italic><sub>f</sub>) and <italic>s</italic><sub>i</sub> storages. Furthermore, minimal changes in <italic>s</italic><sub>i</sub> suggest that most of the outflow is directed through <italic>s</italic><sub>f</sub>.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e2967">Differences between simulated streamflow, states and model parameters of the SHM and Hybrid SHM models for basin 73014. Both the states and model parameters are shown on a scale relative to their mean.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f09.png"/>

          </fig>

      <p id="d2e2976">The high variability in the LSTM-controlled parameters for the remaining reservoirs may reflect behavior learned from other basins in the dataset. Such adaptation appears unnecessary for this particular basin, as the predictions made by the non-hybrid SHM model were already sufficiently accurate, and the modifications made by the LSTM improved performance only slightly. Ultimately, this leads to the Hybrid SHM model being penalized, placing it last in our ranking. As a final note, the “intervention” of the LSTM was not obvious from comparing the hydrographs produced by SHM and Hybrid SHM; without the analysis proposed here, one would think that Hybrid SHM is a well-constrained hybrid model that respects the assumptions formulated in SHM – which, as we have shown, is not at all the case.</p>
      <p id="d2e2979">Returning to the point about equifinality made in Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>: as we have seen in this section, different hybrid model configurations may achieve similar predictive performance while exhibiting varying levels of entropy in the LSTM hidden state space and different modifications to their internal behavior. We argue that high variability in parameter combinations represents an undesirable condition in terms of model structure specification. High entropy aligns with this perspective, and, in general, entropy can be used to distinguish between equifinal models.</p>
</sec>
<sec id="Ch1.S4.SS3.SSS3">
  <label>4.3.3</label><title>Statistical analysis of results for all basins</title>
      <p id="d2e2992">In the previous section, we analyzed specific results from five basins in the dataset because their results mirror those of our controlled examples in Sect. <xref ref-type="sec" rid="Ch1.S3"/>. We can extend this analysis to all basins in the dataset and comment on their results based solely on entropy, though we acknowledge that our constraint of equal performance in the comparison does not hold, as is clear from Fig. <xref ref-type="fig" rid="F5"/>. Nevertheless, the insights derived in this section are meaningful to the overall understanding of hybrid models.</p>
      <p id="d2e2999">One could develop a model selection criterion that considers both performance and entropy. In fact, there has been previous research on model selection considering computational complexity and model performance <xref ref-type="bibr" rid="bib1.bibx9" id="paren.80"/>. However, our purpose here is not to introduce a criterion for model selection but to understand the role of conceptual constraints in hybrid models using entropy as a diagnostic tool.</p>
      <p id="d2e3005">In Fig. <xref ref-type="fig" rid="F7"/>, the most common pattern across basins is shown by basins 23008, 18014, and 41025, where the LSTM consistently has the lowest entropy while the other hybrid models show inconsistent rankings. It appears that their ranking is determined by the specific hydrological system being modeled and the required model complexity. We therefore analyze here the overall statistics and rankings of entropy across all basins.</p>
      <p id="d2e3010">The violin plots in Fig. <xref ref-type="fig" rid="F10"/> show the entropy distributions of each model, with median values of <inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">151.2</mml:mn></mml:mrow></mml:math></inline-formula> nats for the LSTM, <inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">128.2</mml:mn></mml:mrow></mml:math></inline-formula> nats for Hybrid Nonsense, <inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">113.1</mml:mn></mml:mrow></mml:math></inline-formula> nats for Hybrid SHM, and <inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">111.0</mml:mn></mml:mrow></mml:math></inline-formula> nats for Hybrid Bucket. The LSTM's wider dispersion highlights the varying complexity required to model each individual basin. The fact that the LSTM is the only model to cover much of the lower entropy range demonstrates that, in most cases, introducing a conceptual constraint substantially increases the modeling challenge. This is apparent for Hybrid Bucket, which has the highest entropy distribution, meaning that such a simple conceptual constraint is not helpful in most cases and the LSTM has to make the most effort to compensate for this overly rigid constraint.</p>

      <fig id="F10"><label>Figure 10</label><caption><p id="d2e3058">Violin plots of the entropy of the trajectories of the LSTM hidden states in the different hybrid models and the pure LSTM across all CAMELS-GB basins.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f10.png"/>

          </fig>

      <p id="d2e3067">Surprisingly, Hybrid Nonsense has the lowest median entropy among the hybrid models. This goes against the initial hypothesis a researcher might have; yet, as our analysis in Sect. <xref ref-type="sec" rid="Ch1.S4.SS3.SSS2"/> has shown, the Nonsense model lends itself to being most easily transformed by the LSTM into a more suitable structure that can predict streamflow well. Finally, Hybrid SHM did not meet our expectations either, as this model seems overly complex for this specific dataset, and the LSTM has more trouble using it to its advantage than the Nonsense model.</p>
      <p id="d2e3073">To analyze this in more detail, we collected the individual rankings per basin for the whole dataset, identified the unique rankings that appear, and determined their frequency of occurrence. The counts of rankings are shown in Fig. <xref ref-type="fig" rid="F11"/>.</p>
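The counting itself is straightforward; as a sketch (the function name and data layout are our assumptions, not the study's code), each basin's models can be sorted by entropy and the resulting orderings tallied:

```python
from collections import Counter

def ranking_counts(entropies):
    """Count how often each entropy-based model ranking occurs.

    entropies: dict mapping basin id -> {model name: entropy in nats};
    the ranking for a basin lists models from lowest (best) to highest
    entropy. Returns a Counter keyed by ranking tuples.
    """
    rankings = Counter()
    for by_model in entropies.values():
        order = tuple(sorted(by_model, key=by_model.get))
        rankings[order] += 1
    return rankings
```

The most frequent keys of the resulting Counter correspond to the tallest bars in a figure like Fig. 11.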

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e3080">Counts of the different entropy-based model ranking outcomes across all CAMELS-GB basins. To limit the length of the label, the shorter conceptual names of the hybrid models were used but the counts are for the hybrid versions of these models.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f11.png"/>

          </fig>

      <p id="d2e3089">The ranking suggested by the medians in Fig. <xref ref-type="fig" rid="F10"/> (entropy of the LSTM being lowest, followed by Hybrid Nonsense, Hybrid SHM, and Hybrid Bucket) reflects the most frequent ranking across all basins (67 %). Aggregating all basins for which the pure LSTM obtains the lowest entropy accounts for 91 % of all basins. This tells us that, in general over this particular large-sample dataset, the conceptual representations used in our hybrid models were not able to make the prediction task easier for the LSTM, and the prior knowledge that we tried to enforce did not help. Ultimately, we obtained hybrid models that predicted well not because of the physical constraints that we imposed, but because the LSTM was able to compensate for these constraints through added effort (entropy).</p>
      <p id="d2e3095">Although our previous statement is the main finding of this study, we are still able to identify specific catchments for which the added prior knowledge indeed helped. In Fig. <xref ref-type="fig" rid="F11"/>, there are 11 basins (1.6 %) that show the LSTM at the top, meaning that every one of the conceptual models added information that helped in prediction; in 8 out of 671 cases (1.2 %), Hybrid SHM showed the lowest entropy, meaning that the constraint that a hydrologist would perceive as the most plausible and useful actually turned out to need the least compensation by the data-driven component. And only in 1 out of 671 cases did the physically least plausible Nonsense model need the most compensation.</p>
</sec>
</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>What if only individual parameters are dynamic?</title>
      <p id="d2e3109">Although our experimental settings have been deliberately kept consistent with <xref ref-type="bibr" rid="bib1.bibx3" id="text.81"/>, it is not necessary for every model parameter to be dynamic, as in the cases examined so far, for our method to work. Rather, it could be scientifically interesting to examine the role of individual parameters in “fighting” the imposed constraints. To illustrate the diagnostic capability of the proposed entropy analysis for this question, we examine a variant of the SHM model in which only the parameter <inline-formula><mml:math id="M112" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> in the unsaturated zone reservoir is made dynamic, while the remaining seven parameters remain static.</p>

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e3124"><bold>(a)</bold> Model performance for the LSTM, SHM (all parameters static), Hybrid SHM (all parameters dynamic), and SHM with dynamic <inline-formula><mml:math id="M113" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> models across all CAMELS-GB basins. <bold>(b)</bold> Violin plots of the entropy of the trajectories of the LSTM hidden states in the different hybrid models and the pure LSTM across all CAMELS-GB basins.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f12.png"/>

        </fig>

      <p id="d2e3145">Figure <xref ref-type="fig" rid="F12"/>a shows the NSE CDF curves for the LSTM, SHM (fully static), Hybrid SHM (all parameters dynamic), and this new SHM <inline-formula><mml:math id="M114" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>-dynamic variant. The single-parameter dynamic model shows a slight performance increase compared to the fully static conceptual model, but does not match the performance of the model with all dynamic parameters.</p>
      <p id="d2e3159">Figure <xref ref-type="fig" rid="F12"/>b presents the entropy distributions for these models. The fully static conceptual model has no entropy and therefore is not shown. Notably, the single dynamic-parameter variant exhibits significantly higher entropy than even the Hybrid SHM model with all parameters dynamic. This observation is consistent with our interpretation of entropy as a measure of LSTM activity: when the LSTM must compensate for model misspecification through only one degree of freedom instead of eight, as in Hybrid SHM, its activity (and thus entropy) increases substantially without proportional performance gains. In the SHM <inline-formula><mml:math id="M115" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>-dynamic case, the LSTM attempts to correct the entire conceptual model through a single point of entry. By contrast, in the Hybrid SHM case, the LSTM makes smaller adjustments across multiple model components, resulting in higher performance and lower entropy.</p>
      <p id="d2e3171">This example demonstrates that entropy can serve as a diagnostic for identifying how the neural network component compensates for structural inadequacies in the conceptual model as represented by individual parameters. This suggests the possibility of systematically diagnosing individual components by selectively making parameters dynamic or static, with entropy guiding the process toward a model representation that more realistically reflects the natural system.</p>
</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>What about other approaches to hybrid modeling?</title>
      <p id="d2e3182">Measuring the entropy of the data-driven component of a hybrid model works particularly well in our setup because of the tight coupling between the LSTM and the conceptual hydrological model through the parameters of the latter. Nevertheless, our suggested approach can be effectively applied to other types of hybrid models or physics-informed machine learning.</p>
      <p id="d2e3185">As one example of an alternative approach to physics-informed machine learning, post-processing the results of a hydrological model to improve its predictions has been suggested <xref ref-type="bibr" rid="bib1.bibx65 bib1.bibx35" id="paren.82"/>. In this setup, a traditional hydrological model with static parameters makes an initial run to predict streamflow, and its predictions as well as its internal states are fed to an LSTM to make improved predictions of streamflow. The approach is successful in the sense that it improves streamflow predictions and moves all performance CDF curves close to, but not beyond, that of the LSTM, as shown in Fig. <xref ref-type="fig" rid="F13"/>. Note that both the pure LSTM and the post-processing LSTMs use the same number of hidden nodes (64), making our approach and comparison still applicable.</p>
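The post-processing data flow described above can be sketched as follows: a conceptual model with static parameters is run once, and its simulated streamflow and storage states are stacked with the meteorological forcings to form the input features of the post-processing LSTM. The toy linear-reservoir model and all names here are hypothetical, chosen only to illustrate the feature construction, not the models used in the study.

```python
import numpy as np

def bucket_model(precip, pet, k=0.05, s0=10.0):
    """Toy single-bucket run with static parameters (hypothetical example).
    Returns simulated streamflow and the storage-state trajectory."""
    s, states, q = s0, [], []
    for p, e in zip(precip, pet):
        s = max(s + p - e, 0.0)  # mass balance with ET losses, storage >= 0
        out = k * s              # linear-reservoir outflow
        s -= out
        q.append(out)
        states.append(s)
    return np.array(q), np.array(states)

def postprocessing_inputs(forcings, q_sim, states):
    """Stack forcings with the conceptual model's output and state so they
    can be fed to the post-processing LSTM as additional input features."""
    return np.column_stack([forcings, q_sim, states])
```

The LSTM itself is then trained on these augmented features exactly as the pure LSTM is trained on the forcings alone.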

      <fig id="F13"><label>Figure 13</label><caption><p id="d2e3196">Comparison of model performance between conceptual models with static parameters (dashed lines), post-processing hybrid models with dynamic parameters (solid lines), and the pure LSTM for all CAMELS-GB basins.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f13.png"/>

        </fig>

      <p id="d2e3206">The violin plot of entropies for the LSTM hidden states across all basins shown in Fig. <xref ref-type="fig" rid="F14"/> suggests a different conclusion than for our previous hybrids. The LSTM appears largely “unimpressed” by the additional input, no matter which model produced it; only the Nonsense model (of all things!) seems able to effectively reduce the effort required for the prediction task, meaning that this particular model performs some pre-processing that is actually helpful. Figure <xref ref-type="fig" rid="F15"/> shows a much more mixed bag of results where, for certain basins, any of the conceptual models might produce an output that reduces the entropy of the LSTM. The fact that feeding the model “Nonsense” is helpful in close to 80 % of all basins should again serve as a stark warning that feeding physics-based model output to a data-driven model is not necessarily physically meaningful (if it were, we would expect the LSTM to have a harder time with nonsense outputs). This confirms previous findings in the literature which suggest that post-processing a conceptual model is not a sound route to “physics-informed” machine learning <xref ref-type="bibr" rid="bib1.bibx65 bib1.bibx35" id="paren.83"/>.</p>
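The per-basin bookkeeping behind these counts can be sketched as follows: for each post-processing hybrid, count the basins in which its LSTM hidden-state entropy falls below that of the pure LSTM, i.e. the basins where the extra input "helped". The dictionary layout and the function name `ranking_counts` are illustrative assumptions, not the study's actual tooling.

```python
import numpy as np

def ranking_counts(entropies):
    """entropies: dict mapping model name -> per-basin entropy array,
    all of equal length and including a "LSTM" baseline entry.
    Returns, per hybrid, the number of basins where the hybrid's
    hidden-state entropy is lower than the pure LSTM's."""
    baseline = np.asarray(entropies["LSTM"], dtype=float)
    counts = {}
    for name, h in entropies.items():
        if name == "LSTM":
            continue  # skip the baseline itself
        counts[name] = int(np.sum(np.asarray(h, dtype=float) < baseline))
    return counts
```

Dividing each count by the number of basins gives the fraction of basins in which a given model's output simplified the LSTM's task.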

      <fig id="F14"><label>Figure 14</label><caption><p id="d2e3218">Violin plots of the entropy of the trajectories of the LSTM hidden states in the different post-processing hybrid models and the pure LSTM across all CAMELS-GB basins.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f14.png"/>

        </fig>

      <fig id="F15" specific-use="star"><label>Figure 15</label><caption><p id="d2e3229">Counts of the different entropy-based model ranking outcomes across all CAMELS-GB basins. To limit the length of the labels, the shorter names of the underlying conceptual models were used, but the counts refer to their post-processing hybrid versions.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f15.png"/>

        </fig>

      <p id="d2e3238">Since post-processing conceptual models do not allow us to scrutinize conceptual states or parameters in the way hybrid models do, we cannot perform the detailed analysis shown above (Figs. <xref ref-type="fig" rid="F8"/> and <xref ref-type="fig" rid="F9"/>). Hence, assessing the impact of the physics-based input (it is hard to call it a constraint) requires interpreting the LSTM hidden states. This is a current line of research in its own right <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx17" id="paren.84"><named-content content-type="pre">e.g.,</named-content></xref> and goes beyond the scope of this study. It will be interesting to explore what contribution of the “Nonsense” model simplifies the prediction task for the LSTM, while physically meaningful outputs used as LSTM inputs do not necessarily help.</p>
</sec>
<sec id="Ch1.S4.SS6">
  <label>4.6</label><title>Summary of findings from the case study</title>
      <p id="d2e3258">In this case study, we compared a pure LSTM model with three hybrid hydrological models based on the CAMELS-GB large-sample data set. Overall, we found that the LSTM outperformed the hybrid models in predicting streamflow, and our entropy analysis revealed that adding physics-based constraints generally did not simplify the prediction task.</p>
      <p id="d2e3261">Our analysis also showed that the LSTM effectively adjusts the constraints imposed by the conceptual model. For instance, Hybrid Nonsense is very different from its original Nonsense formulation, with the LSTM identifying an optimized architecture that, combined with the data-driven component, performs just as well as all other models. The degree of effort required for the LSTM to modify these constraints provides insight into how accurately the conceptual model represents the underlying system. This finding highlights a key opportunity for hybrid modeling: refining existing models to better suit specific sites based on training data characteristics. In essence, hybrid models can guide us toward more parsimonious model structures.</p>
      <p id="d2e3264">Notably, this process occurs entirely under the hood. If we had evaluated performance using only NSE, we might have mistakenly concluded that the Nonsense constraint was just as valid as SHM or Bucket, since all three achieved the same performance when paired with the LSTM. These results overwhelmingly suggest that we need to reconsider our ways of building and evaluating hybrid models. Even if “just” the parameters of conceptual hydrological models are modified by the data-driven component, the resulting hybrid models function differently from what we expect when imposing mass-balance equations.</p>
      <p id="d2e3267">While our findings might be specific to the particular dataset and model candidates used in this study, we have provided an objective method to test our hypotheses in arbitrary other scenarios. Future research should explore a wider range of datasets and hybrid model architectures to validate and extend our conclusions.</p>
      <p id="d2e3271">Finally, we have provided an outlook on how to apply our entropy-based analysis to only partially dynamic hybrid models and even to other types of hybrid construction. While there are differences between approaches that should be taken into consideration for an in-depth analysis of architectures, the evaluation of entropy in the LSTM hidden states already provides an objective insight that would have been obscured when only considering skill scores. Complementary analyses that target the specifics of other hybrid model architectures are left for future work.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d2e3284">“Man is always prey to his truths. Once he has admitted them, he cannot free himself from them” <xref ref-type="bibr" rid="bib1.bibx19" id="paren.85"/>. The pursuit of a single, universal model to explain every hydrological system is fundamentally absurd. This paper challenges the hydrological community's tendency to rely on a single model, like SHM, as a comprehensive explanation for the complex dynamics of all river basins. Our work shows that SHM, or other conceptual models of its kind, is precisely the kind of rigid “truth” Camus warned us about: a single model that, now coupled with a component that learns from data, represents a seemingly straightforward explanation of every and any complex natural system.</p>
      <p id="d2e3290">The recognition of these limitations extends to the process of model selection itself. As observed by <xref ref-type="bibr" rid="bib1.bibx57" id="text.86"/>, “scientists work from models acquired through education and subsequent exposure to the literature, often without fully knowing what characteristics have elevated these models to the status of community paradigms”. This implicit acceptance of certain modeling approaches, while pragmatic, further highlights the tension between using an interpretable model and capturing the full complexity of real-world systems. Hybrid models acknowledge this tension by incorporating prior knowledge to achieve partial interpretability while accepting the residual complexity that remains unmodeled. However, our study suggests that this balance is often skewed in favor of the data-driven component. The use of a conceptual model as a structural prior represents an attempt to extract meaningful dynamics from a larger environmental system <xref ref-type="bibr" rid="bib1.bibx81" id="paren.87"/>, but as we have shown, this attempt is often forced. When the problem is relatively simple, such as predicting streamflow in our case study, conceptual prior knowledge is largely ignored in favor of a more flexible, data-driven structure, raising the question of whether it was necessary in the first place if only predictive capacity is considered.</p>
      <p id="d2e3299">Our primary contribution is a metric that quantifies how much prior knowledge contributes compared to a purely data-driven approach. In Sect. <xref ref-type="sec" rid="Ch1.S3"/>, we demonstrated how different conceptual models can be evaluated based on how closely their prescribed equations align with those governing the “true” system. Our key finding was that the hybrid model containing the “true” conceptual model required the least assistance from the LSTM (lowest entropy), while models with architectures very different from the true underlying process required the most assistance from the LSTM (highest entropy). This demonstrates that our entropy metric can distinguish between conceptual models that perform well for the right reasons versus those that achieve good performance just through compensation by a data-driven component, a distinction that traditional performance metrics cannot make.</p>
      <p id="d2e3304">Interestingly, measuring the “complexity” of the LSTM in the sense of its “compensation activity” yields larger entropy when the LSTM is coupled to a parsimonious but inadequate representation than when it is coupled to a more flexible inadequate representation. This means two things: first, low entropy does not automatically mean that the constraint is honored and realistic; second, the LSTM seems to have more work to do fighting against something simple but wrong than fiddling with an arbitrary but sufficiently flexible structure to make it work. We have seen this in the overparameterization cases in the didactic examples and with the Nonsense model in the case study. So, high entropy means the LSTM is struggling against a too rigid, inadequate constraint. Low entropy means that the LSTM is seeing something in the constraint; however, deeper analysis in the form of inspecting parameter and state trajectories is required to distinguish whether the constraint is deemed reasonable for predicting the data (constant parameters), or whether it simply lends itself well to being transformed into something new, parsimonious, and effective, which might even be physics-explainable and guide us towards a better representation of the true model. So, there is hope in hybrids; just in a different way than the community might have anticipated.</p>
      <p id="d2e3308">Additionally, we showed that a data-driven model can serve as a reference point, distinguishing between conceptual models that better approximate reality and those that do not. We propose that data-driven models should serve as the baseline for evaluating hybrid models, which allows us to determine whether incorporating prior knowledge, such as physics-based constraints, genuinely enhances predictive performance or simply adds unnecessary complexity.</p>
      <p id="d2e3311">Applying this metric to a large-scale case study revealed that our attempts to improve predictive capacity through hybridization were often unsuccessful. This was because the added knowledge rarely captured the true system dynamics, forcing the flexible component of our models to compensate for incorrect assumptions (Sect. <xref ref-type="sec" rid="Ch1.S4.SS3.SSS2"/>).</p>
      <p id="d2e3316">Beyond performance, there may be broader value in hybrid models with respect to interpretability and process fidelity. In that context, it is highly important to evaluate the degree to which physical constraints actually constrain model behavior: if the intended model structure that should guarantee interpretability is overwritten, this argument is no longer valid, or we might actually discover a better (in our case: simpler) process representation that again allows for scientific insight and learning. Additionally, correlation between a labeled storage component and external variables should not be the sole standard for evaluating hybrid models because, as we have shown, the original constraint that defined that label may no longer apply in the final model structure.</p>
      <p id="d2e3319">While we focus on entropy as our primary metric, we believe entropy complements, rather than replaces, other existing evaluation approaches focused on process representation and physical understanding. Furthermore, the compensation work of the LSTM should be related to aspects of achieved performance to provide a comprehensive basis for model selection, with the identification of appropriate ways to potentially combine these aspects into a single informative metric being left as an open research question.</p>
      <p id="d2e3322">In the future, hybrid models could become valuable tools for refining our understanding of hydrological systems, but only if we critically reassess traditional modeling practices. The fact that even a “nonsense” conceptual model demonstrated the highest potential for adding useful information in post-processing hybrids raises new questions. We hypothesize that physics constraints in the form of strict sequential processing may be too rigid and that guiding the LSTM toward learning appropriate lag functions or the entire hydrological model itself could be a more effective strategy.</p>
      <p id="d2e3325">Overall, our findings challenge the assumption that physics-informed machine learning necessarily preserves the physics as initially formulated. Instead, the data-driven component may restructure the imposed constraints, uncovering a more effective, potentially physics-explainable alternative. We do not oppose hybrid modeling; rather, we propose a quantitative tool to analyze how much the physics-based constraints are modified and suggest a workflow for diagnosing these structural adaptations. In the end, hybrid modeling, when paired with information-theoretic analyses, has the potential to provide valuable physical insights. Without such an approach, however, many so-called “physics-informed” models may be better described as physics-ignored.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Study setup</title>
      <p id="d2e3339">We provide further details of our experimental setup in this section and point the interested reader towards the data repository of the University of Stuttgart (DaRUS), which contains all the code for this project, including training scripts, logs, and archived netCDF files of the saved internal and hidden states, fluxes, and predictions made by each model. The repository can be found at this link: <ext-link xlink:href="https://doi.org/10.18419/DARUS-4920" ext-link-type="DOI">10.18419/DARUS-4920</ext-link> <xref ref-type="bibr" rid="bib1.bibx6" id="paren.88"/>.</p>
<sec id="App1.Ch1.S1.SS1">
  <label>A1</label><title>Didactic examples</title>
      <p id="d2e3355">The ranges for the parameters of the conceptual models allowed in the didactic examples are listed in Table <xref ref-type="table" rid="TA1"/>.</p><table-wrap id="TA1"><label>Table A1</label><caption><p id="d2e3363">Ranges of the model parameters in the synthetic examples.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">Model</oasis:entry>

         <oasis:entry rowsep="1" colname="col2" morerows="1">Parameter</oasis:entry>

         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center">Limits </oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col3">Lower</oasis:entry>

         <oasis:entry colname="col4">Upper</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2">1</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M116" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M117" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.1</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M118" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">10.0</oasis:entry>

         <oasis:entry colname="col4">60.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2">2</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M119" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi mathvariant="normal">max</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">50.0</oasis:entry>

         <oasis:entry colname="col4">400.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M121" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">100.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2">3</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M122" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.1</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M123" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">30.0</oasis:entry>

         <oasis:entry colname="col4">300.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M124" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.1</oasis:entry>

         <oasis:entry colname="col4">40.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="3">4</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M125" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M126" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.9</oasis:entry>

         <oasis:entry colname="col4">3.5</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.0</oasis:entry>

         <oasis:entry colname="col4">50.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M128" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">60.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">5</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M129" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M130" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.1</oasis:entry>

         <oasis:entry colname="col4">300.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="3">6</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M131" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M132" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.9</oasis:entry>

         <oasis:entry colname="col4">3.5</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M133" display="inline"><mml:mi mathvariant="italic">γ</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M134" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">60.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="4">7</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M135" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.1</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M136" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">200.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">100.0</oasis:entry>

         <oasis:entry colname="col4">400.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M138" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.1</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M139" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">60.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="3">8</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M140" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.5</oasis:entry>

         <oasis:entry colname="col4">2.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M141" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi mathvariant="normal">max</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">250.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M142" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.9</oasis:entry>

         <oasis:entry colname="col4">3.5</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M143" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">100.0</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>


</sec>
<sec id="App1.Ch1.S1.SS2">
  <label>A2</label><title>CAMELS-GB</title>
      <p id="d2e3958">The parameter ranges for the conceptual models used in the CAMELS-GB case are listed in Table <xref ref-type="table" rid="TA2"/>. The parameter ranges shown in both Tables <xref ref-type="table" rid="TA1"/> and <xref ref-type="table" rid="TA2"/> were adapted from typical ranges found in the literature for the HBV model <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx12" id="paren.89"/>. For the synthetic examples, we simplified the parameter ranges to better suit the proposed examples. The static attributes used as input to the LSTM in all models are listed in Table <xref ref-type="table" rid="TA3"/>.</p>

<table-wrap id="TA2"><label>Table A2</label><caption><p id="d2e3975">Ranges of the model parameters for the CAMELS-GB case study.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">Model</oasis:entry>

         <oasis:entry rowsep="1" colname="col2" morerows="1">Parameter</oasis:entry>

         <oasis:entry rowsep="1" namest="col3" nameend="col4" align="center">Limits </oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col3">Lower</oasis:entry>

         <oasis:entry colname="col4">Upper</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="7">SHM</oasis:entry>

         <oasis:entry colname="col2">dd</oasis:entry>

         <oasis:entry colname="col3">0.0</oasis:entry>

         <oasis:entry colname="col4">10.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">f_thr</oasis:entry>

         <oasis:entry colname="col3">10.0</oasis:entry>

         <oasis:entry colname="col4">60.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">20.0</oasis:entry>

         <oasis:entry colname="col4">700.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M145" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">6.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">perc</oasis:entry>

         <oasis:entry colname="col3">0.0</oasis:entry>

         <oasis:entry colname="col4">1.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>f</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.05</oasis:entry>

         <oasis:entry colname="col4">0.9</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M147" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.01</oasis:entry>

         <oasis:entry colname="col4">0.5</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2"><inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.001</oasis:entry>

         <oasis:entry colname="col4">0.2</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">Bucket</oasis:entry>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M149" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.002</oasis:entry>

         <oasis:entry colname="col4">1.0</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">aux_ET</oasis:entry>

         <oasis:entry colname="col3">0.01</oasis:entry>

         <oasis:entry colname="col4">1.5</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="4">Nonsense</oasis:entry>

         <oasis:entry colname="col2">dd</oasis:entry>

         <oasis:entry colname="col3">0.0</oasis:entry>

         <oasis:entry colname="col4">10.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M150" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">max</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">20.0</oasis:entry>

         <oasis:entry colname="col4">700.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M151" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">1.0</oasis:entry>

         <oasis:entry colname="col4">6.0</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.01</oasis:entry>

         <oasis:entry colname="col4">0.5</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2"><inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col3">0.001</oasis:entry>

         <oasis:entry colname="col4">0.2</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TA3"><label>Table A3</label><caption><p id="d2e4304">Catchment attributes from the CAMELS-GB dataset used to train all models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Type</oasis:entry>
         <oasis:entry colname="col2">Attribute</oasis:entry>
         <oasis:entry colname="col3">Description</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Topographic</oasis:entry>
         <oasis:entry colname="col2">area</oasis:entry>
         <oasis:entry colname="col3">catchment area (km<sup>2</sup>)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Topographic</oasis:entry>
         <oasis:entry colname="col2">elev_mean</oasis:entry>
         <oasis:entry colname="col3">mean elevation (m a.s.l.)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Topographic</oasis:entry>
         <oasis:entry colname="col2">dpsbar</oasis:entry>
         <oasis:entry colname="col3">slope of the catchment mean drainage path (m km<sup>−1</sup>)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil</oasis:entry>
         <oasis:entry colname="col2">sand_perc</oasis:entry>
         <oasis:entry colname="col3">percent sand (%)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil</oasis:entry>
         <oasis:entry colname="col2">silt_perc</oasis:entry>
          <oasis:entry colname="col3">percent silt (%)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil</oasis:entry>
         <oasis:entry colname="col2">clay_perc</oasis:entry>
          <oasis:entry colname="col3">percent clay (%)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil</oasis:entry>
         <oasis:entry colname="col2">porosity_hypres</oasis:entry>
          <oasis:entry colname="col3">soil porosity calculated using the hypres pedotransfer function (–)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil</oasis:entry>
         <oasis:entry colname="col2">conductivity_hypres</oasis:entry>
          <oasis:entry colname="col3">hydraulic conductivity calculated using the hypres pedotransfer function (–)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Soil</oasis:entry>
         <oasis:entry colname="col2">soil_depth_pelletier</oasis:entry>
         <oasis:entry colname="col3">depth to bedrock (m)</oasis:entry>
       </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Land cover</oasis:entry>
          <oasis:entry colname="col2">dwood_perc</oasis:entry>
          <oasis:entry colname="col3">percent of catchment that is deciduous woodland (%)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Land cover</oasis:entry>
          <oasis:entry colname="col2">ewood_perc</oasis:entry>
          <oasis:entry colname="col3">percent of catchment that is evergreen woodland (%)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Land cover</oasis:entry>
          <oasis:entry colname="col2">crop_perc</oasis:entry>
          <oasis:entry colname="col3">percent of catchment that is cropland (%)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Land cover</oasis:entry>
          <oasis:entry colname="col2">urban_perc</oasis:entry>
          <oasis:entry colname="col3">percent of catchment that is urban area (%)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Human influence</oasis:entry>
          <oasis:entry colname="col2">reservoir_cap</oasis:entry>
          <oasis:entry colname="col3">catchment reservoir capacity (ML)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Climatic</oasis:entry>
          <oasis:entry colname="col2">p_mean</oasis:entry>
          <oasis:entry colname="col3">mean daily precipitation (mm d<sup>−1</sup>)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Climatic</oasis:entry>
          <oasis:entry colname="col2">pet_mean</oasis:entry>
          <oasis:entry colname="col3">mean daily PET (mm d<sup>−1</sup>)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Climatic</oasis:entry>
          <oasis:entry colname="col2">p_seasonality</oasis:entry>
          <oasis:entry colname="col3">seasonality and timing of precipitation (estimated using sine curves)</oasis:entry>
        </oasis:row>
        <oasis:row>
          <oasis:entry colname="col1">Climatic</oasis:entry>
          <oasis:entry colname="col2">frac_snow</oasis:entry>
          <oasis:entry colname="col3">fraction of precipitation falling as snow (for days colder than 0 °C)</oasis:entry>
        </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Climatic</oasis:entry>
         <oasis:entry colname="col2">high_prec_freq</oasis:entry>
         <oasis:entry colname="col3">frequency of high-precipitation days (≥5× mean daily precipitation)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Climatic</oasis:entry>
         <oasis:entry colname="col2">low_prec_freq</oasis:entry>
         <oasis:entry colname="col3">frequency of dry days (<inline-formula><mml:math id="M158" display="inline"><mml:mi mathvariant="italic">&lt;</mml:mi></mml:math></inline-formula> 1 mm d<sup>−1</sup>)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Climatic</oasis:entry>
         <oasis:entry colname="col2">high_prec_dur</oasis:entry>
          <oasis:entry colname="col3">average duration of high-precipitation events (<inline-formula><mml:math id="M160" display="inline"><mml:mi mathvariant="normal">≥</mml:mi></mml:math></inline-formula> 5<inline-formula><mml:math id="M161" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> mean daily precipitation)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Climatic</oasis:entry>
         <oasis:entry colname="col2">low_prec_dur</oasis:entry>
          <oasis:entry colname="col3">average duration of dry periods (number of consecutive days <inline-formula><mml:math id="M162" display="inline"><mml:mi mathvariant="italic">&lt;</mml:mi></mml:math></inline-formula> 1 mm d<sup>−1</sup>)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</app>

<app id="App1.Ch1.S2">
  <label>Appendix B</label><title>Additional didactic examples</title>
      <p id="d2e4709">In Sect. <xref ref-type="sec" rid="Ch1.S3"/>, and specifically in Fig. <xref ref-type="fig" rid="F2"/>, we showed five key didactic cases that can be used to understand the rankings in Figs. <xref ref-type="fig" rid="F3"/> and <xref ref-type="fig" rid="F4"/>. To further our understanding of the behaviors that can be quantified by measuring the entropy of the hidden states of the LSTM, we show four additional examples in Fig. <xref ref-type="fig" rid="FB1"/>. These additional examples also follow the framework of multiple working hypotheses <xref ref-type="bibr" rid="bib1.bibx22" id="paren.90"/>, and in this section we briefly describe what can be learned from them.</p>
      <p id="d2e4726">Model 5 represents a case in which the added knowledge lacks the degrees of freedom that the “true” model has. The LSTM therefore has to take over and compensate using its internal hidden states, resulting in a high entropy measurement for the LSTM. Although this behavior is also apparent in the variations of the parameters shown in Fig. <xref ref-type="fig" rid="FB1"/>e, measuring entropy there can lead to the wrong conclusion that a conceptual model with more reservoirs, as in Model 3, would more closely resemble the “true” model (as suggested by Fig. <xref ref-type="fig" rid="F3"/>), but this is not the case. The true picture is given by the entropy of the LSTMs in Fig. <xref ref-type="fig" rid="F4"/>.</p>
      <p id="d2e4735">Models 6 to 8 all serve as cases where the conceptual model has more parameters than the “true” model, making them overparametrized; however, cases such as Models 6 and 8 still resemble the “true” model very closely. Model 7 also has the “true” model embedded within it, but the input relationship is distorted by the extra reservoir in <inline-formula><mml:math id="M164" display="inline"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e4750">Returning to the idea of how closely a model matches the “true” model, the ranking in Fig. <xref ref-type="fig" rid="F4"/> makes intuitive sense. The “true” model is furthest to the left, followed by Models 4, 6 and 8, which use the same number of reservoirs and whose input–output relationships can match that of the “true” model (all have an exponential term). This grouping is followed by Model 7, which can match the output relationship using the second exponential reservoir, but whose input is not precipitation and evapotranspiration directly but rather a dampened product coming from the first reservoir. Next comes the divide between encoding “good” and “bad” physics. All of the previous models contain the “true” model (in some sense) within their structure. This additional knowledge makes the task required of the LSTM easier, thus reducing its entropy. If, instead, we encode “bad” physics, we fall into cases where the model can still fit the observed data perfectly, not because of the additional knowledge but because of a more complex LSTM which, in addition to prediction, now has to overwrite our incorrect prior knowledge.</p>
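<p>The entropy measurements discussed above can be sketched with a minimal, self-contained example. The function below implements a standard Kozachenko–Leonenko k-nearest-neighbor entropy estimator applied to continuous samples such as flattened LSTM hidden-state values. It is an illustrative stand-in, not the UNITE toolbox implementation used in the study, and the function and parameter names are our own.</p>

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(samples, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate in nats.

    `samples` is an array of shape (n,) or (n, d), e.g. hidden-state
    values of an LSTM collected over time steps and basins.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    tree = cKDTree(x)
    # Distance to the k-th nearest neighbour, excluding the point itself
    # (the query's nearest hit at distance 0 is the point itself).
    eps = tree.query(x, k=k + 1)[0][:, k]
    # Log-volume of the d-dimensional Euclidean unit ball.
    log_c_d = (d / 2.0) * np.log(np.pi) - gammaln(d / 2.0 + 1.0)
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))
```

In this reading, a hybrid model whose conceptual part already encodes the “true” structure leaves less information for the LSTM to carry, so its hidden-state samples concentrate and the estimated entropy drops; a model with “bad” physics forces the hidden states to spread out, raising the estimate.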
      <p id="d2e4757">Model 3 is the worst offender: the use of a two-reservoir system in which neither reservoir has an exponential term in its output relationship is the case most dissimilar to the “true” model. Models 2 and 5 improve upon this condition but still lack any semblance of the “true” model in their structure, making them also fall into the “bad” physics category.</p><fig id="FB1"><label>Figure B1</label><caption><p id="d2e4762">Additional examples on evaluating hybrid hydrological models by measuring the entropy of different model components.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f16.png"/>

      </fig>

</app>

<app id="App1.Ch1.S3">
  <label>Appendix C</label><title>Standardizing training pipelines</title>
      <p id="d2e4781">The results of the case study presented in Sect. <xref ref-type="sec" rid="Ch1.S4"/> follow those of the previous study of <xref ref-type="bibr" rid="bib1.bibx3" id="text.91"/>. However, the data and metrics related to model variables and performance reported in the two studies are not the same. Between studies we modified our training pipelines to adopt current standard practices <xref ref-type="bibr" rid="bib1.bibx72 bib1.bibx56" id="paren.92"/>; therefore, there are differences between the metrics.</p>
      <p id="d2e4792">For the models with static parameters, these differences are shown in Fig. <xref ref-type="fig" rid="FC1"/> and Table <xref ref-type="table" rid="TC1"/>. In our previous study, the conceptual models were calibrated individually for each basin using the DREAM algorithm <xref ref-type="bibr" rid="bib1.bibx77" id="paren.93"/>. This procedure results in better performance at predicting streamflow than the regional training we used in the current study. The drop in performance could be due to the identification of regional sets of parameters, as in <xref ref-type="bibr" rid="bib1.bibx76" id="text.94"/>, but we did not pursue this finding further.</p>
      <p id="d2e4806">For the models with dynamic parameters, the differences are shown in Fig. <xref ref-type="fig" rid="FC2"/> and Table <xref ref-type="table" rid="TC2"/>.</p>
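<p>As a reading aid for the metrics in Tables C1 and C2, the sketch below shows one plausible way to compute them. The per-basin NSE follows the standard Nash–Sutcliffe definition; the AUC is computed here as the area under the empirical CDF of per-basin NSE values over [0, 1] (so lower is better). The AUC convention and the function names are our assumptions for illustration and may differ in detail from the exact computation used in the study.</p>

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of the observations."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_cdf_auc(nse_values, n_grid=1001):
    """Area under the empirical CDF of per-basin NSE over [0, 1].

    A model whose basins mostly reach a high NSE keeps the CDF low on
    [0, 1], so a lower AUC indicates better overall performance.
    """
    vals = np.asarray(nse_values, dtype=float)
    grid = np.linspace(0.0, 1.0, n_grid)
    cdf = np.array([(vals <= t).mean() for t in grid])
    # Trapezoidal integration of the CDF over the grid.
    return float(np.sum((cdf[:-1] + cdf[1:]) * np.diff(grid)) / 2.0)
```

With this convention, a set of basins all at NSE = 1 gives an AUC near 0, while basins at NSE ≤ 0 give an AUC of 1, consistent with the pattern in Table C2, where the best-performing model (LSTM, median NSE 0.870) also has the smallest AUC.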
      <p id="d2e4815">As the differences in metrics between studies are small in all cases, we accept them while acknowledging that the models analyzed in this study differ from those in <xref ref-type="bibr" rid="bib1.bibx3" id="text.95"/>. Moreover, the main objective of the present study is not to set a “state-of-the-art” benchmark for a particular dataset, and, accordingly, the overall message that we wish to communicate is not affected by the differences in performance between studies.</p><fig id="FC1"><label>Figure C1</label><caption><p id="d2e4823">Comparison of model performance for models with static parameters between studies.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f17.png"/>

      </fig>

      <fig id="FC2"><label>Figure C2</label><caption><p id="d2e4834">Comparison of model performance for models with dynamic parameters between studies.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/629/2026/hess-30-629-2026-f18.png"/>

      </fig>

<table-wrap id="TC1"><label>Table C1</label><caption><p id="d2e4846">Comparison of model performance for models with static parameters quantified by area under the NSE curve (AUC) and median NSE.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">Model</oasis:entry>

         <oasis:entry colname="col2">Version</oasis:entry>

         <oasis:entry colname="col3">AUC</oasis:entry>

         <oasis:entry colname="col4">Median NSE</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">SHM</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.243</oasis:entry>

         <oasis:entry colname="col4">0.760</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.267</oasis:entry>

         <oasis:entry colname="col4">0.747</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">Bucket</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.381</oasis:entry>

         <oasis:entry colname="col4">0.590</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.395</oasis:entry>

         <oasis:entry colname="col4">0.582</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="1">Nonsense</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.441</oasis:entry>

         <oasis:entry colname="col4">0.510</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.477</oasis:entry>

         <oasis:entry colname="col4">0.511</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

<table-wrap id="TC2"><label>Table C2</label><caption><p id="d2e4970">Comparison of model performance quantified by area under the NSE curve (AUC) and median NSE.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">Model</oasis:entry>

         <oasis:entry colname="col2">Version</oasis:entry>

         <oasis:entry colname="col3">AUC</oasis:entry>

         <oasis:entry colname="col4">Median NSE</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">LSTM</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.120</oasis:entry>

         <oasis:entry colname="col4">0.870</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.123</oasis:entry>

         <oasis:entry colname="col4">0.865</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">Hybrid SHM</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.216</oasis:entry>

         <oasis:entry colname="col4">0.844</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.216</oasis:entry>

         <oasis:entry colname="col4">0.839</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="1">Hybrid Bucket</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.168</oasis:entry>

         <oasis:entry colname="col4">0.857</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.147</oasis:entry>

         <oasis:entry colname="col4">0.852</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="1">Hybrid Nonsense</oasis:entry>

         <oasis:entry colname="col2">TBONTB</oasis:entry>

         <oasis:entry colname="col3">0.310</oasis:entry>

         <oasis:entry colname="col4">0.797</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">New</oasis:entry>

         <oasis:entry colname="col3">0.265</oasis:entry>

         <oasis:entry colname="col4">0.801</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>


</app>
  </app-group><notes notes-type="codeavailability"><title>Code availability</title>

      <p id="d2e5123">An archived version of the codebase used for this study is provided in the repository indicated in the data availability section of this paper. We also used the Hy2DL library (<ext-link xlink:href="https://doi.org/10.5281/zenodo.17251944" ext-link-type="DOI">10.5281/zenodo.17251944</ext-link>, <xref ref-type="bibr" rid="bib1.bibx2" id="altparen.96"/>) and the UNITE toolbox, which is available at <uri>https://github.com/manuel-alvarez-chaves/unite_toolbox</uri> (last access: 25 March 2025; <ext-link xlink:href="https://doi.org/10.18419/DARUS-4188" ext-link-type="DOI">10.18419/DARUS-4188</ext-link>, <xref ref-type="bibr" rid="bib1.bibx7" id="altparen.97"/>).</p>
  </notes><notes notes-type="dataavailability"><title>Data availability</title>

      <p id="d2e5144">CAMELS-GB is available at <ext-link xlink:href="https://doi.org/10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9" ext-link-type="DOI">10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9</ext-link> <xref ref-type="bibr" rid="bib1.bibx25" id="paren.98"/>. All of the code for this project, model state dictionaries, model configurations, training logs and netCDF files of the results of this study have been archived at the data repository of the University of Stuttgart (DaRUS) and can be found at this link: <ext-link xlink:href="https://doi.org/10.18419/DARUS-4920" ext-link-type="DOI">10.18419/DARUS-4920</ext-link> <xref ref-type="bibr" rid="bib1.bibx6" id="paren.99"/>.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e5162">The original idea of the paper was developed by all authors. The codes were written by EAE and MAC. The simulations were conducted by MAC. Results were discussed by all authors. The draft was prepared by MAC and AG. Reviewing and editing was provided by all authors. Funding was acquired by AG and UE. All authors have read and agreed to the current version of the paper.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e5168">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e5175">Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e5181">We acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy–EXC 2075–390740016 and project 507884992. We would also like to thank Hoshin V. Gupta for his encouragement and thoughtful discussion of the results in this paper and their relationship to model complexity. Finally, we thank Georgios Blougouras, Shijie Jiang, and one anonymous reviewer for their constructive comments, which provided insightful viewpoints on our work and helped us improve the perspective of the article.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e5186">This research has been supported by the Deutsche Forschungsgemeinschaft (grant nos. EXC 2075–390740016 and 507884992). This open-access publication was funded by the University of Stuttgart.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e5198">This paper was edited by Fabrizio Fenicia and reviewed by Georgios Blougouras, Shijie Jiang, and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Aboelyazeed et al.(2023)Aboelyazeed, Xu, Hoffman, Liu, Jones, Rackauckas, Lawson, and Shen</label><mixed-citation>Aboelyazeed, D., Xu, C., Hoffman, F. M., Liu, J., Jones, A. W., Rackauckas, C., Lawson, K., and Shen, C.: A differentiable, physics-informed ecosystem modeling and learning framework for large-scale inverse problems: demonstration with photosynthesis simulations, Biogeosciences, 20, 2671–2692, <ext-link xlink:href="https://doi.org/10.5194/bg-20-2671-2023" ext-link-type="DOI">10.5194/bg-20-2671-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Acuna et al.(2025)</label><mixed-citation>Acuna, E., Álvarez Chaves, M., Dolich, A., and Manoj J, A.: Hy2DL: Hybrid Hydrological modeling using Deep Learning methods, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.17251944" ext-link-type="DOI">10.5281/zenodo.17251944</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Acuña Espinoza et al.(2024)Acuña Espinoza, Loritz, Álvarez Chaves, Bäuerle, and Ehret</label><mixed-citation>Acuña Espinoza, E., Loritz, R., Álvarez Chaves, M., Bäuerle, N., and Ehret, U.: To bucket or not to bucket? Analyzing the performance and interpretability of hybrid hydrological models with dynamic parameterization, Hydrol. Earth Syst. Sci., 28, 2705–2719, <ext-link xlink:href="https://doi.org/10.5194/hess-28-2705-2024" ext-link-type="DOI">10.5194/hess-28-2705-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Addor and Melsen(2019)</label><mixed-citation>Addor, N. and Melsen, L. A.: Legacy, Rather Than Adequacy, Drives the Selection of Hydrological Models, Water Resources Research, 55, 378–390, <ext-link xlink:href="https://doi.org/10.1029/2018WR022958" ext-link-type="DOI">10.1029/2018WR022958</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Addor et al.(2017)Addor, Newman, Mizukami, and Clark</label><mixed-citation>Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, <ext-link xlink:href="https://doi.org/10.5194/hess-21-5293-2017" ext-link-type="DOI">10.5194/hess-21-5293-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Álvarez Chaves(2025)</label><mixed-citation>Álvarez Chaves, M.: Replication Data for: An entropy-based evaluation of conceptual constraints in hybrid hydrological models, V1, Data Repository of the University of Stuttgart (DaRUS) [data set] and [code], <ext-link xlink:href="https://doi.org/10.18419/DARUS-4920" ext-link-type="DOI">10.18419/DARUS-4920</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Álvarez Chaves et al.(2024)</label><mixed-citation>Álvarez Chaves, M., Ehret, U., and Guthke, A.: UNITE Toolbox, V1, Data Repository of the University of Stuttgart (DaRUS) [code], <ext-link xlink:href="https://doi.org/10.18419/DARUS-4188" ext-link-type="DOI">10.18419/DARUS-4188</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Ansel et al.(2024)</label><mixed-citation>Ansel, J., Yang, E., He, H., Gimelshein, N., Jain, A., Voznesensky, M., Bao, B., Bell, P., Berard, D., Burovski, E., Chauhan, G., Chourdia, A., Constable, W., Desmaison, A., DeVito, Z., Ellison, E., Feng, W., Gong, J., Gschwind, M., Hirsh, B., Huang, S., Kalambarkar, K., Kirsch, L., Lazos, M., Lezcano, M., Liang, Y., Liang, J., Lu, Y., Luk, C. K., Maher, B., Pan, Y., Puhrsch, C., Reso, M., Saroufim, M., Siraichi, M. Y., Suk, H., Zhang, S., Suo, M., Tillet, P., Zhao, X., Wang, E., Zhou, K., Zou, R., Wang, X., Mathews, A., Wen, W., Chanan, G., Wu, P., and Chintala, S.: PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation, in: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS '24), ACM, <ext-link xlink:href="https://doi.org/10.1145/3620665.3640366" ext-link-type="DOI">10.1145/3620665.3640366</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Azmi et al.(2021)Azmi, Ehret, Weijs, Ruddell, and Perdigão</label><mixed-citation>Azmi, E., Ehret, U., Weijs, S. V., Ruddell, B. L., and Perdigão, R. A. P.: Technical note: “Bit by bit”: a practical and general approach for evaluating model computational complexity vs. model performance, Hydrol. Earth Syst. Sci., 25, 1103–1115, <ext-link xlink:href="https://doi.org/10.5194/hess-25-1103-2021" ext-link-type="DOI">10.5194/hess-25-1103-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Bandai et al.(2024)Bandai, Ghezzehei, Jiang, Kidger, Chen, and Steefel</label><mixed-citation>Bandai, T., Ghezzehei, T. A., Jiang, P., Kidger, P., Chen, X., and Steefel, C. I.: Learning Constitutive Relations From Soil Moisture Data via Physically Constrained Neural Networks, Water Resources Research, 60, e2024WR037318, <ext-link xlink:href="https://doi.org/10.1029/2024WR037318" ext-link-type="DOI">10.1029/2024WR037318</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Beck et al.(2016)Beck, van Dijk, de Roo, Miralles, McVicar, Schellekens, and Bruijnzeel</label><mixed-citation>Beck, H. E., van Dijk, A. I. J. M., de Roo, A., Miralles, D. G., McVicar, T. R., Schellekens, J., and Bruijnzeel, L. A.: Global-scale regionalization of hydrologic model parameters, Water Resources Research, 52, 3599–3622, <ext-link xlink:href="https://doi.org/10.1002/2015WR018247" ext-link-type="DOI">10.1002/2015WR018247</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Beck et al.(2020)Beck, Pan, Lin, Seibert, van Dijk, and Wood</label><mixed-citation>Beck, H. E., Pan, M., Lin, P., Seibert, J., van Dijk, A. I. J. M., and Wood, E. F.: Global Fully Distributed Parameter Regionalization Based on Observed Streamflow From 4,229 Headwater Catchments, Journal of Geophysical Research: Atmospheres, 125, e2019JD031485, <ext-link xlink:href="https://doi.org/10.1029/2019JD031485" ext-link-type="DOI">10.1029/2019JD031485</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Beirlant et al.(1997)Beirlant, Dudewicz, Györfi, and Van der Meulen</label><mixed-citation>Beirlant, J., Dudewicz, E. J., Györfi, L., and Van der Meulen, E. C.: Nonparametric Entropy Estimation: An Overview, International Journal of Mathematical and Statistical Sciences, 6, 17–39, <uri>http://jimbeck.caltech.edu/summerlectures/references/Entropy estimation.pdf</uri> (last access: 20 January 2026), 1997.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Bergström and Forsman(1973)</label><mixed-citation>Bergström, S. and Forsman, A.: Development of a conceptual deterministic rainfall-runoff model, Hydrology Research, 4, 147–170, <ext-link xlink:href="https://doi.org/10.2166/nh.1973.0012" ext-link-type="DOI">10.2166/nh.1973.0012</ext-link>, 1973.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Beven(2020)</label><mixed-citation>Beven, K.: Deep learning, hydrological processes and the uniqueness of place, Hydrological Processes, 34, 3608–3613, <ext-link xlink:href="https://doi.org/10.1002/hyp.13805" ext-link-type="DOI">10.1002/hyp.13805</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Bindas et al.(2024)Bindas, Tsai, Liu, Rahmani, Feng, Bian, Lawson, and Shen</label><mixed-citation>Bindas, T., Tsai, W.-P., Liu, J., Rahmani, F., Feng, D., Bian, Y., Lawson, K., and Shen, C.: Improving River Routing Using a Differentiable Muskingum-Cunge Model and Physics-Informed Machine Learning, Water Resources Research, 60, e2023WR035337, <ext-link xlink:href="https://doi.org/10.1029/2023WR035337" ext-link-type="DOI">10.1029/2023WR035337</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Blougouras et al.(2024)Blougouras, Reichstein, Migliavacca, Brenning, and Jiang</label><mixed-citation>Blougouras, G., Reichstein, M., Migliavacca, M., Brenning, A., and Jiang, S.: Interpretable Machine Learning to Uncover Key Compound Drivers of Hydrological Droughts, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9639, <ext-link xlink:href="https://doi.org/10.5194/egusphere-egu24-9639" ext-link-type="DOI">10.5194/egusphere-egu24-9639</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Bradbury et al.(2018)Bradbury, Frostig, Hawkins, Johnson, Leary, Maclaurin, Necula, Paszke, VanderPlas, Wanderman-Milne, and Zhang</label><mixed-citation>Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin, D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang, Q.: JAX: composable transformations of Python+NumPy programs, GitHub [code], <uri>http://github.com/jax-ml/jax</uri> (last access: 20 January 2026), 2018.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Camus(1991)</label><mixed-citation> Camus, A.: The myth of Sisyphus and other essays, Vintage Books, New York, 1st vintage international edn., ISBN 978-0-679-73373-7, 1991.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Chlumsky et al.(2023)Chlumsky, Mai, Craig, and Tolson</label><mixed-citation>Chlumsky, R., Mai, J., Craig, J. R., and Tolson, B. A.: Advancement of a blended hydrologic model for robust model performance, Hydrol. Earth Syst. Sci. Discuss. [preprint], <ext-link xlink:href="https://doi.org/10.5194/hess-2023-69" ext-link-type="DOI">10.5194/hess-2023-69</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Clark et al.(2008)Clark, Slater, Rupp, Woods, Vrugt, Gupta, Wagener, and Hay</label><mixed-citation>Clark, M. P., Slater, A. G., Rupp, D. E., Woods, R. A., Vrugt, J. A., Gupta, H. V., Wagener, T., and Hay, L. E.: Framework for Understanding Structural Errors (FUSE): A modular framework to diagnose differences between hydrological models, Water Resources Research, 44, <ext-link xlink:href="https://doi.org/10.1029/2007WR006735" ext-link-type="DOI">10.1029/2007WR006735</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Clark et al.(2011)Clark, Kavetski, and Fenicia</label><mixed-citation>Clark, M. P., Kavetski, D., and Fenicia, F.: Pursuing the method of multiple working hypotheses for hydrological modeling, Water Resources Research, 47, <ext-link xlink:href="https://doi.org/10.1029/2010WR009827" ext-link-type="DOI">10.1029/2010WR009827</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Cover and Thomas(2006)</label><mixed-citation>Cover, T. M. and Thomas, J. A.: Elements of Information Theory, Wiley, Hoboken, N.J, 2nd edn., ISBN 978-0-471-24195-9, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Coxon et al.(2020a)Coxon, Addor, Bloomfield, Freer, Fry, Hannaford, Howden, Lane, Lewis, Robinson, Wagener, and Woods</label><mixed-citation>Coxon, G., Addor, N., Bloomfield, J. P., Freer, J., Fry, M., Hannaford, J., Howden, N. J. K., Lane, R., Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain, Earth Syst. Sci. Data, 12, 2459–2483, <ext-link xlink:href="https://doi.org/10.5194/essd-12-2459-2020" ext-link-type="DOI">10.5194/essd-12-2459-2020</ext-link>, 2020a.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Coxon et al.(2020b)Coxon, Addor, Bloomfield, Freer, Fry, Hannaford, Howden, Lane, Lewis, Robinson, Wagener, and Woods</label><mixed-citation>Coxon, G., Addor, N., Bloomfield, J. P., Freer, J., Fry, M., Hannaford, J., Howden, N. J. K., Lane, R., Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: Catchment attributes and hydro-meteorological timeseries for 671 catchments across Great Britain (CAMELS-GB), NERC Environmental Information Data Centre [data set], <ext-link xlink:href="https://doi.org/10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9" ext-link-type="DOI">10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9</ext-link>, 2020b.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Craig et al.(2020)Craig, Brown, Chlumsky, Jenkinson, Jost, Lee, Mai, Serrer, Sgro, Shafii, Snowdon, and Tolson</label><mixed-citation>Craig, J. R., Brown, G., Chlumsky, R., Jenkinson, R. W., Jost, G., Lee, K., Mai, J., Serrer, M., Sgro, N., Shafii, M., Snowdon, A. P., and Tolson, B. A.: Flexible watershed simulation with the Raven hydrological modelling framework, Environmental Modelling &amp; Software, 129, 104728, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2020.104728" ext-link-type="DOI">10.1016/j.envsoft.2020.104728</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Dal Molin et al.(2021)Dal Molin, Kavetski, and Fenicia</label><mixed-citation>Dal Molin, M., Kavetski, D., and Fenicia, F.: SuperflexPy 1.3.0: an open-source Python framework for building, testing, and improving conceptual hydrological models, Geosci. Model Dev., 14, 7047–7072, <ext-link xlink:href="https://doi.org/10.5194/gmd-14-7047-2021" ext-link-type="DOI">10.5194/gmd-14-7047-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>De la Fuente et al.(2024)De la Fuente, Ehsani, Gupta, and Condon</label><mixed-citation>De la Fuente, L. A., Ehsani, M. R., Gupta, H. V., and Condon, L. E.: Toward interpretable LSTM-based modeling of hydrological systems, Hydrol. Earth Syst. Sci., 28, 945–971, <ext-link xlink:href="https://doi.org/10.5194/hess-28-945-2024" ext-link-type="DOI">10.5194/hess-28-945-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Ehret et al.(2020)Ehret, van Pruijssen, Bortoli, Loritz, Azmi, and Zehe</label><mixed-citation>Ehret, U., van Pruijssen, R., Bortoli, M., Loritz, R., Azmi, E., and Zehe, E.: Adaptive clustering: reducing the computational costs of distributed (hydrological) modelling by exploiting time-variable similarity among model elements, Hydrol. Earth Syst. Sci., 24, 4389–4411, <ext-link xlink:href="https://doi.org/10.5194/hess-24-4389-2020" ext-link-type="DOI">10.5194/hess-24-4389-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Feng et al.(2022)Feng, Liu, Lawson, and Shen</label><mixed-citation>Feng, D., Liu, J., Lawson, K., and Shen, C.: Differentiable, Learnable, Regionalized Process‐Based Models With Multiphysical Outputs can Approach State‐Of‐The‐Art Hydrologic Prediction Accuracy, Water Resources Research, 58, e2022WR032404, <ext-link xlink:href="https://doi.org/10.1029/2022WR032404" ext-link-type="DOI">10.1029/2022WR032404</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Feng et al.(2023)Feng, Beck, Lawson, and Shen</label><mixed-citation>Feng, D., Beck, H., Lawson, K., and Shen, C.: The suitability of differentiable, physics-informed machine learning hydrologic models for ungauged regions and climate change impact assessment, Hydrol. Earth Syst. Sci., 27, 2357–2373, <ext-link xlink:href="https://doi.org/10.5194/hess-27-2357-2023" ext-link-type="DOI">10.5194/hess-27-2357-2023</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Feng et al.(2024a)Feng, Beck, de Bruijn, Sahu, Satoh, Wada, Liu, Pan, Lawson, and Shen</label><mixed-citation>Feng, D., Beck, H., de Bruijn, J., Sahu, R. K., Satoh, Y., Wada, Y., Liu, J., Pan, M., Lawson, K., and Shen, C.: Deep dive into hydrologic simulations at global scale: harnessing the power of deep learning and physics-informed differentiable models (<inline-formula><mml:math id="M165" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula>HBV-globe1.0-hydroDL), Geosci. Model Dev., 17, 7181–7198, <ext-link xlink:href="https://doi.org/10.5194/gmd-17-7181-2024" ext-link-type="DOI">10.5194/gmd-17-7181-2024</ext-link>, 2024a.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Feng et al.(2024b)Feng, Li, Xu, Wang, Zhang, Wu, Lai, Zeng, Tong, and Jiang</label><mixed-citation>Feng, J., Li, J., Xu, C., Wang, Z., Zhang, Z., Wu, X., Lai, C., Zeng, Z., Tong, H., and Jiang, S.: Viewing Soil Moisture Flash Drought Onset Mechanism and Their Changes Through XAI Lens: A Case Study in Eastern China, Water Resources Research, 60, e2023WR036297, <ext-link xlink:href="https://doi.org/10.1029/2023WR036297" ext-link-type="DOI">10.1029/2023WR036297</ext-link>, 2024b.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Fenicia et al.(2011)Fenicia, Kavetski, and Savenije</label><mixed-citation>Fenicia, F., Kavetski, D., and Savenije, H. H. G.: Elements of a flexible approach for conceptual hydrological modeling: 1. Motivation and theoretical development, Water Resources Research, 47, <ext-link xlink:href="https://doi.org/10.1029/2010WR010174" ext-link-type="DOI">10.1029/2010WR010174</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Frame et al.(2021)Frame, Kratzert, Raney II, Rahman, Salas, and Nearing</label><mixed-citation>Frame, J. M., Kratzert, F., Raney II, A., Rahman, M., Salas, F. R., and Nearing, G. S.: Post-Processing the National Water Model with Long Short-Term Memory Networks for Streamflow Predictions and Model Diagnostics, JAWRA Journal of the American Water Resources Association, 57, 885–905, <ext-link xlink:href="https://doi.org/10.1111/1752-1688.12964" ext-link-type="DOI">10.1111/1752-1688.12964</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Frame et al.(2023)Frame, Kratzert, Gupta, Ullrich, and Nearing</label><mixed-citation>Frame, J. M., Kratzert, F., Gupta, H. V., Ullrich, P., and Nearing, G. S.: On strictly enforced mass conservation constraints for modelling the Rainfall-Runoff process, Hydrological Processes, 37, e14847, <ext-link xlink:href="https://doi.org/10.1002/hyp.14847" ext-link-type="DOI">10.1002/hyp.14847</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Gajamannage et al.(2023)Gajamannage, Jayathilake, Park, and Bollt</label><mixed-citation>Gajamannage, K., Jayathilake, D. I., Park, Y., and Bollt, E. M.: Recurrent neural networks for dynamical systems: Applications to ordinary differential equations, collective motion, and hydrological modeling, Chaos: An Interdisciplinary Journal of Nonlinear Science, 33, 013109, <ext-link xlink:href="https://doi.org/10.1063/5.0088748" ext-link-type="DOI">10.1063/5.0088748</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Galatolo et al.(2010)Galatolo, Hoyrup, and Rojas</label><mixed-citation>Galatolo, S., Hoyrup, M., and Rojas, C.: Effective symbolic dynamics, random points, statistical behavior, complexity and entropy, Information and Computation, 208, 23–41, <ext-link xlink:href="https://doi.org/10.1016/j.ic.2009.05.001" ext-link-type="DOI">10.1016/j.ic.2009.05.001</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Gauch et al.(2021)Gauch, Mai, and Lin</label><mixed-citation>Gauch, M., Mai, J., and Lin, J.: The proper care and feeding of CAMELS: How limited training data affects streamflow prediction, Environmental Modelling &amp; Software, 135, 104926, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2020.104926" ext-link-type="DOI">10.1016/j.envsoft.2020.104926</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Gong et al.(2013)Gong, Gupta, Yang, Sricharan, and Hero III</label><mixed-citation>Gong, W., Gupta, H. V., Yang, D., Sricharan, K., and Hero III, A. O.: Estimating epistemic and aleatory uncertainties during hydrologic modeling: An information theoretic approach, Water Resources Research, 49, 2253–2273, <ext-link xlink:href="https://doi.org/10.1002/wrcr.20161" ext-link-type="DOI">10.1002/wrcr.20161</ext-link>, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Goodfellow et al.(2016)Goodfellow, Bengio, and Courville</label><mixed-citation>Goodfellow, I., Bengio, Y., and Courville, A.: Deep Learning, Adaptive computation and machine learning, MIT Press, Cambridge, Massachusetts, ISBN 978-0-262-03561-3,  <uri>http://www.deeplearningbook.org</uri> (last access: 20 January 2026), 2016.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Gupta et al.(2008)Gupta, Wagener, and Liu</label><mixed-citation>Gupta, H. V., Wagener, T., and Liu, Y.: Reconciling Theory with Observations: Elements of a Diagnostic Approach to Model Evaluation, Hydrological Processes, 22, 3802–3813, <ext-link xlink:href="https://doi.org/10.1002/hyp.6989" ext-link-type="DOI">10.1002/hyp.6989</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Harper(2015)</label><mixed-citation>Harper, M.: 10 years on from the Cumbrian and Carlisle Floods of 2005, <uri>https://environmentagency.blog.gov.uk/2015/01/08/10-years-on-from-the-cumbrian-and-carlisle-floods-of-2005</uri> (last access: 20 January 2026), 2015.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Hochreiter and Schmidhuber(1997)</label><mixed-citation>Hochreiter, S. and Schmidhuber, J.: Long Short-Term Memory, Neural Computation, 9, 1735–1780, <ext-link xlink:href="https://doi.org/10.1162/neco.1997.9.8.1735" ext-link-type="DOI">10.1162/neco.1997.9.8.1735</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Horton et al.(2022)Horton, Schaefli, and Kauzlaric</label><mixed-citation>Horton, P., Schaefli, B., and Kauzlaric, M.: Why do we have so many different hydrological models? A review based on the case of Switzerland, WIREs Water, 9, e1574, <ext-link xlink:href="https://doi.org/10.1002/wat2.1574" ext-link-type="DOI">10.1002/wat2.1574</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Hrachowitz and Clark(2017)</label><mixed-citation>Hrachowitz, M. and Clark, M. P.: HESS Opinions: The complementary merits of competing modelling philosophies in hydrology, Hydrol. Earth Syst. Sci., 21, 3953–3973, <ext-link xlink:href="https://doi.org/10.5194/hess-21-3953-2017" ext-link-type="DOI">10.5194/hess-21-3953-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Höge et al.(2022)Höge, Scheidegger, Baity-Jesi, Albert, and Fenicia</label><mixed-citation>Höge, M., Scheidegger, A., Baity-Jesi, M., Albert, C., and Fenicia, F.: Improving hydrologic models for predictions and process understanding using neural ODEs, Hydrol. Earth Syst. Sci., 26, 5085–5102, <ext-link xlink:href="https://doi.org/10.5194/hess-26-5085-2022" ext-link-type="DOI">10.5194/hess-26-5085-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Jiang et al.(2020)Jiang, Zheng, and Solomatine</label><mixed-citation>Jiang, S., Zheng, Y., and Solomatine, D.: Improving AI System Awareness of Geoscience Knowledge: Symbiotic Integration of Physical Approaches and Deep Learning, Geophysical Research Letters, 47, e2020GL088229, <ext-link xlink:href="https://doi.org/10.1029/2020GL088229" ext-link-type="DOI">10.1029/2020GL088229</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Kingma and Ba(2017)</label><mixed-citation>Kingma, D. P. and Ba, J.: Adam: A Method for Stochastic Optimization, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1412.6980" ext-link-type="DOI">10.48550/arXiv.1412.6980</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Kirchner(2009)</label><mixed-citation>Kirchner, J. W.: Catchments as simple dynamical systems: Catchment characterization, rainfall-runoff modeling, and doing hydrology backward, Water Resources Research, 45, <ext-link xlink:href="https://doi.org/10.1029/2008WR006912" ext-link-type="DOI">10.1029/2008WR006912</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Kozachenko and Leonenko(1987)</label><mixed-citation>Kozachenko, L. F. and Leonenko, N. N.: A statistical estimate for the entropy of a random vector, Probl. Inf. Transm., 23, 95–101, <uri>https://zbmath.org/?q=an:0633.62005</uri> (last access: 20 January 2026), 1987.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and Herrnegger</label><mixed-citation>Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <ext-link xlink:href="https://doi.org/10.5194/hess-22-6005-2018" ext-link-type="DOI">10.5194/hess-22-6005-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Kratzert et al.(2019a)Kratzert, Herrnegger, Klotz, Hochreiter, and Klambauer</label><mixed-citation>Kratzert, F., Herrnegger, M., Klotz, D., Hochreiter, S., and Klambauer, G.: NeuralHydrology – Interpreting LSTMs in Hydrology, in: Explainable AI: Interpreting, Explaining and Visualizing Deep Learning, edited by: Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K., and Müller, K.-R.,  Springer International Publishing, Cham, 347–362, ISBN 978-3-030-28954-6, <ext-link xlink:href="https://doi.org/10.1007/978-3-030-28954-6_19" ext-link-type="DOI">10.1007/978-3-030-28954-6_19</ext-link>, 2019a.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Kratzert et al.(2019b)Kratzert, Klotz, Herrnegger, Sampson, Hochreiter, and Nearing</label><mixed-citation>Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A. K., Hochreiter, S., and Nearing, G. S.: Toward Improved Predictions in Ungauged Basins: Exploiting the Power of Machine Learning, Water Resources Research, 55, 11344–11354, <ext-link xlink:href="https://doi.org/10.1029/2019WR026065" ext-link-type="DOI">10.1029/2019WR026065</ext-link>, 2019b.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Kratzert et al.(2019c)Kratzert, Klotz, Shalev, Klambauer, Hochreiter, and Nearing</label><mixed-citation>Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <ext-link xlink:href="https://doi.org/10.5194/hess-23-5089-2019" ext-link-type="DOI">10.5194/hess-23-5089-2019</ext-link>, 2019c.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Kratzert et al.(2024)Kratzert, Gauch, Klotz, and Nearing</label><mixed-citation>Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.: HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrol. Earth Syst. Sci., 28, 4187–4201, <ext-link xlink:href="https://doi.org/10.5194/hess-28-4187-2024" ext-link-type="DOI">10.5194/hess-28-4187-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Kuhn and Hacking(2012)</label><mixed-citation>Kuhn, T. S. and Hacking, I.: The structure of scientific revolutions, University of Chicago Press, Chicago, 4th edn., ISBN 978-0-226-45811-3, 978-0-226-45812-0, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Lees et al.(2021)Lees, Buechel, Anderson, Slater, Reece, Coxon, and Dadson</label><mixed-citation>Lees, T., Buechel, M., Anderson, B., Slater, L., Reece, S., Coxon, G., and Dadson, S. J.: Benchmarking data-driven rainfall–runoff models in Great Britain: a comparison of long short-term memory (LSTM)-based models with four lumped conceptual models, Hydrol. Earth Syst. Sci., 25, 5517–5534, <ext-link xlink:href="https://doi.org/10.5194/hess-25-5517-2021" ext-link-type="DOI">10.5194/hess-25-5517-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Lees et al.(2022)Lees, Reece, Kratzert, Klotz, Gauch, De Bruijn, Kumar Sahu, Greve, Slater, and Dadson</label><mixed-citation>Lees, T., Reece, S., Kratzert, F., Klotz, D., Gauch, M., De Bruijn, J., Kumar Sahu, R., Greve, P., Slater, L., and Dadson, S. J.: Hydrological concept formation inside long short-term memory (LSTM) networks, Hydrol. Earth Syst. Sci., 26, 3079–3101, <ext-link xlink:href="https://doi.org/10.5194/hess-26-3079-2022" ext-link-type="DOI">10.5194/hess-26-3079-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Li et al.(2023)Li, Sun, Tian, and Ni</label><mixed-citation>Li, B., Sun, T., Tian, F., and Ni, G.: Enhancing Process-Based Hydrological Models with Embedded Neural Networks: A Hybrid Approach, Journal of Hydrology, 625, 130107, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2023.130107" ext-link-type="DOI">10.1016/j.jhydrol.2023.130107</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>Loritz et al.(2024)Loritz, Dolich, Acuña Espinoza, Ebeling, Guse, Götte, Hassler, Hauffe, Heidbüchel, Kiesel, Mälicke, Müller-Thomy, Stölzle, and Tarasova</label><mixed-citation>Loritz, R., Dolich, A., Acuña Espinoza, E., Ebeling, P., Guse, B., Götte, J., Hassler, S. K., Hauffe, C., Heidbüchel, I., Kiesel, J., Mälicke, M., Müller-Thomy, H., Stölzle, M., and Tarasova, L.: CAMELS-DE: hydro-meteorological time series and attributes for 1582 catchments in Germany, Earth Syst. Sci. Data, 16, 5625–5642, <ext-link xlink:href="https://doi.org/10.5194/essd-16-5625-2024" ext-link-type="DOI">10.5194/essd-16-5625-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx62"><label>MacKay(2003)</label><mixed-citation> MacKay, D. J. C.: Information theory, inference, and learning algorithms, Cambridge University Press, Cambridge, UK, New York, ISBN 978-0-521-64298-9, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>Montavon et al.(2018)Montavon, Samek, and Müller</label><mixed-citation>Montavon, G., Samek, W., and Müller, K.-R.: Methods for interpreting and understanding deep neural networks, Digital Signal Processing, 73, 1–15, <ext-link xlink:href="https://doi.org/10.1016/j.dsp.2017.10.011" ext-link-type="DOI">10.1016/j.dsp.2017.10.011</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx64"><label>Muñoz-Sabater et al.(2021)Muñoz-Sabater, Dutra, Agustí-Panareda, Albergel, Arduini, Balsamo, Boussetta, Choulga, Harrigan, Hersbach, Martens, Miralles, Piles, Rodríguez-Fernández, Zsoter, Buontempo, and Thépaut</label><mixed-citation>Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, <ext-link xlink:href="https://doi.org/10.5194/essd-13-4349-2021" ext-link-type="DOI">10.5194/essd-13-4349-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx65"><label>Nearing et al.(2020)Nearing, Sampson, Kratzert, and Frame</label><mixed-citation>Nearing, G. S., Sampson, A. K., Kratzert, F., and Frame, J.: Post-Processing a Conceptual Rainfall-Runoff Model with an LSTM, EarthArXiv [preprint], <uri>https://eartharxiv.org/repository/view/122/</uri> (last access: 20 January 2026), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx66"><label>Nearing et al.(2021)Nearing, Kratzert, Sampson, Pelissier, Klotz, Frame, Prieto, and Gupta</label><mixed-citation>Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D., Frame, J. M., Prieto, C., and Gupta, H. V.: What Role Does Hydrological Science Play in the Age of Machine Learning?, Water Resources Research, 57, e2020WR028091, <ext-link xlink:href="https://doi.org/10.1029/2020WR028091" ext-link-type="DOI">10.1029/2020WR028091</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx67"><label>Rackauckas et al.(2021)Rackauckas, Ma, Martensen, Warner, Zubov, Supekar, Skinner, Ramadhan, and Edelman</label><mixed-citation>Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., Supekar, R., Skinner, D., Ramadhan, A., and Edelman, A.: Universal Differential Equations for Scientific Machine Learning, arXiv [preprint], <uri>http://arxiv.org/abs/2001.04385</uri> (last access: 20 January 2026), 2021.</mixed-citation></ref>
      <ref id="bib1.bibx68"><label>Rahmani et al.(2023)Rahmani, Appling, Feng, Lawson, and Shen</label><mixed-citation>Rahmani, F., Appling, A., Feng, D., Lawson, K., and Shen, C.: Identifying Structural Priors in a Hybrid Differentiable Model for Stream Water Temperature Modeling, Water Resources Research, 59, e2023WR034420, <ext-link xlink:href="https://doi.org/10.1029/2023WR034420" ext-link-type="DOI">10.1029/2023WR034420</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx69"><label>Reichert and Mieleitner(2009)</label><mixed-citation>Reichert, P. and Mieleitner, J.: Analyzing input and structural uncertainty of nonlinear dynamic models with stochastic, time-dependent parameters, Water Resources Research, 45, <ext-link xlink:href="https://doi.org/10.1029/2009WR007814" ext-link-type="DOI">10.1029/2009WR007814</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx70"><label>Reichert et al.(2021)Reichert, Ammann, and Fenicia</label><mixed-citation>Reichert, P., Ammann, L., and Fenicia, F.: Potential and Challenges of Investigating Intrinsic Uncertainty of Hydrological Models With Stochastic, Time‐Dependent Parameters, Water Resources Research, 57, e2020WR028400, <ext-link xlink:href="https://doi.org/10.1029/2020WR028400" ext-link-type="DOI">10.1029/2020WR028400</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx71"><label>Reichstein et al.(2019)Reichstein, Camps-Valls, Stevens, Jung, Denzler, Carvalhais, and Prabhat</label><mixed-citation>Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J., Carvalhais, N., and Prabhat: Deep learning and process understanding for data-driven Earth system science, Nature, 566, 195–204, <ext-link xlink:href="https://doi.org/10.1038/s41586-019-0912-1" ext-link-type="DOI">10.1038/s41586-019-0912-1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx72"><label>Shen et al.(2023)Shen, Appling, Gentine, Bandai, Gupta, Tartakovsky, Baity-Jesi, Fenicia, Kifer, Li, Liu, Ren, Zheng, Harman, Clark, Farthing, Feng, Kumar, Aboelyazeed, Rahmani, Song, Beck, Bindas, Dwivedi, Fang, Höge, Rackauckas, Mohanty, Roy, Xu, and Lawson</label><mixed-citation>Shen, C., Appling, A. P., Gentine, P., Bandai, T., Gupta, H., Tartakovsky, A., Baity-Jesi, M., Fenicia, F., Kifer, D., Li, L., Liu, X., Ren, W., Zheng, Y., Harman, C. J., Clark, M., Farthing, M., Feng, D., Kumar, P., Aboelyazeed, D., Rahmani, F., Song, Y., Beck, H. E., Bindas, T., Dwivedi, D., Fang, K., Höge, M., Rackauckas, C., Mohanty, B., Roy, T., Xu, C., and Lawson, K.: Differentiable modelling to unify machine learning and physical models for geosciences, Nature Reviews Earth &amp; Environment, 4, 552–567, <ext-link xlink:href="https://doi.org/10.1038/s43017-023-00450-9" ext-link-type="DOI">10.1038/s43017-023-00450-9</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx73"><label>Song et al.(2024)Song, Knoben, Clark, Feng, Lawson, Sawadekar, and Shen</label><mixed-citation>Song, Y., Knoben, W. J. M., Clark, M. P., Feng, D., Lawson, K., Sawadekar, K., and Shen, C.: When ancient numerical demons meet physics-informed machine learning: adjoint-based gradients for implicit differentiable modeling, Hydrol. Earth Syst. Sci., 28, 3051–3077, <ext-link xlink:href="https://doi.org/10.5194/hess-28-3051-2024" ext-link-type="DOI">10.5194/hess-28-3051-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx74"><label>Spieler et al.(2020)Spieler, Mai, Craig, Tolson, and Schütze</label><mixed-citation>Spieler, D., Mai, J., Craig, J. R., Tolson, B. A., and Schütze, N.: Automatic Model Structure Identification for Conceptual Hydrologic Models, Water Resources Research, 56, e2019WR027009, <ext-link xlink:href="https://doi.org/10.1029/2019WR027009" ext-link-type="DOI">10.1029/2019WR027009</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx75"><label>Staudinger et al.(2025)Staudinger, Herzog, Loritz, Houska, Pool, Spieler, Wagner, Mai, Kiesel, Thober, Guse, and Ehret</label><mixed-citation>Staudinger, M., Herzog, A., Loritz, R., Houska, T., Pool, S., Spieler, D., Wagner, P. D., Mai, J., Kiesel, J., Thober, S., Guse, B., and Ehret, U.: How well do process-based and data-driven hydrological models learn from limited discharge data?, Hydrol. Earth Syst. Sci., 29, 5005–5029, <ext-link xlink:href="https://doi.org/10.5194/hess-29-5005-2025" ext-link-type="DOI">10.5194/hess-29-5005-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx76"><label>Tsai et al.(2021)Tsai, Feng, Pan, Beck, Lawson, Yang, Liu, and Shen</label><mixed-citation>Tsai, W.-P., Feng, D., Pan, M., Beck, H., Lawson, K., Yang, Y., Liu, J., and Shen, C.: From calibration to parameter learning: Harnessing the scaling effects of big data in geoscientific modeling, Nature Communications, 12, 5988, <ext-link xlink:href="https://doi.org/10.1038/s41467-021-26107-z" ext-link-type="DOI">10.1038/s41467-021-26107-z</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx77"><label>Vrugt(2016)</label><mixed-citation>Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM software package: Theory, concepts, and MATLAB implementation, Environmental Modelling &amp; Software, 75, 273–316, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2015.08.013" ext-link-type="DOI">10.1016/j.envsoft.2015.08.013</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx78"><label>Wang and Gupta(2024)</label><mixed-citation>Wang, Y.-H. and Gupta, H. V.: A Mass-Conserving-Perceptron for Machine-Learning-Based Modeling of Geoscientific Systems, Water Resources Research, 60, e2023WR036461, <ext-link xlink:href="https://doi.org/10.1029/2023WR036461" ext-link-type="DOI">10.1029/2023WR036461</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx79"><label>Waskom(2021)</label><mixed-citation>Waskom, M.: seaborn: statistical data visualization, Journal of Open Source Software, 6, 3021, <ext-link xlink:href="https://doi.org/10.21105/joss.03021" ext-link-type="DOI">10.21105/joss.03021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx80"><label>Weiler and Beven(2015)</label><mixed-citation>Weiler, M. and Beven, K.: Do we need a Community Hydrological Model?, Water Resources Research, 51, 7777–7784, <ext-link xlink:href="https://doi.org/10.1002/2014WR016731" ext-link-type="DOI">10.1002/2014WR016731</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx81"><label>Young et al.(1996)Young, Parkinson, and Lees</label><mixed-citation>Young, P., Parkinson, S., and Lees, M.: Simplicity out of complexity in environmental modelling: Occam's razor revisited, Journal of Applied Statistics, 23, 165–210, <ext-link xlink:href="https://doi.org/10.1080/02664769624206" ext-link-type="DOI">10.1080/02664769624206</ext-link>, 1996.</mixed-citation></ref>
      <ref id="bib1.bibx82"><label>Young and Beven(1994)</label><mixed-citation>Young, P. C. and Beven, K. J.: Data-based mechanistic modelling and the rainfall-flow non-linearity, Environmetrics, 5, 335–363, <ext-link xlink:href="https://doi.org/10.1002/env.3170050311" ext-link-type="DOI">10.1002/env.3170050311</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx83"><label>Zhang et al.(2025)Zhang, Li, Hu, Shen, Xu, Chen, Chu, and Li</label><mixed-citation>Zhang, C., Li, H., Hu, Y., Shen, D., Xu, B., Chen, M., Chu, W., and Li, R.: A Differentiability-Based Processes and Parameters Learning Hydrologic Model for Advancing Runoff Prediction and Process Understanding, Journal of Hydrology, 661, 133594, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2025.133594" ext-link-type="DOI">10.1016/j.jhydrol.2025.133594</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx84"><label>Álvarez Chaves et al.(2024)Álvarez Chaves, Gupta, Ehret, and Guthke</label><mixed-citation>Álvarez Chaves, M., Gupta, H. V., Ehret, U., and Guthke, A.: On the Accurate Estimation of Information-Theoretic Quantities from Multi-Dimensional Sample Data, Entropy, 26, 387, <ext-link xlink:href="https://doi.org/10.3390/e26050387" ext-link-type="DOI">10.3390/e26050387</ext-link>, 2024.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>When physics gets in the way: an entropy-based evaluation of conceptual constraints in hybrid hydrological models</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Aboelyazeed et al.(2023)Aboelyazeed, Xu, Hoffman, Liu, Jones,
Rackauckas, Lawson, and Shen</label><mixed-citation>
      
Aboelyazeed, D., Xu, C., Hoffman, F. M., Liu, J., Jones, A. W., Rackauckas, C., Lawson, K., and Shen, C.: A differentiable, physics-informed ecosystem modeling and learning framework for large-scale inverse problems: demonstration with photosynthesis simulations, Biogeosciences, 20, 2671–2692, <a href="https://doi.org/10.5194/bg-20-2671-2023" target="_blank">https://doi.org/10.5194/bg-20-2671-2023</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Acuna et al.(2025)</label><mixed-citation>
      
Acuna, E., Álvarez Chaves, M., Dolich, A., and Manoj J, A.:
Hy2DL: Hybrid Hydrological modeling using Deep Learning methods,
Zenodo [code],
<a href="https://doi.org/10.5281/zenodo.17251944" target="_blank">https://doi.org/10.5281/zenodo.17251944</a>,  2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Acuña Espinoza et al.(2024)Acuña Espinoza, Loritz, Álvarez Chaves,
Bäuerle, and Ehret</label><mixed-citation>
      
Acuña Espinoza, E., Loritz, R., Álvarez Chaves, M., Bäuerle, N., and Ehret, U.: To bucket or not to bucket? Analyzing the performance and interpretability of hybrid hydrological models with dynamic parameterization, Hydrol. Earth Syst. Sci., 28, 2705–2719, <a href="https://doi.org/10.5194/hess-28-2705-2024" target="_blank">https://doi.org/10.5194/hess-28-2705-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Addor and Melsen(2019)</label><mixed-citation>
      
Addor, N. and Melsen, L. A.: Legacy, Rather Than Adequacy, Drives the
Selection of Hydrological Models, Water Resources Research, 55,
378–390, <a href="https://doi.org/10.1029/2018WR022958" target="_blank">https://doi.org/10.1029/2018WR022958</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Addor et al.(2017)Addor, Newman, Mizukami, and
Clark</label><mixed-citation>
      
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, <a href="https://doi.org/10.5194/hess-21-5293-2017" target="_blank">https://doi.org/10.5194/hess-21-5293-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Álvarez Chaves(2025)</label><mixed-citation>
      
Álvarez Chaves, M.: Replication Data for: An entropy-based evaluation of conceptual constraints in hybrid hydrological models, V1,
Data Repository of the University of Stuttgart (DaRUS) [data set] and [code], <a href="https://doi.org/10.18419/DARUS-4920" target="_blank">https://doi.org/10.18419/DARUS-4920</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Álvarez Chaves et al.(2024)</label><mixed-citation>
      
Álvarez Chaves, M., Ehret, U., and Guthke, A.:
UNITE Toolbox, V1, Data Repository of the University of Stuttgart (DaRUS) [code], <a href="https://doi.org/10.18419/DARUS-4188" target="_blank">https://doi.org/10.18419/DARUS-4188</a>,  2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Ansel et al.(2019)</label><mixed-citation>
      
Ansel, J., Yang, E.,  He, H., Gimelshein, N.,  Jain, A., Voznesensky, M., Bao, B.,  Bell, P.,  Berard, D.,  Burovski, E., Chauhan, G.,  Chourdia, A., Constable, W.,  Desmaison, A., DeVito, Z., Ellison, E.,  Feng, W.,  Gong, J.,  Gschwind, M.,  Hirsh, B.,  Huang, S.,  Kalambarkar, K.,  Kirsch, L.,  Lazos, M.,  Lezcano, M.,  Liang, Y.,  Liang, J., Lu, Y.,  Luk, C. K., Maher, B.,  Pan, Y., Puhrsch, C., Reso, M.,  Saroufim, M.,  Siraichi, M. Y.,  Suk, H.,  Zhang, S.,  Suo, M.,  Tillet, P.,  Zhao, X., Wang, E.,  Zhou, K.,  Zou, R.,  Wang, X.,  Mathews, A., Wen, W.,  Chanan, G.,  Wu, P., and Chintala, S.: PyTorch 2: Faster Machine Learning Through Dynamic Python Bytecode Transformation and Graph Compilation,
in: 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (ASPLOS '24),
ACM, <a href="https://doi.org/10.1145/3620665.3640366" target="_blank">https://doi.org/10.1145/3620665.3640366</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Azmi et al.(2021)Azmi, Ehret, Weijs, Ruddell, and
Perdigão</label><mixed-citation>
      
Azmi, E., Ehret, U., Weijs, S. V., Ruddell, B. L., and Perdigão, R. A. P.: Technical note: “Bit by bit”: a practical and general approach for evaluating model computational complexity vs. model performance, Hydrol. Earth Syst. Sci., 25, 1103–1115, <a href="https://doi.org/10.5194/hess-25-1103-2021" target="_blank">https://doi.org/10.5194/hess-25-1103-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Bandai et al.(2024)Bandai, Ghezzehei, Jiang, Kidger, Chen, and
Steefel</label><mixed-citation>
      
Bandai, T., Ghezzehei, T. A., Jiang, P., Kidger, P., Chen, X., and Steefel,
C. I.: Learning Constitutive Relations From Soil Moisture Data
via Physically Constrained Neural Networks, Water Resources Research,
60, e2024WR037318, <a href="https://doi.org/10.1029/2024WR037318" target="_blank">https://doi.org/10.1029/2024WR037318</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Beck et al.(2016)Beck, van Dijk, de Roo, Miralles, McVicar,
Schellekens, and Bruijnzeel</label><mixed-citation>
      
Beck, H. E., van Dijk, A. I. J. M., de Roo, A., Miralles, D. G., McVicar,
T. R., Schellekens, J., and Bruijnzeel, L. A.: Global-scale regionalization
of hydrologic model parameters, Water Resources Research, 52, 3599–3622,
<a href="https://doi.org/10.1002/2015WR018247" target="_blank">https://doi.org/10.1002/2015WR018247</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Beck et al.(2020)Beck, Pan, Lin, Seibert, van Dijk, and
Wood</label><mixed-citation>
      
Beck, H. E., Pan, M., Lin, P., Seibert, J., van Dijk, A. I. J. M., and Wood,
E. F.: Global Fully Distributed Parameter Regionalization Based on
Observed Streamflow From 4,229 Headwater Catchments, Journal of
Geophysical Research: Atmospheres, 125, e2019JD031485,
<a href="https://doi.org/10.1029/2019JD031485" target="_blank">https://doi.org/10.1029/2019JD031485</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Beirlant et al.(1997)Beirlant, Dudewicz, Györfi, and Van der
Meulen</label><mixed-citation>
      
Beirlant, J., Dudewicz, E. J., Györfi, L., and Van der Meulen, E. C.:
Nonparametric Entropy Estimation: An Overview, International Journal
of Mathematical and Statistical Sciences, 6, 17–39,
<a href="http://jimbeck.caltech.edu/summerlectures/references/Entropy estimation.pdf" target="_blank"/> (last access: 20 January 2026),
1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Bergström and Forsman(1973)</label><mixed-citation>
      
Bergström, S. and Forsman,  A.: Development of a
conceptual deterministc rainfall-runoff model, Hydrology Research, 4,
147–170, <a href="https://doi.org/10.2166/nh.1973.0012" target="_blank">https://doi.org/10.2166/nh.1973.0012</a>, 1973.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Beven(2020)</label><mixed-citation>
      
Beven, K.: Deep learning, hydrological processes and the uniqueness of place,
Hydrological Processes, 34, 3608–3613, <a href="https://doi.org/10.1002/hyp.13805" target="_blank">https://doi.org/10.1002/hyp.13805</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Bindas et al.(2024)Bindas, Tsai, Liu, Rahmani, Feng, Bian, Lawson,
and Shen</label><mixed-citation>
      
Bindas, T., Tsai, W.-P., Liu, J., Rahmani, F., Feng, D., Bian, Y., Lawson, K.,
and Shen, C.: Improving River Routing Using a Differentiable
Muskingum-Cunge Model and Physics-Informed Machine Learning,
Water Resources Research, 60, e2023WR035337, <a href="https://doi.org/10.1029/2023WR035337" target="_blank">https://doi.org/10.1029/2023WR035337</a>,
2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Blougouras et al.(2024)Blougouras, Reichstein, Migliavacca, Brenning,
and Jiang</label><mixed-citation>
      
Blougouras, G., Reichstein, M., Migliavacca, M., Brenning, A., and Jiang, S.: Interpretable Machine Learning to Uncover Key Compound Drivers of Hydrological Droughts, EGU General Assembly 2024, Vienna, Austria, 14–19 Apr 2024, EGU24-9639, <a href="https://doi.org/10.5194/egusphere-egu24-9639" target="_blank">https://doi.org/10.5194/egusphere-egu24-9639</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Bradbury et al.(2018)Bradbury, Frostig, Hawkins, Johnson, Leary,
Maclaurin, Necula, Paszke, VanderPlas, Wanderman-Milne, and
Zhang</label><mixed-citation>
      
Bradbury, J., Frostig, R., Hawkins, P., Johnson, M. J., Leary, C., Maclaurin,
D., Necula, G., Paszke, A., VanderPlas, J., Wanderman-Milne, S., and Zhang,
Q.: JAX: composable transformations of Python+NumPy programs,
GitHub [code], <a href="http://github.com/jax-ml/jax" target="_blank"/> (last access: 20 January 2026), 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Camus(1991)</label><mixed-citation>
      
Camus, A.: The myth of Sisyphus and other essays, Vintage Books, New York,
1st vintage international edn., ISBN 978-0-679-73373-7, 1991.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Chlumsky et al.(2023)Chlumsky, Mai, Craig, and
Tolson</label><mixed-citation>
      
Chlumsky, R., Mai, J., Craig, J. R., and Tolson, B. A.: Advancement of a blended hydrologic model for robust model performance, Hydrol. Earth Syst. Sci. Discuss. [preprint], <a href="https://doi.org/10.5194/hess-2023-69" target="_blank">https://doi.org/10.5194/hess-2023-69</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Clark et al.(2008)Clark, Slater, Rupp, Woods, Vrugt, Gupta, Wagener,
and Hay</label><mixed-citation>
      
Clark, M. P., Slater, A. G., Rupp, D. E., Woods, R. A., Vrugt, J. A., Gupta,
H. V., Wagener, T., and Hay, L. E.: Framework for Understanding
Structural Errors (FUSE): A modular framework to diagnose differences
between hydrological models, Water Resources Research, 44,
<a href="https://doi.org/10.1029/2007WR006735" target="_blank">https://doi.org/10.1029/2007WR006735</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Clark et al.(2011)Clark, Kavetski, and Fenicia</label><mixed-citation>
      
Clark, M. P., Kavetski, D., and Fenicia, F.: Pursuing the method of multiple
working hypotheses for hydrological modeling, Water Resources Research, 47,
<a href="https://doi.org/10.1029/2010WR009827" target="_blank">https://doi.org/10.1029/2010WR009827</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Cover and Thomas(2006)</label><mixed-citation>
      
Cover, T. M. and Thomas, J. A.: Elements of Information Theory, Wiley,
Hoboken, N.J, 2 edn., ISBN 978-0-471-24195-9, 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Coxon et al.(2020a)Coxon, Addor, Bloomfield, Freer, Fry, Hannaford,
Howden, Lane, Lewis, Robinson, Wagener, and Woods</label><mixed-citation>
      
Coxon, G., Addor, N., Bloomfield, J. P., Freer, J., Fry, M., Hannaford, J., Howden, N. J. K., Lane, R., Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: CAMELS-GB: hydrometeorological time series and landscape attributes for 671 catchments in Great Britain, Earth Syst. Sci. Data, 12, 2459–2483, <a href="https://doi.org/10.5194/essd-12-2459-2020" target="_blank">https://doi.org/10.5194/essd-12-2459-2020</a>, 2020a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Coxon et al.(2020b)</label><mixed-citation>
      
Coxon, G., Addor, N., Bloomfield, J. P.,  Freer, J.,  Fry, M., Hannaford, J., Howden, N. J. K., Lane, R.,  Lewis, M., Robinson, E. L., Wagener, T., and Woods, R.: Catchment attributes and hydro-meteorological timeseries for 671 catchments across Great Britain (CAMELS-GB), NERC Environmental Information Data Centre [data set], <a href="https://doi.org/10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9" target="_blank">https://doi.org/10.5285/8344e4f3-d2ea-44f5-8afa-86d2987543a9</a>,
2020b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Craig et al.(2020)Craig, Brown, Chlumsky, Jenkinson, Jost, Lee, Mai,
Serrer, Sgro, Shafii, Snowdon, and Tolson</label><mixed-citation>
      
Craig, J. R., Brown, G., Chlumsky, R., Jenkinson, R. W., Jost, G., Lee, K.,
Mai, J., Serrer, M., Sgro, N., Shafii, M., Snowdon, A. P., and Tolson, B. A.:
Flexible watershed simulation with the Raven hydrological modelling
framework, Environmental Modelling &amp; Software, 129, 104728,
<a href="https://doi.org/10.1016/j.envsoft.2020.104728" target="_blank">https://doi.org/10.1016/j.envsoft.2020.104728</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Dal Molin et al.(2021)Dal Molin, Kavetski, and
Fenicia</label><mixed-citation>
      
Dal Molin, M., Kavetski, D., and Fenicia, F.: SuperflexPy 1.3.0: an open-source Python framework for building, testing, and improving conceptual hydrological models, Geosci. Model Dev., 14, 7047–7072, <a href="https://doi.org/10.5194/gmd-14-7047-2021" target="_blank">https://doi.org/10.5194/gmd-14-7047-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>De la Fuente et al.(2024)De la Fuente, Ehsani, Gupta, and
Condon</label><mixed-citation>
      
De la Fuente, L. A., Ehsani, M. R., Gupta, H. V., and Condon, L. E.: Toward interpretable LSTM-based modeling of hydrological systems, Hydrol. Earth Syst. Sci., 28, 945–971, <a href="https://doi.org/10.5194/hess-28-945-2024" target="_blank">https://doi.org/10.5194/hess-28-945-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Ehret et al.(2020)Ehret, van Pruijssen, Bortoli, Loritz, Azmi, and
Zehe</label><mixed-citation>
      
Ehret, U., van Pruijssen, R., Bortoli, M., Loritz, R., Azmi, E., and Zehe, E.: Adaptive clustering: reducing the computational costs of distributed (hydrological) modelling by exploiting time-variable similarity among model elements, Hydrol. Earth Syst. Sci., 24, 4389–4411, <a href="https://doi.org/10.5194/hess-24-4389-2020" target="_blank">https://doi.org/10.5194/hess-24-4389-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Feng et al.(2022)Feng, Liu, Lawson, and
Shen</label><mixed-citation>
      
Feng, D., Liu, J., Lawson, K., and Shen, C.: Differentiable, Learnable,
Regionalized Process‐Based Models With Multiphysical Outputs
can Approach State‐Of‐The‐Art Hydrologic Prediction
Accuracy, Water Resources Research, 58, e2022WR032404,
<a href="https://doi.org/10.1029/2022WR032404" target="_blank">https://doi.org/10.1029/2022WR032404</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Feng et al.(2023)Feng, Beck, Lawson, and
Shen</label><mixed-citation>
      
Feng, D., Beck, H., Lawson, K., and Shen, C.: The suitability of differentiable, physics-informed machine learning hydrologic models for ungauged regions and climate change impact assessment, Hydrol. Earth Syst. Sci., 27, 2357–2373, <a href="https://doi.org/10.5194/hess-27-2357-2023" target="_blank">https://doi.org/10.5194/hess-27-2357-2023</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Feng et al.(2024a)Feng, Beck, de Bruijn, Sahu, Satoh,
Wada, Liu, Pan, Lawson, and Shen</label><mixed-citation>
      
Feng, D., Beck, H., de Bruijn, J., Sahu, R. K., Satoh, Y., Wada, Y., Liu, J., Pan, M., Lawson, K., and Shen, C.: Deep dive into hydrologic simulations at global scale: harnessing the power of deep learning and physics-informed differentiable models (<i>δ</i>HBV-globe1.0-hydroDL), Geosci. Model Dev., 17, 7181–7198, <a href="https://doi.org/10.5194/gmd-17-7181-2024" target="_blank">https://doi.org/10.5194/gmd-17-7181-2024</a>, 2024a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Feng et al.(2024b)Feng, Li, Xu, Wang, Zhang, Wu, Lai,
Zeng, Tong, and Jiang</label><mixed-citation>
      
Feng, J., Li, J., Xu, C., Wang, Z., Zhang, Z., Wu, X., Lai, C., Zeng, Z., Tong,
H., and Jiang, S.: Viewing Soil Moisture Flash Drought Onset
Mechanism and Their Changes Through XAI Lens: A Case Study
in Eastern China, Water Resources Research, 60, e2023WR036297,
<a href="https://doi.org/10.1029/2023WR036297" target="_blank">https://doi.org/10.1029/2023WR036297</a>, 2024b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Fenicia et al.(2011)Fenicia, Kavetski, and
Savenije</label><mixed-citation>
      
Fenicia, F., Kavetski, D., and Savenije, H. H. G.: Elements of a flexible
approach for conceptual hydrological modeling: 1. Motivation and
theoretical development, Water Resources Research, 47,
<a href="https://doi.org/10.1029/2010WR010174" target="_blank">https://doi.org/10.1029/2010WR010174</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Frame et al.(2021)Frame, Kratzert, Raney II, Rahman, Salas, and
Nearing</label><mixed-citation>
      
Frame, J. M., Kratzert, F., Raney II, A., Rahman, M., Salas, F. R., and
Nearing, G. S.: Post-Processing the National Water Model with Long
Short-Term Memory Networks for Streamflow Predictions and Model
Diagnostics, JAWRA Journal of the American Water Resources Association, 57,
885–905, <a href="https://doi.org/10.1111/1752-1688.12964" target="_blank">https://doi.org/10.1111/1752-1688.12964</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Frame et al.(2023)Frame, Kratzert, Gupta, Ullrich, and
Nearing</label><mixed-citation>
      
Frame, J. M., Kratzert, F., Gupta, H. V., Ullrich, P., and Nearing, G. S.: On
strictly enforced mass conservation constraints for modelling the
Rainfall-Runoff process, Hydrological Processes, 37, e14847,
<a href="https://doi.org/10.1002/hyp.14847" target="_blank">https://doi.org/10.1002/hyp.14847</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Gajamannage et al.(2023)Gajamannage, Jayathilake, Park, and
Bollt</label><mixed-citation>
      
Gajamannage, K., Jayathilake, D. I., Park, Y., and Bollt, E. M.: Recurrent
neural networks for dynamical systems: Applications to ordinary
differential equations, collective motion, and hydrological modeling, Chaos:
An Interdisciplinary Journal of Nonlinear Science, 33, 013109,
<a href="https://doi.org/10.1063/5.0088748" target="_blank">https://doi.org/10.1063/5.0088748</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Galatolo et al.(2010)Galatolo, Hoyrup, and
Rojas</label><mixed-citation>
      
Galatolo, S., Hoyrup, M., and Rojas, C.: Effective symbolic dynamics, random
points, statistical behavior, complexity and entropy, Information and
Computation, 208, 23–41, <a href="https://doi.org/10.1016/j.ic.2009.05.001" target="_blank">https://doi.org/10.1016/j.ic.2009.05.001</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Gauch et al.(2021)Gauch, Mai, and Lin</label><mixed-citation>
      
Gauch, M., Mai, J., and Lin, J.: The proper care and feeding of CAMELS: How
limited training data affects streamflow prediction, Environmental Modelling
&amp; Software, 135, 104926, <a href="https://doi.org/10.1016/j.envsoft.2020.104926" target="_blank">https://doi.org/10.1016/j.envsoft.2020.104926</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Gong et al.(2013)Gong, Gupta, Yang, Sricharan, and
Hero III</label><mixed-citation>
      
Gong, W., Gupta, H. V., Yang, D., Sricharan, K., and Hero III, A. O.:
Estimating epistemic and aleatory uncertainties during hydrologic modeling:
An information theoretic approach, Water Resources Research, 49,
2253–2273, <a href="https://doi.org/10.1002/wrcr.20161" target="_blank">https://doi.org/10.1002/wrcr.20161</a>, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Goodfellow et al.(2016)Goodfellow, Bengio, and
Courville</label><mixed-citation>
      
Goodfellow, I., Bengio, Y., and Courville, A.: Deep Learning, Adaptive
computation and machine learning, MIT Press, Cambridge, Massachusetts, ISBN
978-0-262-03561-3,  <a href="http://www.deeplearningbook.org" target="_blank"/> (last access: 20 January 2026), 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Gupta et al.(2008)Gupta, Wagener, and Liu</label><mixed-citation>
      
Gupta, H. V., Wagener, T., and Liu, Y.: Reconciling Theory with Observations:
Elements of a Diagnostic Approach to Model Evaluation, Hydrological
Processes, 22, 3802–3813, <a href="https://doi.org/10.1002/hyp.6989" target="_blank">https://doi.org/10.1002/hyp.6989</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Harper(2015)</label><mixed-citation>
      
Harper, M.: 10 years on from the Cumbrian and Carlisle Floods of 2005,
<a href="https://environmentagency.blog.gov.uk/2015/01/08/10-years-on-from-the-cumbrian-and-carlisle-floods-of-2005" target="_blank"/> (last access: 20 January 2026),
2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Hochreiter and Schmidhuber(1997)</label><mixed-citation>
      
Hochreiter, S. and Schmidhuber, J.: Long Short-Term Memory, Neural
Computation, 9, 1735–1780, <a href="https://doi.org/10.1162/neco.1997.9.8.1735" target="_blank">https://doi.org/10.1162/neco.1997.9.8.1735</a>, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Horton et al.(2022)Horton, Schaefli, and Kauzlaric</label><mixed-citation>
      
Horton, P., Schaefli, B., and Kauzlaric, M.: Why do we have so many different
hydrological models? A review based on the case of Switzerland, WIREs
Water, 9, e1574, <a href="https://doi.org/10.1002/wat2.1574" target="_blank">https://doi.org/10.1002/wat2.1574</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Hrachowitz and Clark(2017)</label><mixed-citation>
      
Hrachowitz, M. and Clark, M. P.: HESS Opinions: The complementary merits of competing modelling philosophies in hydrology, Hydrol. Earth Syst. Sci., 21, 3953–3973, <a href="https://doi.org/10.5194/hess-21-3953-2017" target="_blank">https://doi.org/10.5194/hess-21-3953-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Höge et al.(2022)Höge, Scheidegger, Baity-Jesi, Albert, and
Fenicia</label><mixed-citation>
      
Höge, M., Scheidegger, A., Baity-Jesi, M., Albert, C., and Fenicia, F.: Improving hydrologic models for predictions and process understanding using neural ODEs, Hydrol. Earth Syst. Sci., 26, 5085–5102, <a href="https://doi.org/10.5194/hess-26-5085-2022" target="_blank">https://doi.org/10.5194/hess-26-5085-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Jiang et al.(2020)Jiang, Zheng, and
Solomatine</label><mixed-citation>
      
Jiang, S., Zheng, Y., and Solomatine, D.: Improving AI System Awareness
of Geoscience Knowledge: Symbiotic Integration of Physical
Approaches and Deep Learning, Geophysical Research Letters, 47,
e2020GL088229, <a href="https://doi.org/10.1029/2020GL088229" target="_blank">https://doi.org/10.1029/2020GL088229</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Kingma and Ba(2017)</label><mixed-citation>
      
Kingma, D. P. and Ba, J.: Adam: A Method for Stochastic Optimization,
arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.1412.6980" target="_blank">https://doi.org/10.48550/arXiv.1412.6980</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Kirchner(2009)</label><mixed-citation>
      
Kirchner, J. W.: Catchments as simple dynamical systems: Catchment
characterization, rainfall-runoff modeling, and doing hydrology backward,
Water Resources Research, 45, <a href="https://doi.org/10.1029/2008WR006912" target="_blank">https://doi.org/10.1029/2008WR006912</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Kozachenko and Leonenko(1987)</label><mixed-citation>
      
Kozachenko, L. F. and Leonenko, N. N.: A statistical estimate for the entropy
of a random vector, Probl. Inf. Transm., 23, 95–101,
<a href="https://zbmath.org/?q=an:0633.62005" target="_blank"/> (last access: 20 January 2026), 1987.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and
Herrnegger</label><mixed-citation>
      
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <a href="https://doi.org/10.5194/hess-22-6005-2018" target="_blank">https://doi.org/10.5194/hess-22-6005-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Kratzert et al.(2019a)Kratzert, Herrnegger, Klotz,
Hochreiter, and Klambauer</label><mixed-citation>
      
Kratzert, F., Herrnegger, M., Klotz, D., Hochreiter, S., and Klambauer, G.:
NeuralHydrology – Interpreting LSTMs in Hydrology, in:
Explainable AI: Interpreting, Explaining and Visualizing Deep
Learning, edited by: Samek, W., Montavon, G., Vedaldi, A., Hansen, L. K.,
and Müller, K.-R.,  Springer International Publishing, Cham, 347–362,
ISBN 978-3-030-28954-6, <a href="https://doi.org/10.1007/978-3-030-28954-6_19" target="_blank">https://doi.org/10.1007/978-3-030-28954-6_19</a>,
2019a.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Kratzert et al.(2019b)Kratzert, Klotz, Herrnegger,
Sampson, Hochreiter, and Nearing</label><mixed-citation>
      
Kratzert, F., Klotz, D., Herrnegger, M., Sampson, A. K., Hochreiter, S., and
Nearing, G. S.: Toward Improved Predictions in Ungauged Basins:
Exploiting the Power of Machine Learning, Water Resources Research,
55, 11344–11354, <a href="https://doi.org/10.1029/2019WR026065" target="_blank">https://doi.org/10.1029/2019WR026065</a>, 2019b.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Kratzert et al.(2019c)Kratzert, Klotz, Shalev,
Klambauer, Hochreiter, and Nearing</label><mixed-citation>
      
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <a href="https://doi.org/10.5194/hess-23-5089-2019" target="_blank">https://doi.org/10.5194/hess-23-5089-2019</a>, 2019c.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Kratzert et al.(2024)Kratzert, Gauch, Klotz, and
Nearing</label><mixed-citation>
      
Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.: HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrol. Earth Syst. Sci., 28, 4187–4201, <a href="https://doi.org/10.5194/hess-28-4187-2024" target="_blank">https://doi.org/10.5194/hess-28-4187-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Kuhn and Hacking(2012)</label><mixed-citation>
      
Kuhn, T. S. and Hacking, I.: The structure of scientific revolutions,
University of Chicago press, Chicago, 4th edn., ISBN 978-0-226-45811-3
978-0-226-45812-0, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Lees et al.(2021)Lees, Buechel, Anderson, Slater, Reece, Coxon, and
Dadson</label><mixed-citation>
      
Lees, T., Buechel, M., Anderson, B., Slater, L., Reece, S., Coxon, G., and Dadson, S. J.: Benchmarking data-driven rainfall–runoff models in Great Britain: a comparison of long short-term memory (LSTM)-based models with four lumped conceptual models, Hydrol. Earth Syst. Sci., 25, 5517–5534, <a href="https://doi.org/10.5194/hess-25-5517-2021" target="_blank">https://doi.org/10.5194/hess-25-5517-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Lees et al.(2022)Lees, Reece, Kratzert, Klotz, Gauch, De Bruijn,
Kumar Sahu, Greve, Slater, and Dadson</label><mixed-citation>
      
Lees, T., Reece, S., Kratzert, F., Klotz, D., Gauch, M., De Bruijn, J., Kumar Sahu, R., Greve, P., Slater, L., and Dadson, S. J.: Hydrological concept formation inside long short-term memory (LSTM) networks, Hydrol. Earth Syst. Sci., 26, 3079–3101, <a href="https://doi.org/10.5194/hess-26-3079-2022" target="_blank">https://doi.org/10.5194/hess-26-3079-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Li et al.(2023)Li, Sun, Tian, and Ni</label><mixed-citation>
      
Li, B., Sun, T., Tian, F., and Ni, G.: Enhancing Process-Based Hydrological
Models with Embedded Neural Networks: A Hybrid Approach, Journal of
Hydrology, 625, 130107, <a href="https://doi.org/10.1016/j.jhydrol.2023.130107" target="_blank">https://doi.org/10.1016/j.jhydrol.2023.130107</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Loritz et al.(2024)Loritz, Dolich, Acuña Espinoza, Ebeling, Guse,
Götte, Hassler, Hauffe, Heidbüchel, Kiesel, Mälicke, Müller-Thomy,
Stölzle, and Tarasova</label><mixed-citation>
      
Loritz, R., Dolich, A., Acuña Espinoza, E., Ebeling, P., Guse, B., Götte, J., Hassler, S. K., Hauffe, C., Heidbüchel, I., Kiesel, J., Mälicke, M., Müller-Thomy, H., Stölzle, M., and Tarasova, L.: CAMELS-DE: hydro-meteorological time series and attributes for 1582 catchments in Germany, Earth Syst. Sci. Data, 16, 5625–5642, <a href="https://doi.org/10.5194/essd-16-5625-2024" target="_blank">https://doi.org/10.5194/essd-16-5625-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>MacKay(2003)</label><mixed-citation>
      
MacKay, D. J. C.: Information theory, inference, and learning algorithms,
Cambridge University Press, Cambridge, UK, New York, ISBN 978-0-521-64298-9,
2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Montavon et al.(2018)Montavon, Samek, and
Müller</label><mixed-citation>
      
Montavon, G., Samek, W., and Müller, K.-R.: Methods for interpreting and
understanding deep neural networks, Digital Signal Processing, 73, 1–15,
<a href="https://doi.org/10.1016/j.dsp.2017.10.011" target="_blank">https://doi.org/10.1016/j.dsp.2017.10.011</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Muñoz-Sabater et al.(2021)Muñoz-Sabater, Dutra,
Agustí-Panareda, Albergel, Arduini, Balsamo, Boussetta, Choulga,
Harrigan, Hersbach, Martens, Miralles, Piles, Rodríguez-Fernández,
Zsoter, Buontempo, and Thépaut</label><mixed-citation>
      
Muñoz-Sabater, J., Dutra, E., Agustí-Panareda, A., Albergel, C., Arduini, G., Balsamo, G., Boussetta, S., Choulga, M., Harrigan, S., Hersbach, H., Martens, B., Miralles, D. G., Piles, M., Rodríguez-Fernández, N. J., Zsoter, E., Buontempo, C., and Thépaut, J.-N.: ERA5-Land: a state-of-the-art global reanalysis dataset for land applications, Earth Syst. Sci. Data, 13, 4349–4383, <a href="https://doi.org/10.5194/essd-13-4349-2021" target="_blank">https://doi.org/10.5194/essd-13-4349-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Nearing et al.(2020)Nearing, Sampson, Kratzert, and
Frame</label><mixed-citation>
      
Nearing, G. S., Sampson, A. K., Kratzert, F., and Frame, J.: Post-Processing
a Conceptual Rainfall-Runoff Model with an LSTM, EarthArXiv [preprint],
<a href="https://eartharxiv.org/repository/view/122/" target="_blank">https://eartharxiv.org/repository/view/122/</a> (last access: 20 January 2026), 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Nearing et al.(2021)Nearing, Kratzert, Sampson, Pelissier, Klotz,
Frame, Prieto, and Gupta</label><mixed-citation>
      
Nearing, G. S., Kratzert, F., Sampson, A. K., Pelissier, C. S., Klotz, D.,
Frame, J. M., Prieto, C., and Gupta, H. V.: What Role Does Hydrological
Science Play in the Age of Machine Learning?, Water Resources
Research, 57, e2020WR028091, <a href="https://doi.org/10.1029/2020WR028091" target="_blank">https://doi.org/10.1029/2020WR028091</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Rackauckas et al.(2021)Rackauckas, Ma, Martensen, Warner, Zubov,
Supekar, Skinner, Ramadhan, and Edelman</label><mixed-citation>
      
Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., Supekar, R.,
Skinner, D., Ramadhan, A., and Edelman, A.: Universal Differential
Equations for Scientific Machine Learning, arXiv [preprint],
<a href="http://arxiv.org/abs/2001.04385" target="_blank">http://arxiv.org/abs/2001.04385</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Rahmani et al.(2023)Rahmani, Appling, Feng, Lawson, and
Shen</label><mixed-citation>
      
Rahmani, F., Appling, A., Feng, D., Lawson, K., and Shen, C.: Identifying
Structural Priors in a Hybrid Differentiable Model for Stream
Water Temperature Modeling, Water Resources Research, 59,
e2023WR034420, <a href="https://doi.org/10.1029/2023WR034420" target="_blank">https://doi.org/10.1029/2023WR034420</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Reichert and Mieleitner(2009)</label><mixed-citation>
      
Reichert, P. and Mieleitner, J.: Analyzing input and structural uncertainty of
nonlinear dynamic models with stochastic, time-dependent parameters, Water
Resources Research, 45, <a href="https://doi.org/10.1029/2009WR007814" target="_blank">https://doi.org/10.1029/2009WR007814</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Reichert et al.(2021)Reichert, Ammann, and
Fenicia</label><mixed-citation>
      
Reichert, P., Ammann, L., and Fenicia, F.: Potential and Challenges of Investigating Intrinsic Uncertainty of Hydrological Models With Stochastic, Time-Dependent Parameters, Water Resources Research, 57, e2020WR028400, <a href="https://doi.org/10.1029/2020WR028400" target="_blank">https://doi.org/10.1029/2020WR028400</a>,
2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Reichstein et al.(2019)Reichstein, Camps-Valls, Stevens, Jung,
Denzler, Carvalhais, and Prabhat</label><mixed-citation>
      
Reichstein, M., Camps-Valls, G., Stevens, B., Jung, M., Denzler, J.,
Carvalhais, N., and Prabhat: Deep learning and process understanding for
data-driven Earth system science, Nature, 566, 195–204,
<a href="https://doi.org/10.1038/s41586-019-0912-1" target="_blank">https://doi.org/10.1038/s41586-019-0912-1</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>Shen et al.(2023)Shen, Appling, Gentine, Bandai, Gupta, Tartakovsky,
Baity-Jesi, Fenicia, Kifer, Li, Liu, Ren, Zheng, Harman, Clark, Farthing,
Feng, Kumar, Aboelyazeed, Rahmani, Song, Beck, Bindas, Dwivedi, Fang, Höge,
Rackauckas, Mohanty, Roy, Xu, and Lawson</label><mixed-citation>
      
Shen, C., Appling, A. P., Gentine, P., Bandai, T., Gupta, H., Tartakovsky, A.,
Baity-Jesi, M., Fenicia, F., Kifer, D., Li, L., Liu, X., Ren, W., Zheng, Y.,
Harman, C. J., Clark, M., Farthing, M., Feng, D., Kumar, P., Aboelyazeed, D.,
Rahmani, F., Song, Y., Beck, H. E., Bindas, T., Dwivedi, D., Fang, K., Höge,
M., Rackauckas, C., Mohanty, B., Roy, T., Xu, C., and Lawson, K.:
Differentiable modelling to unify machine learning and physical models for
geosciences, Nature Reviews Earth &amp; Environment, 4, 552–567,
<a href="https://doi.org/10.1038/s43017-023-00450-9" target="_blank">https://doi.org/10.1038/s43017-023-00450-9</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Song et al.(2024)Song, Knoben, Clark, Feng, Lawson, Sawadekar, and
Shen</label><mixed-citation>
      
Song, Y., Knoben, W. J. M., Clark, M. P., Feng, D., Lawson, K., Sawadekar, K., and Shen, C.: When ancient numerical demons meet physics-informed machine learning: adjoint-based gradients for implicit differentiable modeling, Hydrol. Earth Syst. Sci., 28, 3051–3077, <a href="https://doi.org/10.5194/hess-28-3051-2024" target="_blank">https://doi.org/10.5194/hess-28-3051-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>Spieler et al.(2020)Spieler, Mai, Craig, Tolson, and
Schütze</label><mixed-citation>
      
Spieler, D., Mai, J., Craig, J. R., Tolson, B. A., and Schütze, N.: Automatic
Model Structure Identification for Conceptual Hydrologic Models,
Water Resources Research, 56, e2019WR027009, <a href="https://doi.org/10.1029/2019WR027009" target="_blank">https://doi.org/10.1029/2019WR027009</a>,
2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib75"><label>Staudinger et al.(2025)Staudinger, Herzog, Loritz, Houska, Pool,
Spieler, Wagner, Mai, Kiesel, Thober, Guse, and Ehret</label><mixed-citation>
      
Staudinger, M., Herzog, A., Loritz, R., Houska, T., Pool, S., Spieler, D., Wagner, P. D., Mai, J., Kiesel, J., Thober, S., Guse, B., and Ehret, U.: How well do process-based and data-driven hydrological models learn from limited discharge data?, Hydrol. Earth Syst. Sci., 29, 5005–5029, <a href="https://doi.org/10.5194/hess-29-5005-2025" target="_blank">https://doi.org/10.5194/hess-29-5005-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib76"><label>Tsai et al.(2021)Tsai, Feng, Pan, Beck, Lawson, Yang, Liu, and
Shen</label><mixed-citation>
      
Tsai, W.-P., Feng, D., Pan, M., Beck, H., Lawson, K., Yang, Y., Liu, J., and
Shen, C.: From calibration to parameter learning: Harnessing the scaling
effects of big data in geoscientific modeling, Nature Communications, 12,
5988, <a href="https://doi.org/10.1038/s41467-021-26107-z" target="_blank">https://doi.org/10.1038/s41467-021-26107-z</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib77"><label>Vrugt(2016)</label><mixed-citation>
      
Vrugt, J. A.: Markov chain Monte Carlo simulation using the DREAM
software package: Theory, concepts, and MATLAB implementation,
Environmental Modelling &amp; Software, 75, 273–316,
<a href="https://doi.org/10.1016/j.envsoft.2015.08.013" target="_blank">https://doi.org/10.1016/j.envsoft.2015.08.013</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib78"><label>Wang and Gupta(2024)</label><mixed-citation>
      
Wang, Y.-H. and Gupta, H. V.: A Mass-Conserving-Perceptron for
Machine-Learning-Based Modeling of Geoscientific Systems, Water
Resources Research, 60, e2023WR036461, <a href="https://doi.org/10.1029/2023WR036461" target="_blank">https://doi.org/10.1029/2023WR036461</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib79"><label>Waskom(2021)</label><mixed-citation>
      
Waskom, M.: seaborn: statistical data visualization, Journal of Open Source
Software, 6, 3021, <a href="https://doi.org/10.21105/joss.03021" target="_blank">https://doi.org/10.21105/joss.03021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib80"><label>Weiler and Beven(2015)</label><mixed-citation>
      
Weiler, M. and Beven, K.: Do we need a Community Hydrological Model?,
Water Resources Research, 51, 7777–7784, <a href="https://doi.org/10.1002/2014WR016731" target="_blank">https://doi.org/10.1002/2014WR016731</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib81"><label>Young et al.(1996)Young, Parkinson, and Lees</label><mixed-citation>
      
Young, P., Parkinson, S., and Lees, M.: Simplicity out of complexity in
environmental modelling: Occam's razor revisited, Journal of Applied
Statistics, 23, 165–210, <a href="https://doi.org/10.1080/02664769624206" target="_blank">https://doi.org/10.1080/02664769624206</a>, 1996.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib82"><label>Young and Beven(1994)</label><mixed-citation>
      
Young, P. C. and Beven, K. J.: Data-based mechanistic modelling and the
rainfall-flow non-linearity, Environmetrics, 5, 335–363,
<a href="https://doi.org/10.1002/env.3170050311" target="_blank">https://doi.org/10.1002/env.3170050311</a>, 1994.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib83"><label>Zhang et al.(2025)Zhang, Li, Hu, Shen, Xu, Chen, Chu, and
Li</label><mixed-citation>
      
Zhang, C., Li, H., Hu, Y., Shen, D., Xu, B., Chen, M., Chu, W., and Li, R.: A
Differentiability-Based Processes and Parameters Learning Hydrologic Model
for Advancing Runoff Prediction and Process Understanding, Journal of
Hydrology, 661, 133594, <a href="https://doi.org/10.1016/j.jhydrol.2025.133594" target="_blank">https://doi.org/10.1016/j.jhydrol.2025.133594</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib84"><label>Álvarez Chaves et al.(2024)Álvarez Chaves, Gupta, Ehret, and
Guthke</label><mixed-citation>
      
Álvarez Chaves, M., Gupta, H. V., Ehret, U., and Guthke, A.: On the Accurate
Estimation of Information-Theoretic Quantities from
Multi-Dimensional Sample Data, Entropy, 26, 387,
<a href="https://doi.org/10.3390/e26050387" target="_blank">https://doi.org/10.3390/e26050387</a>, 2024.

    </mixed-citation></ref-html>--></article>
