<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-30-3497-2026</article-id><title-group><article-title>Testing discharge assimilation strategies to enhance short-range AI-based operational rainfall–runoff forecasts</article-title><alt-title>Discharge assimilation strategies to enhance operational AI-based forecasts</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Saint-Fleur</surname><given-names>Bob E.</given-names></name>
          <email>bob.saint-fleur@univ-eiffel.fr</email>
        <ext-link>https://orcid.org/0000-0002-4774-7799</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Gaume</surname><given-names>Eric</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-7260-9793</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Surmont</surname><given-names>Florian</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Akil</surname><given-names>Nicolas</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Theriez</surname><given-names>Dominique</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>GERS-EE, Université Gustave Eiffel, Allée des Ponts et Chaussées, 44344 Bouguenais, France</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Aquasys Entreprise, 2 rue de Nantes, 44710 Port-Saint-Père, France</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Bob E. Saint-Fleur (bob.saint-fleur@univ-eiffel.fr)</corresp></author-notes><pub-date><day>11</day><month>June</month><year>2026</year></pub-date>
      
      <volume>30</volume>
      <issue>11</issue>
      <fpage>3497</fpage><lpage>3527</lpage>
      <history>
        <date date-type="received"><day>29</day><month>August</month><year>2025</year></date>
           <date date-type="rev-request"><day>30</day><month>September</month><year>2025</year></date>
           <date date-type="rev-recd"><day>27</day><month>March</month><year>2026</year></date>
           <date date-type="accepted"><day>18</day><month>May</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Bob E. Saint-Fleur et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026.html">This article is available from https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e123">Effective discharge forecasts are essential in operational hydrology. The accuracy of such forecasts, particularly at short lead times, is generally increased through the integration of recent measurements of observed discharge, commonly known as discharge assimilation (DA). Recent studies have demonstrated the effectiveness of deep learning (DL) approaches for rainfall–runoff (RR) modeling, particularly Long Short-Term Memory (LSTM) networks, which can outperform traditional approaches. However, most of these studies do not include DA procedures, which may limit their operational forecast performance. This study proposes and evaluates three DA strategies that incorporate discharge information from recent discharge measurements, from the forecasts of a pre-trained rainfall–runoff model, or from both. The proposed strategies, based on a Multilayer Perceptron (MLP) as orchestrator, include: (1) the integration of recently observed discharges, (2) the integration of both recent discharge observations and pre-trained model forecasts, and (3) the post-processing of model forecast errors. Experiments are implemented using two large datasets, CAMELS-US and CAMELS-FR, and two established benchmark models (BM): the trained LSTM model from Kratzert et al. (2019) and the conceptual Sacramento Soil Moisture Accounting (SAC-SMA) model from Newman et al. (2017), covering both deep learning and conceptual RR simulation approaches. The considered lead times range from 1 to 7 d, covering both short- and mid-term horizons. The approaches are evaluated within two forecast frameworks: (1) perfect meteorological forecasts over the forecasting lead time and (2) ensemble meteorological forecasts. The two frameworks yield contrasting outcomes. 
When evaluated under the perfect forecast framework, the application of DA leads to substantial improvements in forecast performance, although the magnitude of these gains depends on the initial performance of the benchmark models and the forecasting lead time. Improvements are consistently significant for the SAC-SMA cases, while for the LSTM cases, gains are observed mainly for basins where the LSTM initially underperforms. However, the ensemble forecast evaluation yields unexpected results: the performance ranking of the tested models changes markedly compared to the perfect forecast framework. The LSTM model, in particular, appears penalized by the underdispersion of its forecast ensembles. Although this underdispersion could be partly attributable to the underdispersion of the forecast archives tested, it persists even when the model is driven by the high-spread climatology-based ensemble. This finding underscores the importance of ensuring reliable ensemble dispersion for the efficient operational deployment of AI-based hydrological forecasts.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Agence Nationale de la Recherche</funding-source>
<award-id>ANR-24-LCV2-0015-01</award-id>
<award-id>DOS0231020/00</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d2e135">Discharge forecasting models are essential in operational hydrology, whether for water resource management or for the management of related risks. Their importance is set to increase as climate-related threats intensify <xref ref-type="bibr" rid="bib1.bibx57 bib1.bibx49 bib1.bibx53" id="paren.1"/>. However, providing accurate discharge forecasts remains challenging due to the complexity of rainfall–runoff (RR) processes, model imperfections, and uncertainty in input data, particularly regarding the quality of weather forecasts.</p>
      <p id="d2e141">Over recent decades, significant efforts have been made to address the challenges of hydrological modeling, leading to the development of various models and approaches. In the era of artificial intelligence (AI), notable advances have been achieved, with recent studies demonstrating the outstanding performance of deep learning (DL) models relative to traditional RR models <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx27" id="paren.2"/>. Commonly used DL architectures include multilayer perceptrons (MLPs) <xref ref-type="bibr" rid="bib1.bibx28 bib1.bibx56" id="paren.3"/>, recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) networks <xref ref-type="bibr" rid="bib1.bibx32 bib1.bibx33 bib1.bibx17 bib1.bibx70 bib1.bibx52" id="paren.4"/>, and more recently, Transformers <xref ref-type="bibr" rid="bib1.bibx51" id="paren.5"/>. Nonetheless, most hydrological models in the literature are evaluated mainly under perfect weather scenarios, which may overestimate their performance in an operational forecasting framework. Although simulation models can be incorporated into forecasting systems, either as assimilable data or driven by forecasted forcings, their development frequently overlooks key components such as discharge assimilation, persistence analysis, and ensemble (probabilistic) assessment.</p>
      <p id="d2e156">Persistence analysis, introduced by <xref ref-type="bibr" rid="bib1.bibx30" id="text.6"/>, evaluates a model's performance relative to a naive baseline, which simply translates the current observation to the target lead time. This analysis, which serves as a relevant benchmark for assessing the predictive ability of models, is rarely considered in most hydrological modeling studies. Discharge assimilation (DA), on the other hand, which consists of dynamically providing real-time discharge data to a running forecast model, is essential in operational forecasting <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx6 bib1.bibx50" id="paren.7"/>. By ensuring regular updates of the model states, DA allows one to reduce the impact of uncertainties associated with meteorological forecasts and model structures, thus keeping the model aligned with evolving hydrological conditions. Several DA techniques exist, and their efficacy often depends on the reliability of the underlying model and/or the techniques used <xref ref-type="bibr" rid="bib1.bibx18 bib1.bibx42 bib1.bibx72" id="paren.8"/>. For direct DA strategies, the importance of DA is typically more pronounced at shorter lead times. However, suboptimal models may over-rely on the assimilated discharge data, which may overshadow the contribution of the forcings, leading towards naive models <xref ref-type="bibr" rid="bib1.bibx55" id="paren.9"/>. Thus, DA methods can improve the operational application of RR forecasting models but are not straightforward to calibrate and implement efficiently.</p>
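      <p>The naive baseline used in persistence analysis can be illustrated with a minimal Python sketch. The function names and the skill score (one minus the ratio of squared errors of the model and of the persistence forecast) are assumptions made here for illustration; this section does not state the exact criterion used in the study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import numpy as np


def persistence_forecast(q_obs, lead):
    """Naive forecast: the value observed at time t is issued for time t + lead."""
    if lead < 1:
        raise ValueError("lead must be a positive number of time steps")
    q_obs = np.asarray(q_obs, dtype=float)
    forecast = np.full(q_obs.shape, np.nan)  # the first `lead` steps have no forecast
    forecast[lead:] = q_obs[:-lead]
    return forecast


def skill_vs_persistence(q_obs, q_model, lead):
    """Return 1 - SSE(model) / SSE(persistence) over steps where both exist.

    Assumed score for illustration: 1 is a perfect forecast, 0 matches the
    naive baseline, and negative values are worse than persistence.
    """
    q_obs = np.asarray(q_obs, dtype=float)
    q_model = np.asarray(q_model, dtype=float)
    q_pers = persistence_forecast(q_obs, lead)
    valid = ~np.isnan(q_pers)
    sse_model = np.sum((q_obs[valid] - q_model[valid]) ** 2)
    sse_pers = np.sum((q_obs[valid] - q_pers[valid]) ** 2)
    return 1.0 - sse_model / sse_pers
```
]]></preformat>
      <p>A forecast model that over-relies on assimilated discharge would be expected to score close to zero with such a measure, since it then adds little beyond the naive baseline.</p>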
      <p id="d2e171">In the following, two benchmark models are considered to evaluate the added value of DA procedures: the regional LSTM model of <xref ref-type="bibr" rid="bib1.bibx33" id="text.10"/> and the basin-specific conceptual SAC-SMA model from <xref ref-type="bibr" rid="bib1.bibx43" id="text.11"/>. Three different discharge assimilation (DA) strategies that take into account past observed discharges to generate forecasts are tested. To keep the implementation simple and efficient in terms of time and resources, a Multilayer Perceptron (MLP) network <xref ref-type="bibr" rid="bib1.bibx54" id="paren.12"/> is used as the orchestrator in these DA methods. MLP networks have been widely adopted over recent decades <xref ref-type="bibr" rid="bib1.bibx68 bib1.bibx67" id="paren.13"/>, and several studies have shown their effectiveness in RR modeling <xref ref-type="bibr" rid="bib1.bibx4 bib1.bibx44 bib1.bibx28" id="paren.14"/>. Although recent studies have demonstrated the superior performance of models such as LSTM networks <xref ref-type="bibr" rid="bib1.bibx32" id="paren.15"/> or transformers <xref ref-type="bibr" rid="bib1.bibx35" id="paren.16"/>, MLPs are used in this study not only as a forecasting orchestrator but also as an alternative for RR modeling, owing to the relative simplicity of their implementation. As possible future work, the MLP-based approach developed here could be compared with classical “data assimilation” techniques, such as the Ensemble Kalman Filter <xref ref-type="bibr" rid="bib1.bibx12" id="paren.17"/>.</p>
      <p id="d2e200">As discharge assimilation procedures generally lose effectiveness at extended lead times, forecasts are evaluated at both short- and mid-term horizons. These lead times are defined with respect to the basin response times, estimated based on a rainfall-discharge cross-correlation analysis. To ensure operational relevance and reflect real-world forecasting practices, two scenarios are considered for the weather forecast data: (1) assuming weather forecasts are perfect; (2) using ensemble-based forecasts. Accordingly, forecast performance is assessed using both deterministic and probabilistic metrics.</p>
      <p id="d2e203">The experiments are based on two widely used large-scale hydrometeorological datasets, CAMELS-US <xref ref-type="bibr" rid="bib1.bibx1" id="paren.18"/> and CAMELS-FR <xref ref-type="bibr" rid="bib1.bibx16" id="paren.19"/>. Ensemble-based forecasts are obtained using historical meteorological observations, hindcast products, and forecast archives from the ECMWF platform.</p>
      <p id="d2e212">This paper is structured as follows: Sect. <xref ref-type="sec" rid="Ch1.S2"/> introduces the data and methods, Sect. <xref ref-type="sec" rid="Ch1.S2.SS1"/> presents the datasets, and Sect. <xref ref-type="sec" rid="Ch1.S2.SS2"/> to <xref ref-type="sec" rid="Ch1.S2.SS5"/> present the DA strategies, forecasting approaches, evaluation metrics, and experimental design. The results for the deterministic and ensemble forecasts are successively presented and discussed in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, followed by Sect. <xref ref-type="sec" rid="Ch1.S4"/> with an extension of the analysis to the French basins and using more recent forecast products. Section <xref ref-type="sec" rid="Ch1.S5"/> presents the main conclusions.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e232">Train-test-validation split for both CAMELS-US and CAMELS-FR datasets. The test set (yellow), training set (blue), validation set (salmon), and combined training and validation set (maroon) are indicated; the latter corresponds to training performed using cross-validation. MLP<sup>*</sup> denotes the orchestrator used for discharge assimilation. Note that the entire modeling process of the DA strategies is carried out exclusively on the test period of the initial LSTM (or the benchmark) models.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f01.png"/>

      </fig>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e253">Used features from the two datasets.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Type</oasis:entry>
         <oasis:entry colname="col2">Variables</oasis:entry>
         <oasis:entry colname="col3">Description</oasis:entry>
         <oasis:entry colname="col4">Unit</oasis:entry>
         <oasis:entry colname="col5">CAMELS-US</oasis:entry>
         <oasis:entry colname="col6">CAMELS-FR</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Forcings</oasis:entry>
         <oasis:entry colname="col2">PET</oasis:entry>
         <oasis:entry colname="col3">Potential evapotranspiration</oasis:entry>
         <oasis:entry colname="col4">mm d<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col5">x</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">PRCP</oasis:entry>
         <oasis:entry colname="col3">Rainfall</oasis:entry>
         <oasis:entry colname="col4">mm d<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col5">x</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">SRAD</oasis:entry>
         <oasis:entry colname="col3">Incident solar radiation</oasis:entry>
         <oasis:entry colname="col4">W m<sup>−2</sup></oasis:entry>
         <oasis:entry colname="col5">x</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Tmax</oasis:entry>
         <oasis:entry colname="col3">Daily maximum temperature</oasis:entry>
         <oasis:entry colname="col4">°C</oasis:entry>
         <oasis:entry colname="col5">x</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Tmin</oasis:entry>
         <oasis:entry colname="col3">Daily minimum temperature</oasis:entry>
         <oasis:entry colname="col4">°C</oasis:entry>
         <oasis:entry colname="col5">x</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Vp</oasis:entry>
         <oasis:entry colname="col3">Vapor pressure</oasis:entry>
         <oasis:entry colname="col4">Pa</oasis:entry>
         <oasis:entry colname="col5">x</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Target variable</oasis:entry>
         <oasis:entry colname="col2">Q.OBS</oasis:entry>
         <oasis:entry colname="col3">Observed discharge</oasis:entry>
         <oasis:entry colname="col4">mm d<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col5">USGS</oasis:entry>
         <oasis:entry colname="col6">x</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Model outputs</oasis:entry>
         <oasis:entry colname="col2">Q.SAC</oasis:entry>
         <oasis:entry colname="col3">SAC-SMA simulated discharge</oasis:entry>
         <oasis:entry colname="col4">mm d<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col5">
                  <xref ref-type="bibr" rid="bib1.bibx43" id="text.20"/>
                </oasis:entry>
         <oasis:entry colname="col6">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Q.LSTM</oasis:entry>
         <oasis:entry colname="col3">LSTM simulated discharge</oasis:entry>
         <oasis:entry colname="col4">mm d<sup>−1</sup></oasis:entry>
         <oasis:entry colname="col5">
                  <xref ref-type="bibr" rid="bib1.bibx33" id="text.21"/>
                </oasis:entry>
         <oasis:entry colname="col6">Current study</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Materials and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Dataset</title>
      <p id="d2e571">The CAMELS-US dataset <xref ref-type="bibr" rid="bib1.bibx1" id="paren.22"/> consists of basin-averaged hydrometeorological time series, catchment attributes, and daily streamflow observations from the United States Geological Survey (USGS) for 671 catchments across the Contiguous United States (CONUS). The meteorological forcings are available from Daymet, NLDAS, and Maurer sources. CAMELS-FR provides the same types of data for French catchments, of which a subset of 338 basins is considered in this study. For CAMELS-US, as this study builds upon the benchmark works of <xref ref-type="bibr" rid="bib1.bibx33" id="text.23"/> and <xref ref-type="bibr" rid="bib1.bibx43" id="text.24"/>, hereafter referred to as LSTM and SAC-SMA, it is limited to the same subset of 531 basins, the Maurer forcings, and the 1989–2008 period used in these previous works. For the CAMELS-FR dataset, an LSTM was trained from scratch following the same approach as in <xref ref-type="bibr" rid="bib1.bibx33" id="text.25"/> and is treated as an equivalent benchmark. The variables used from each dataset are summarized in Table <xref ref-type="table" rid="T1"/>.</p>
      <p id="d2e588">The added value of the proposed DA strategies is evaluated for two types of RR models: (a) the LSTM proposed in <xref ref-type="bibr" rid="bib1.bibx33" id="text.26"/>, which was trained regionally and incorporates basin-specific static attributes, and (b) the conceptual global model SAC-SMA from <xref ref-type="bibr" rid="bib1.bibx43" id="text.27"/>. As in <xref ref-type="bibr" rid="bib1.bibx33" id="text.28"/>, the SAC-SMA model has been chosen as a reference to illustrate the performance of conceptual RR models, which remain widely used for operational discharge forecasting.</p>
      <p id="d2e600">The train-test-validation split is illustrated in Fig. <xref ref-type="fig" rid="F1"/>. It depicts how the data are divided for training, validation, and evaluation of the models. The split used for the initial models is shown mainly for reference, but it makes clear how the data are positioned for the tested DA strategies.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Discharge assimilation procedures</title>
      <p id="d2e613">As outlined in Fig. <xref ref-type="fig" rid="F2"/> and described in Eqs. (<xref ref-type="disp-formula" rid="Ch1.E1"/>) to (<xref ref-type="disp-formula" rid="Ch1.E5"/>), three discharge assimilation procedures are tested, integrating either recent discharge measurements or simulations from the two RR models and using MLP as the orchestrator.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e624">Discharge assimilation set-up: DA1, MLP alone; DA2, MLP fed with RR model forecasts (MLP <inline-formula><mml:math id="M7" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> LSTM or MLP <inline-formula><mml:math id="M8" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> SAC-SMA); DA3, post-processing of RR forecasting errors, noted as LSTM_eCorr and SAC-SMA_eCorr.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f02.png"/>

        </fig>

      <p id="d2e654"><list list-type="order">
            <list-item>

      <p id="d2e659">DA-1: direct forecast of discharges <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> over the forecast horizon <inline-formula><mml:math id="M11" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula> with an MLP, fed with the past observed discharges <inline-formula><mml:math id="M12" display="inline"><mml:mrow><mml:msup><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, observed meteorological variables <inline-formula><mml:math id="M13" display="inline"><mml:mrow><mml:msup><mml:mi>X</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula>, as well as meteorological forecasts <inline-formula><mml:math id="M14" display="inline"><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> (see Eq. <xref ref-type="disp-formula" rid="Ch1.E1"/>).

                  <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M15" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula></p>
            </list-item>
            <list-item>

      <p id="d2e816">DA-2: the same approach as in DA-1 but with the forecasts of the RR model <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msup><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> (either SAC-SMA or LSTM) as additional input variables (see Eq. <xref ref-type="disp-formula" rid="Ch1.E2"/>).

                  <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M17" display="block"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mi mathvariant="normal">s</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula></p>
            </list-item>
            <list-item>

      <p id="d2e946">DA-3: post-processing of the prediction errors of the RR model <inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ε</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (again SAC-SMA or LSTM). In this strategy, the orchestrator is used to forecast the errors (<inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi mathvariant="italic">ε</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) of the RR model over the horizon <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>, and the forecast errors are then added to the forecasts of the RR model. The assimilation procedure thus proceeds in three steps: computing the past errors, forecasting the future errors, and correcting the RR model forecasts (see Eqs. <xref ref-type="disp-formula" rid="Ch1.E3"/>, <xref ref-type="disp-formula" rid="Ch1.E4"/>, and <xref ref-type="disp-formula" rid="Ch1.E5"/>).

                      <disp-formula specific-use="align" content-type="numbered"><mml:math id="M21" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E3"><mml:mtd><mml:mtext>3</mml:mtext></mml:mtd><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mi mathvariant="italic">ε</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mi>t</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mi>t</mml:mi><mml:mi mathvariant="normal">s</mml:mi></mml:msubsup></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E4"><mml:mtd><mml:mtext>4</mml:mtext></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:msub><mml:mover accent="true"><mml:mi mathvariant="italic">ε</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi mathvariant="italic">ε</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>p</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>X</mml:mi><mml:mo mathvariant="normal" 
stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>n</mml:mi><mml:mo>:</mml:mo><mml:mi>t</mml:mi></mml:mrow><mml:mi mathvariant="normal">o</mml:mi></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E5"><mml:mtd><mml:mtext>5</mml:mtext></mml:mtd><mml:mtd><mml:mstyle class="stylechange" displaystyle="true"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mi mathvariant="normal">s</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mover accent="true"><mml:mi mathvariant="italic">ε</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
            </list-item>
          </list></p>
      <p id="d2e1200">In the previous equations, <inline-formula><mml:math id="M22" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M23" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> are the sequence lengths for the forcing and the assimilated discharge. These values will be fixed based on the mean response time of the basins using a RR cross-correlation analysis; see Fig. <xref ref-type="fig" rid="F5"/>. As suggested in <xref ref-type="bibr" rid="bib1.bibx55" id="text.29"/>, to prevent the models from relying disproportionately on assimilated discharge rather than forcing, we imposed <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>≥</mml:mo><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>.</p>
      <p id="d2e1235">In summary, seven (7) different model configurations are compared: the five (5) DA procedures (unfolded from DA1, DA2, DA3) presented in this section, plus the two (2) direct forecasts from both pre-trained models, SAC-SMA and LSTM, which serve as benchmarks to evaluate the efficiency of the tested DA strategies. The direct forecasts from the benchmark models were assumed to be unchanged for the tested lead time; therefore, no further training was necessary.</p>
      <p id="d2e1238">In all the considered DA strategies and for each basin, the MLPs were trained (i.e., calibrated) 20 times, accounting for the random initialization (seeds) of their parameter values, leading to 20 different possible trained models. Likewise, 8 seeds have been considered for the LSTM and 10 for the SAC-SMA model. This aims to account for the uncertainties and variability induced by model initialization during training. The DA strategies are trained based on the series of mean simulated values of both benchmark models (SAC-SMA and LSTM). The predictions thus consist of an ensemble of 20 runs for the DA strategies and 8 and 10 runs for the LSTM and SAC-SMA benchmark forecasts without assimilation, respectively. The performances of the ensemble simulations (dispersed by random initialization) are analyzed based on their mean values in the first part of this paper (Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>).</p>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Forecasting setup and forecast products</title>
<sec id="Ch1.S2.SS3.SSS1">
  <label>2.3.1</label><title>Forecasting setup</title>
      <p id="d2e1258">In this study, the explored lead times range from 1 to 7 d. As illustrated in Eqs. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), (<xref ref-type="disp-formula" rid="Ch1.E2"/>), and (<xref ref-type="disp-formula" rid="Ch1.E5"/>), the input feature selection for forecasting models incorporating discharge assimilation may be affected by the lead time (<inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:math></inline-formula>). In most feedforward architectures, a separate model is calibrated for each lead time. The alternative, which consists of calibrating a single model across the entire range of lead times, either jointly or recursively, is generally inefficient, as it substantially amplifies the forecast uncertainty <xref ref-type="bibr" rid="bib1.bibx11 bib1.bibx61 bib1.bibx36" id="paren.30"/>. This behavior has also been observed in the present study (results not presented herein). It is also worth noting that single-step models may not guarantee continuity of the outputs across successive lead times.</p>
      <p id="d2e1280">The forecasting framework is summarized in Fig. <xref ref-type="fig" rid="F3"/>, which illustrates how past observations, assimilated discharge, and forecasted forcing data are integrated. The implementation in DA1 and DA2 procedures is straightforward for both the <italic>perfect</italic> and <italic>ensemble</italic> forecast strategies. However, for DA3 under the ensemble scenario, the corrected quantity corresponds to the forecast member <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> for which the forecasted error <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi mathvariant="italic">ε</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mi>i</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> (in Eq. <xref ref-type="disp-formula" rid="Ch1.E4"/>) is issued, where <inline-formula><mml:math id="M28" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> indicates the forecast member.</p>

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e1349">Forecasting assumptions setup.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f03.png"/>

          </fig>

      <p id="d2e1359">All the proposed DA strategies are trained using the <italic>perfect weather forecast</italic> configuration and then evaluated under both the <italic>perfect</italic> and the <italic>ensemble-based</italic> forecast conditions. The ensemble forecast evaluation is conducted using three sources of meteorological forcing: (1) a no-skill ensemble generated from past observations using a date-to-date sampling strategy, referred to as “Climatology”; (2) hindcast (reforecast) products; and (3) real-time forecast archives provided by the European Centre for Medium-Range Weather Forecasts (ECMWF). Hindcasts correspond to retrospective forecasts produced for past dates to establish a stable statistical reference for ensemble analysis, whereas real-time forecasts are operational predictions issued daily for current and future conditions. Their preparation for this paper is described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS2"/>.</p>
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <label>2.3.2</label><title>Hindcasts, forecast archives and the climatology approach</title>
      <p id="d2e1381">The operational evaluation is implemented using the sub-seasonal to seasonal (S2S) dataset <xref ref-type="bibr" rid="bib1.bibx65" id="paren.31"/>, developed through a joint initiative of the <italic>World Weather Research Programme</italic> (WWRP) and the <italic>World Climate Research Programme</italic> (WCRP). At the time of writing this paper, the S2S database is hosted at ECMWF as an extension of the TIGGE archive. Overall, two forecast products are used: hindcast and real-time forecast archives. Since we evaluated the benchmark models (LSTM and SAC-SMA) over the 1989–1991 period, only hindcast-based evaluation is implemented on the CAMELS-US dataset because real-time archives are not provided for that period. Consequently, the hindcast product used is from the Bureau of Meteorology (BoM) database <xref ref-type="bibr" rid="bib1.bibx25" id="paren.32"/>. Nevertheless, to complement this analysis, the ensemble evaluation has been extended to the French basins using the CAMELS-FR dataset <xref ref-type="bibr" rid="bib1.bibx16" id="paren.33"/>. This extension was specifically implemented on the two main DA approaches tested (LSTM and DA1), using both hindcast and forecast archives for the recent period of 2018–2021. On the ECMWF data portal (<uri>https://apps.ecmwf.int/datasets/data/s2s-realtime-instantaneous-accum-ecmf/levtype=sfc/type=cf/</uri>, last access: 10 February 2026), BoM and ECMWF forecasts are provided as separate products, allowing the use of both hindcast products and real-time forecast archives.</p>
      <p id="d2e1403">For the present analysis, the perturbed forecast from the BoM dataset was retrieved for up to 7 d of lead time, with all its 32 members. The same method was applied to gather the ECMWF forecast archives (50 members) and hindcast (10 members) products. It is worth noting that these open data are available mostly for 6 to 8 d a month.</p>
      <p id="d2e1406">We also implement the “Climatology” approach as a baseline, which represents the simplest alternative to archived weather forecasts. It is constructed by resampling historical meteorological observations. Although more sophisticated sampling strategies could be implemented, for example, by selecting periods of similar hydrological conditions <xref ref-type="bibr" rid="bib1.bibx24" id="paren.34"/>, the present study adopts a simple date-to-date sampling strategy. For a current date (<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>) within the evaluation period (1989–1991 for CAMELS-US or 2018–2021 for CAMELS-FR), the sequence spanning the lead time (<inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>:</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mn mathvariant="normal">0</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) is defined. The same calendar sequence (day and month) is then extracted for each complete year in the remaining period (1991–2008 or 1989–2017), generating 18 ensemble members for the CAMELS-US cases and 29 members for the CAMELS-FR cases. This approach constitutes a typical <italic>no-skill</italic> or <italic>poor-man's</italic> ensemble, as its construction does not explicitly account for the predictability of non-periodic variables such as rainfall. Nevertheless, it is conceptually similar to the Ensemble Streamflow Prediction (ESP) framework introduced by <xref ref-type="bibr" rid="bib1.bibx15" id="text.35"/> and widely used in previous studies <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx14" id="paren.36"/>.</p>
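      <p>The date-to-date sampling described above can be sketched as follows. This is a minimal illustration rather than the code used in this study: the function name <monospace>climatology_ensemble</monospace>, the dictionary layout of the observation archive, and the fallback to 28 February when the issue date is 29 February are assumptions made for the example.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from datetime import date, timedelta


def climatology_ensemble(obs, t0, hp, years):
    """Build a date-to-date climatology ensemble (illustrative sketch).

    obs   : dict mapping datetime.date to an observed forcing value
            (hypothetical archive layout)
    t0    : forecast issue date
    hp    : lead time in days
    years : historical years to sample, one candidate member per year
    Returns a list of members, each holding hp + 1 values (t0 to t0 + hp).
    """
    members = []
    for year in years:
        try:
            start = t0.replace(year=year)
        except ValueError:
            # 29 February in a non-leap year: fall back to 28 February (assumption)
            start = date(year, 2, 28)
        window = [start + timedelta(days=k) for k in range(hp + 1)]
        # Keep the member only if the whole window exists in the archive
        if all(day in obs for day in window):
            members.append([obs[day] for day in window])
    return members
```
]]></preformat>
      <p>Each historical year contributes at most one member, so the ensemble size equals the number of complete years available in the sampled period.</p>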
      <p id="d2e1461">At the other end of the evaluation spectrum, the “Perfect forecast” configuration is also implemented. In this case, the forecasted meteorological variables are assumed to be equal to the actual observed values at the corresponding future lead time in the evaluation year. This configuration is particularly useful for estimating the theoretical upper bound of the performance of the models. Overall, four forecast configurations are considered in this study: <italic>Perfect</italic> Mode, <italic>Climatology</italic> Mode, <italic>Hindcast</italic> Mode, and <italic>Real-time Forecast</italic> Mode based on meteorological forecast archives.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Evaluation metrics</title>
      <p id="d2e1485">Numerous metrics are proposed in the literature to evaluate the skills of hydrometeorological forecasting models <xref ref-type="bibr" rid="bib1.bibx40 bib1.bibx58 bib1.bibx8 bib1.bibx34 bib1.bibx21 bib1.bibx48" id="paren.37"/>: evaluating the efficiency for deterministic and ensemble predictions, as well as reliability and resolution for ensemble predictions <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx59" id="paren.38"/>. The selected evaluation metrics are presented below.</p>
<sec id="Ch1.S2.SS4.SSS1">
  <label>2.4.1</label><title>Forecasting efficiency</title>
      <p id="d2e1501">The <italic>efficiency</italic> is a measure of the proximity between the observed values <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and the predicted values <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. The commonly used metrics for deterministic forecasts are based on the sum of squared errors: Nash–Sutcliffe Efficiency (NSE), Eq. (<xref ref-type="disp-formula" rid="Ch1.E6"/>) <xref ref-type="bibr" rid="bib1.bibx41" id="paren.39"/>, the Kling–Gupta Efficiency (KGE) <xref ref-type="bibr" rid="bib1.bibx19" id="paren.40"/>, and the Persistency Criterion (PERS), Eq. (<xref ref-type="disp-formula" rid="Ch1.E7"/>) <xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx13 bib1.bibx2" id="paren.41"/>.

                  <disp-formula specific-use="align" content-type="numbered"><mml:math id="M33" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="Ch1.E6"><mml:mtd><mml:mtext>6</mml:mtext></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="normal">NSE</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="Ch1.E7"><mml:mtd><mml:mtext>7</mml:mtext></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" class="stylechange"/></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="normal">PERS</mml:mi><mml:mo>=</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula>

            NSE and PERS are scores that measure the proportion of the sum of squared errors of an unskilled model explained by the calibrated (or trained) forecasting model. The unskilled benchmark model for NSE is the trivial mean model (<inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula>), and for PERS the persistent model (<inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mi>h</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). Both criteria range from 1 (perfect model) to <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>. A negative value indicates that the model produces higher errors and, consequently, performs worse than the unskilled benchmark model. It should be noted that it is more difficult to achieve a positive PERS than a positive NSE, particularly at short lead times.</p>
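      <p>As an illustration of Eqs. (6) and (7), the two scores can be computed as in the sketch below. The function names and the zero-based indexing convention (the forecast <monospace>sim[t]</monospace> is valid at time <monospace>t</monospace> and was issued <monospace>hp</monospace> steps earlier) are assumptions of this example.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: benchmark is the mean of the observations."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def pers(obs, sim, hp):
    """Persistency criterion: benchmark is the observation hp steps earlier.

    sim[t] is assumed to be the forecast valid at time t (zero-based index),
    so the sums start at t = hp, where obs[t - hp] is first available.
    """
    steps = range(hp, len(obs))
    num = sum((obs[t] - sim[t]) ** 2 for t in steps)
    den = sum((obs[t] - obs[t - hp]) ** 2 for t in steps)
    return 1.0 - num / den
```
]]></preformat>
      <p>By construction, forecasting the observed mean yields NSE equal to 0, and repeating the last observation yields PERS equal to 0.</p>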
      <p id="d2e1787">For ensemble forecasts, the Continuous Ranked Probability Score (CRPS), Eq. (<xref ref-type="disp-formula" rid="Ch1.E8"/>) <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx38" id="paren.42"/>, is commonly used.

              <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M37" display="block"><mml:mtable rowspacing="0.2ex" class="split" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="normal">CRPS</mml:mi></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>T</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msub><mml:mi mathvariant="normal">CRPS</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mi mathvariant="normal">with</mml:mi><mml:mspace linebreak="nobreak" width="0.25em"/><mml:msub><mml:mi mathvariant="normal">CRPS</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∫</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow><mml:mi mathvariant="normal">∞</mml:mi></mml:munderover><mml:msup><mml:mfenced open="[" close="]"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:msub><mml:mn mathvariant="bold">1</mml:mn><mml:mrow><mml:mfenced open="{" close="}"><mml:mrow><mml:mi>y</mml:mi><mml:mo>≥</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced></mml:mrow></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            where, for time step <inline-formula><mml:math id="M38" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the cumulative distribution of the ensemble forecasts, <inline-formula><mml:math id="M40" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the observed value, <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the predicted value, <inline-formula><mml:math id="M42" display="inline"><mml:mover accent="true"><mml:mi>Q</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> is the time average of the observed values, and <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msub><mml:mn mathvariant="bold">1</mml:mn><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:mi>y</mml:mi><mml:mo>≥</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the Heaviside-step function for a binary <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:mn mathvariant="normal">0</mml:mn><mml:mo>|</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> outcome. The CRPS ranges from 0 (perfect models) to <inline-formula><mml:math id="M45" display="inline"><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> (low-quality models). Note that the CRPS is the mean absolute error of the model in the case of a deterministic forecast (i.e. 
an ensemble consisting of a single member).</p>
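      <p>For a finite ensemble, the integral in Eq. (8) can be evaluated exactly when the forecast distribution is taken as the empirical step function of the members. The sketch below uses the standard closed form for that case (mean absolute error of the members minus half their mean absolute pairwise difference). Treating the empirical distribution as the forecast distribution is an assumption of this example, as are the function names.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def crps_ensemble(members, obs):
    """CRPS at one time step for an empirical (step-function) ensemble CDF."""
    n = len(members)
    # Mean absolute error of the members with respect to the observation
    term_error = sum(abs(x - obs) for x in members) / n
    # Half the mean absolute difference between all pairs of members
    term_spread = sum(abs(xi - xj) for xi in members for xj in members) / (2 * n * n)
    return term_error - term_spread


def mean_crps(ensembles, observations):
    """Average of the per-time-step CRPS over the evaluation period."""
    scores = [crps_ensemble(m, o) for m, o in zip(ensembles, observations)]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>With a single member the spread term vanishes and the score reduces to the absolute error, which is consistent with the remark above on deterministic forecasts.</p>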
</sec>
<sec id="Ch1.S2.SS4.SSS2">
  <label>2.4.2</label><title>Forecasting reliability</title>
      <p id="d2e2002">An ensemble forecast is considered reliable (or statistically consistent) when the ensemble spread adequately reflects forecast uncertainty, such that the observations are statistically indistinguishable from the ensemble members <xref ref-type="bibr" rid="bib1.bibx60 bib1.bibx69 bib1.bibx20 bib1.bibx10" id="paren.43"/>. The distribution of the ranks of the observations within the sorted ensemble members, computed over a sufficient number of observations as proposed in <xref ref-type="bibr" rid="bib1.bibx20" id="text.44"/> and <xref ref-type="bibr" rid="bib1.bibx60" id="text.45"/>, provides a visual verification of the reliability of the ensemble forecasts. The lack of reliability may take different forms: (i) a tendency to overestimate (resp. underestimate), leading to an over-representation of the lower (resp. higher) ranks in the rank diagram; (ii) under- or over-dispersion of the ensembles, resulting in a <italic>U-shaped</italic> or <italic>dome-shaped</italic> rank diagram, respectively. Figure <xref ref-type="fig" rid="F4"/> shows the rank diagrams for the daily rainfall and PET data, obtained by ranking the observations of the evaluation period (1989–1991) within the ensembles drawn from the remaining period (1991–2008).</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2025">Rank diagrams for daily precipitation and PET for the Climatology-based ensemble (left panel) and Hindcast products (right panel) over the CAMELS-US basins for 3 d lead time. The plots correspond to the evaluation of the test period (1989–1991) within the remaining 1991–2008 period. The error bars represent variability across the 56 basins considered, and the red line denotes the expected uniform distribution. For ease of comparison, the ensembles have been condensed into 10 classes from 17 and 32 members, respectively.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f04.png"/>

          </fig>

      <p id="d2e2034">The rank diagram of the climatological ensemble does not reveal any major deviations from the expected uniform distribution (Fig. <xref ref-type="fig" rid="F4"/>), suggesting the absence of obvious biases in this ensemble. However, this uniformity is not observed for the hindcast product, which exhibits noticeable underdispersion that may be reflected in the forecasted discharges. This underdispersion, which varies with lead time (see Appendix <xref ref-type="fig" rid="FA8"/>), remains an open question in the present study. Nonetheless, as mentioned by <xref ref-type="bibr" rid="bib1.bibx20" id="text.46"/>, rank diagrams may mask certain deficiencies in ensemble forecasts; therefore, they are complemented here by a spread-skill ratio (SSR) analysis.

              <disp-formula id="Ch1.E9" content-type="numbered"><label>9</label><mml:math id="M46" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mi mathvariant="normal">SSR</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:msqrt><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>T</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup></mml:mrow></mml:msqrt><mml:msqrt><mml:mrow><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>T</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mfrac></mml:mstyle><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi mathvariant="normal">with</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>t</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn 
mathvariant="normal">1</mml:mn><mml:mrow><mml:mi>N</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mover accent="true"><mml:mi>x</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

            where <inline-formula><mml:math id="M47" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M48" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> denote forecast and observed values; <inline-formula><mml:math id="M49" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M50" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, the ensemble size and the index of an individual forecast member; <inline-formula><mml:math id="M51" display="inline"><mml:mi>T</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M52" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>, the evaluation period length and time step. The spread-skill ratio (Eq. <xref ref-type="disp-formula" rid="Ch1.E9"/>) is a widely used metric to evaluate the reliability of ensemble forecasts. It compares the ensemble spread (the forecast uncertainty) with the actual forecast error (skill) of the ensemble mean. As formalized by <xref ref-type="bibr" rid="bib1.bibx69" id="text.47"/>, it is typically calculated as the ratio of the square root of the mean ensemble variance (spread) to the root mean squared error (RMSE) of the ensemble mean. Values close to one indicate a well-calibrated ensemble, while values below (above) one reveal under- (over-) dispersion.</p>
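      <p>Equation (9) translates directly into the short sketch below, in which <monospace>ensembles[t]</monospace> holds the members issued for time step <monospace>t</monospace>. The function name and the list-based data layout are assumptions of this example.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def spread_skill_ratio(ensembles, observations):
    """Spread-skill ratio: root of mean ensemble variance over RMSE of the mean."""
    n_steps = len(observations)
    variance_sum = 0.0
    squared_error_sum = 0.0
    for members, y in zip(ensembles, observations):
        n = len(members)
        mean = sum(members) / n
        # Unbiased ensemble variance at this time step (division by N - 1)
        variance_sum += sum((x - mean) ** 2 for x in members) / (n - 1)
        # Squared error of the ensemble mean
        squared_error_sum += (mean - y) ** 2
    spread = math.sqrt(variance_sum / n_steps)
    rmse = math.sqrt(squared_error_sum / n_steps)
    return spread / rmse
```
]]></preformat>
      <p>The function requires at least two members per time step and a non-zero error of the ensemble mean; both conditions are left unchecked here for brevity.</p>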
</sec>
<sec id="Ch1.S2.SS4.SSS3">
  <label>2.4.3</label><title>Forecasting resolution</title>
      <p id="d2e2248">In ensemble forecast verification, resolution refers to the ability of a model to discriminate between events and non-events: i.e., the exceedance or non-exceedance of a given threshold discharge for hydrological predictions. Commonly used metrics for such evaluation include the Brier score <xref ref-type="bibr" rid="bib1.bibx9" id="paren.48"/> and the AUC score (Area Under the Curve) estimated based on a ROC (Receiver Operating Characteristic) curve. <list list-type="bullet"><list-item>
      <p id="d2e2256"><italic>Brier score (BS)</italic></p></list-item></list>

              <disp-formula id="Ch1.E10" content-type="numbered"><label>10</label><mml:math id="M53" display="block"><mml:mrow><mml:mi mathvariant="normal">BS</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></disp-formula>

            <inline-formula><mml:math id="M54" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of time steps, <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the forecast probability of the event according to the ensemble, and <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the observed boolean outcome (1 if the event occurs and 0 otherwise).</p>
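      <p>A minimal sketch of Eq. (10) is given below. It assumes that the event is the exceedance of a discharge threshold and that the forecast probability is the fraction of members at or above that threshold; for low-flow events the comparison would be reversed. The function names are hypothetical.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def event_probability(members, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(1 for x in members if x >= threshold) / len(members)


def brier_score(ensembles, observations, threshold):
    """Mean squared difference between forecast probability and outcome."""
    total = 0.0
    for members, y in zip(ensembles, observations):
        f = event_probability(members, threshold)
        o = 1.0 if y >= threshold else 0.0  # observed outcome (1 = event)
        total += (f - o) ** 2
    return total / len(observations)
```
]]></preformat>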
      <p id="d2e2336">The Brier score values range from 0 (perfect) to 1 and are equal to 0.25 for a random detection model (i.e., the no-skill model). <list list-type="bullet"><list-item>
      <p id="d2e2341"><italic>ROC curves and AUC</italic></p></list-item></list> To construct the ROC curve for a selected target discharge threshold, each rank of the ensemble is selected in turn as the forecast value for event detection. The true positive rate (TPR: proportion of observed events predicted as events) and the false positive rate (FPR: proportion of non-events predicted as events) are computed for each ensemble rank over the evaluation period. The ROC curve relates TPR and FPR. The AUC is the estimated area under the ROC curve. It takes values between 1 (perfect model, TPR <inline-formula><mml:math id="M57" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 1 and FPR <inline-formula><mml:math id="M58" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0 for all ranks) and 0. The ROC curve of a random detection model corresponds to the diagonal (i.e., TPR <inline-formula><mml:math id="M59" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> FPR <inline-formula><mml:math id="M60" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> proportion of predicted events). The AUC value of this random detection model is equal to 0.5.</p>
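      <p>The rank-based construction of the ROC curve can be sketched as follows. The example assumes exceedance events, ensembles of equal size, at least one event and one non-event in the evaluation period, and a trapezoidal estimate of the area with the end points (0, 0) and (1, 1) added; these choices and the function names are assumptions of this illustration.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def roc_points(ensembles, observations, threshold):
    """Return sorted (FPR, TPR) pairs, one per ensemble rank plus end points."""
    n_members = len(ensembles[0])
    observed = [y >= threshold for y in observations]
    n_events = sum(observed)
    n_non_events = len(observed) - n_events
    sorted_ensembles = [sorted(m) for m in ensembles]
    points = [(0.0, 0.0), (1.0, 1.0)]
    for rank in range(n_members):
        # The member of this rank is used as the forecast value
        predicted = [m[rank] >= threshold for m in sorted_ensembles]
        true_pos = sum(1 for p, o in zip(predicted, observed) if p and o)
        false_pos = sum(1 for p, o in zip(predicted, observed) if p and not o)
        points.append((false_pos / n_non_events, true_pos / n_events))
    return sorted(points)


def auc(points):
    """Trapezoidal area under a curve given as sorted (x, y) pairs."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```
]]></preformat>
      <p>Low ranks give conservative detections (few false alarms) and high ranks give sensitive ones, so sweeping the rank traces the curve from the lower left to the upper right.</p>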
      <p id="d2e2374">The forecast resolution may depend on the chosen discharge threshold. To evaluate the ensemble forecasts, several threshold values are tested based on the quantiles of the observed discharge series. The considered quantile probabilities are <inline-formula><mml:math id="M61" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> of 0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, and 0.99. For a given discharge threshold <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:math></inline-formula>, an event is recorded whenever discharge values cross this threshold. For thresholds below the median (<inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>≤</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn></mml:mrow></mml:math></inline-formula>), events correspond to low-flow conditions, whereas high-flow (flood) conditions correspond to thresholds above the median (<inline-formula><mml:math id="M64" display="inline"><mml:mrow><mml:mi>x</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.5</mml:mn></mml:mrow></mml:math></inline-formula>). Exceedance is defined based on crossings from above (recession curve) or below (rising curve) the threshold, respectively.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>Experimental settings</title>
<sec id="Ch1.S2.SS5.SSS1">
  <label>2.5.1</label><title>Input sequence size and lead time selection strategy</title>
      <p id="d2e2435">The sizes of the input sequences of the MLPs have been set based on cross-correlation diagrams; see Fig. <xref ref-type="fig" rid="F5"/> for the CAMELS-US dataset and Appendix <xref ref-type="fig" rid="FA1"/> for the CAMELS-FR dataset. The median cross-correlation coefficients across the 531 basins were considered. Following <xref ref-type="bibr" rid="bib1.bibx37" id="text.49"/>, a limit value of 0.2 has been chosen for the discharge autocorrelation coefficient to fix the length <inline-formula><mml:math id="M65" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> of the input sequence for the assimilated discharges. The sequence size (<inline-formula><mml:math id="M66" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>) of the forcing has been set to 30 d as an arbitrary value along the flattened portion of the RR cross-correlogram.</p>
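      <p>The selection of the sequence length from an autocorrelation limit can be illustrated with the sketch below. It works on a single discharge series, whereas the study relies on median coefficients over the basins, and it returns the last lag before the autocorrelation first drops below the limit, which is an assumed reading of the rule. The names <monospace>autocorrelation</monospace> and <monospace>sequence_length</monospace>, as well as the <monospace>max_lag</monospace> default, are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def autocorrelation(series: Sequence[float], lag: int) -> float:
    """Sample autocorrelation coefficient of a series at a given lag."""
    n = len(series)
    mean = sum(series) / n
    variance_sum = sum((x - mean) ** 2 for x in series)
    covariance_sum = sum((series[t] - mean) * (series[t + lag] - mean)
                         for t in range(n - lag))
    return covariance_sum / variance_sum


def sequence_length(series: Sequence[float],
                    limit: float = 0.2,
                    max_lag: int = 60) -> int:
    """Last lag before the autocorrelation first falls below `limit`."""
    for lag in range(1, max_lag + 1):
        if autocorrelation(series, lag) < limit:
            return lag - 1
    return max_lag
```
]]></preformat>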

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2461">Rainfall–Discharge cross- and auto-correlation on the CAMELS-US dataset; see Appendix <xref ref-type="fig" rid="FA1"/> for the CAMELS-FR case. The chosen sizes (<inline-formula><mml:math id="M67" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M68" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>) of the input sequences are marked with the dashed lines and an orange-colored dot.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f05.png"/>

          </fig>

      <p id="d2e2486">The correlation coefficients between observed discharges and daily rainfall amounts are highest for lag times between 1 and 3 d, suggesting that the basins of the CAMELS-US sample have, on average, short response times, typically of less than 3 d. This ensures that the evaluated 1 to 7 d lead times cover both short- and mid-range forecast horizons. According to the response times of the basins, it is anticipated that short-term predictions 1 d ahead will be partly controlled by past observed rainfalls, whereas mid-term 3 to 7 d forecasts will be mostly determined by predicted rainfalls.</p>
</sec>
<sec id="Ch1.S2.SS5.SSS2">
  <label>2.5.2</label><title>Basin sub-sampling for the climatological ensemble runs</title>
      <p id="d2e2497">The evaluation of the ensemble-based forecast may be numerically demanding: 7 lead times, 7 model configurations, 20 randomly initialized models, 10 to 50 forecast members, and numerous trials for model hyperparameter searching and training. To keep computation times reasonable, the ensemble-based evaluations were conducted on a subset of 56 basins from the initial set of 531 basins. This subset of basins was selected uniformly according to their NSE rank from <xref ref-type="bibr" rid="bib1.bibx33" id="text.50"/>, covering the same range of NSE values as the initial sample of 531 basins (Fig. <xref ref-type="fig" rid="F6"/>). For the CAMELS-FR basins, the selection was based on the completeness of the discharge time series, with total missing data not exceeding 90 d, while ensuring that all regions (basins coded from A to Y) are represented. The lists of selected basins are provided in the code availability section.</p>
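      <p>A uniform selection along the NSE ranking could, for instance, be implemented as sketched below. The exact selection procedure is not detailed here, so the even spacing of rank indices, the function name <monospace>select_by_rank</monospace>, and the basin identifiers used in any call are assumptions for illustration only.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Dict, List


def select_by_rank(scores: Dict[str, float], n_selected: int) -> List[str]:
    """Pick basins at evenly spaced positions of the score ranking."""
    if n_selected < 2:
        raise ValueError("at least two basins must be selected")
    ranked = sorted(scores, key=scores.get)  # basin ids, lowest score first
    n_basins = len(ranked)
    if n_selected >= n_basins:
        return ranked
    # Evenly spaced rank indices, including the lowest and highest ranks.
    step = (n_basins - 1) / (n_selected - 1)
    indices = sorted({round(i * step) for i in range(n_selected)})
    return [ranked[i] for i in indices]
```
]]></preformat>
      <p>Applied to 531 scores with 56 basins requested, this sketch returns 56 distinct basins that include the lowest-ranked and highest-ranked ones, so the subset spans the full range of the original sample.</p>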

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2507">Cumulative distribution of the NSE scores of the 531 US-basins from the regional LSTM of <xref ref-type="bibr" rid="bib1.bibx33" id="text.51"/> and the selected subset of 56 basins.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f06.png"/>

          </fig>

</sec>
<sec id="Ch1.S2.SS5.SSS3">
  <label>2.5.3</label><title>Software and hyperparameter settings</title>
      <p id="d2e2527">For the orchestrator (MLP) configurations, the hyperparameters listed in Table <xref ref-type="table" rid="T2"/> were optimized for each dataset using exhaustive grid search and cross-validation. The hyperparameter subset was derived from a larger space using 20 randomly selected basins, retaining the most frequent configurations. The hidden sizes ranged from a single layer of 30 neurons to four layers with multiples of 30 neurons. Five learning rates (10<sup>−1</sup> to 10<sup>−5</sup>) were initially tested, and the two most frequently selected as best values have been retained.</p>

<table-wrap id="T2"><label>Table 2</label><caption><p id="d2e2559">Model hyperparameter setup.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Parameters</oasis:entry>
         <oasis:entry colname="col2">Parameter space</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Hidden layers [size,]</oasis:entry>
         <oasis:entry colname="col2">[120, 90] [120, 90, 60]</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Activation</oasis:entry>
         <oasis:entry colname="col2">[relu, tanh]</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Learning rate</oasis:entry>
         <oasis:entry colname="col2">[0.01, 0.001]</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Solver</oasis:entry>
         <oasis:entry colname="col2">ADAM</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Early-stopping</oasis:entry>
         <oasis:entry colname="col2">True</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">No_iter_no_change</oasis:entry>
         <oasis:entry colname="col2">15</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Validation fraction</oasis:entry>
         <oasis:entry colname="col2">[0.2]</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of random</oasis:entry>
         <oasis:entry colname="col2">20</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">initialisations</oasis:entry>
         <oasis:entry colname="col2"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Sequence depth</oasis:entry>
         <oasis:entry colname="col2">30 on forcings, 10 on</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">assimilated data</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e2684">The experiments developed in this study are essentially based on open-source software and the Python 3.9 programming language <xref ref-type="bibr" rid="bib1.bibx64" id="paren.52"/>. Our modeling framework is based on the Scikit-Learn library <xref ref-type="bibr" rid="bib1.bibx45" id="paren.53"/>. Data analysis, processing, and visualization are performed mainly using Pandas <xref ref-type="bibr" rid="bib1.bibx39" id="paren.54"/>, Numpy <xref ref-type="bibr" rid="bib1.bibx63" id="paren.55"/>, Seaborn <xref ref-type="bibr" rid="bib1.bibx66" id="paren.56"/>, Matplotlib <xref ref-type="bibr" rid="bib1.bibx26" id="paren.57"/>, and Xskillscore <xref ref-type="bibr" rid="bib1.bibx5" id="paren.58"/>. The model development was carried out using Jupyter Notebook <xref ref-type="bibr" rid="bib1.bibx31" id="paren.59"/>, Anaconda <xref ref-type="bibr" rid="bib1.bibx3" id="paren.60"/>, and PyCharm <xref ref-type="bibr" rid="bib1.bibx29" id="paren.61"/>.</p>
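      <p>For illustration, the parameter space of Table <xref ref-type="table" rid="T2"/> can be enumerated as shown below. The dictionary keys follow the argument names of the Scikit-Learn <monospace>MLPRegressor</monospace> class; mapping the table rows to these arguments (in particular "Learning rate" to <monospace>learning_rate_init</monospace>) is an assumption, and the sketch only lists the candidate configurations without training any model.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from typing import Any, Dict, List

# Searched values (Table 2); key names assumed to match MLPRegressor arguments.
PARAMETER_SPACE: Dict[str, List[Any]] = {
    "hidden_layer_sizes": [(120, 90), (120, 90, 60)],
    "activation": ["relu", "tanh"],
    "learning_rate_init": [0.01, 0.001],
}

# Settings held fixed for all candidates (Table 2).
FIXED_SETTINGS: Dict[str, Any] = {
    "solver": "adam",
    "early_stopping": True,
    "n_iter_no_change": 15,
    "validation_fraction": 0.2,
}

# Each retained configuration is trained from 20 random initialisations.
N_RANDOM_INITIALISATIONS = 20


def candidate_configurations() -> List[Dict[str, Any]]:
    """Enumerate every combination of the searched values (exhaustive grid)."""
    names = list(PARAMETER_SPACE)
    configurations = []
    for values in product(*(PARAMETER_SPACE[name] for name in names)):
        configuration = dict(zip(names, values))
        configuration.update(FIXED_SETTINGS)
        configurations.append(configuration)
    return configurations
```
]]></preformat>
      <p>The grid contains 2 × 2 × 2 = 8 candidate configurations per basin and lead time, which, combined with the 20 random initialisations, gives an idea of the computational load.</p>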

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e2721">Examples of hydrographs on basin 01055000 of the CAMELS-US dataset for rainfall–runoff forecasting at a 3 d lead time. SAC-SMA (left panels) and LSTM (right panels) cases are shown separately; rows indicate the approaches (benchmark models, DA1, DA2, DA3), and columns indicate the meteorological forecasting approaches (Perfect, Climatology and Hindcast). Since hindcast products are available only 6 times a month, their outputs are discontinuous and are shown as box-plots for easier display. Color ranges are used to highlight ensemble quantile ranges <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">75</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M72" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> [0.125, 0.875], <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">95</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M74" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> [0.025, 0.975] and [Min, Max] values, while the observed discharge values are marked with orange dots.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f07.png"/>

          </fig>

</sec>
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results on the CAMELS-US dataset</title>
      <p id="d2e2776">The performance of the three discharge assimilation (DA) approaches is evaluated against the benchmark models across the considered forecast scenarios. This comparison emphasizes the differences in model behavior between an idealized setting (perfect forecast scenario) and several ensemble-based forecasts. The variability of the scores across the basins is illustrated using boxplots and error bars. To introduce the results, an example of hydrographs (observed and forecasted) is presented in Fig. <xref ref-type="fig" rid="F7"/>. This example corresponds to basin No. 01055000 from the CAMELS-US dataset over the period from 31 March to 15 May 1991, and includes the three tested forecasting approaches (Perfect, Climatology, and Hindcast) along with both benchmark models (LSTM and SAC-SMA). All presented results concern the test set.</p>
      <p id="d2e2781">Figure <xref ref-type="fig" rid="F7"/> shows what each of the forecast results looks like. For illustration purposes, we selected a case in which both the benchmark models and the meteorological hindcast provide accurate results. The performance metrics and scores of the different approaches for the various lead times, evaluated over the full set of 531 CAMELS-US basins, are presented and discussed in the following sections.</p>

<table-wrap id="T3" specific-use="star"><label>Table 3</label><caption><p id="d2e2789">NSE and improvements at a 1 d lead time across tested discharge assimilation approaches and benchmark models from several studies.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="17">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="center"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:colspec colnum="10" colname="col10" align="left"/>
     <oasis:colspec colnum="11" colname="col11" align="center"/>
     <oasis:colspec colnum="12" colname="col12" align="right"/>
     <oasis:colspec colnum="13" colname="col13" align="right"/>
     <oasis:colspec colnum="14" colname="col14" align="right"/>
     <oasis:colspec colnum="15" colname="col15" align="left"/>
     <oasis:colspec colnum="16" colname="col16" align="center"/>
     <oasis:colspec colnum="17" colname="col17" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col4"><xref ref-type="bibr" rid="bib1.bibx42" id="text.63"/></oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry rowsep="1" namest="col6" nameend="col9">This study CAMELS-US </oasis:entry>
         <oasis:entry colname="col10"/>
         <oasis:entry rowsep="1" namest="col11" nameend="col14">This study CAMELS-US </oasis:entry>
         <oasis:entry colname="col15"/>
         <oasis:entry rowsep="1" namest="col16" nameend="col17">CAMELS-FR </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Approach</oasis:entry>
         <oasis:entry colname="col2">LSTM</oasis:entry>
         <oasis:entry colname="col3">AR</oasis:entry>
         <oasis:entry colname="col4">DA</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">LSTM</oasis:entry>
         <oasis:entry colname="col7">DA1</oasis:entry>
         <oasis:entry colname="col8">DA2</oasis:entry>
         <oasis:entry colname="col9">DA3</oasis:entry>
         <oasis:entry colname="col10"/>
         <oasis:entry colname="col11">SAC-</oasis:entry>
         <oasis:entry colname="col12">DA1</oasis:entry>
         <oasis:entry colname="col13">DA2</oasis:entry>
         <oasis:entry colname="col14">DA3</oasis:entry>
         <oasis:entry colname="col15"/>
         <oasis:entry colname="col16">LSTM</oasis:entry>
         <oasis:entry colname="col17">DA1</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">NSE</oasis:entry>
         <oasis:entry colname="col2">0.80</oasis:entry>
         <oasis:entry colname="col3">0.88</oasis:entry>
         <oasis:entry colname="col4">0.86</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6">0.74</oasis:entry>
         <oasis:entry colname="col7">0.80</oasis:entry>
         <oasis:entry colname="col8">0.83</oasis:entry>
         <oasis:entry colname="col9">0.82</oasis:entry>
         <oasis:entry colname="col10"/>
         <oasis:entry colname="col11">0.67</oasis:entry>
         <oasis:entry colname="col12">0.80</oasis:entry>
         <oasis:entry colname="col13">0.82</oasis:entry>
         <oasis:entry colname="col14">0.80</oasis:entry>
         <oasis:entry colname="col15"/>
         <oasis:entry colname="col16">0.91</oasis:entry>
         <oasis:entry colname="col17">0.95</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Gain<sup>*</sup></oasis:entry>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">10 %</oasis:entry>
         <oasis:entry colname="col4">8 %</oasis:entry>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7">8 %</oasis:entry>
         <oasis:entry colname="col8">12 %</oasis:entry>
         <oasis:entry colname="col9">11 %</oasis:entry>
         <oasis:entry colname="col10"/>
         <oasis:entry colname="col11"/>
         <oasis:entry colname="col12">19 %</oasis:entry>
         <oasis:entry colname="col13">22 %</oasis:entry>
         <oasis:entry colname="col14">19 %</oasis:entry>
         <oasis:entry colname="col15"/>
         <oasis:entry colname="col16"/>
         <oasis:entry colname="col17">4 %</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d2e2792"><sup>*</sup> Gains are estimated relative to the benchmark model NSE. In <xref ref-type="bibr" rid="bib1.bibx42" id="text.62"/>, AR and DA refer to the methods called auto-regressive and discharge assimilation, respectively. SAC- refers to the SAC-SMA model.</p></table-wrap-foot></table-wrap>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e3045">Box-plots of the persistence (PERS) scores. The panels are ordered with the SAC-SMA cases first, followed by the LSTM cases. Lead times (1–7 d) are shown on the <inline-formula><mml:math id="M77" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis, while scores are displayed on the <inline-formula><mml:math id="M78" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis. Color codes distinguish the approaches: benchmark models (red), DA1 or MLP (green), DA2 (orange) and DA3 (gold). DA1 is replicated in both benchmark cases. In the legend, <inline-formula><mml:math id="M79" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-LSTM stands for DA2, i.e., the MLP informed by the LSTM, and <inline-formula><mml:math id="M80" display="inline"><mml:mi>e</mml:mi></mml:math></inline-formula>-LSTM stands for DA3, i.e., the error post-processing approach applied to the LSTM.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f08.png"/>

      </fig>

<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Performances of the DA approaches based on perfect meteorological forecasts</title>
<sec id="Ch1.S3.SS1.SSS1">
  <label>3.1.1</label><title>Forecasting efficiency</title>
      <p id="d2e3096">As an introduction, Table <xref ref-type="table" rid="T3"/> displays an overview of the NSE values and gains of the discharge assimilation methods tested in this study for the 1 d lead time forecast, compared to the results published in <xref ref-type="bibr" rid="bib1.bibx42" id="text.64"/>, which tested discharge assimilation using an LSTM on the same CAMELS-US dataset. Note that the test period differs between the study of <xref ref-type="bibr" rid="bib1.bibx42" id="text.65"/> (1989–1999) and the present study (1989–1991). Table <xref ref-type="table" rid="T3"/> also includes the results obtained on the CAMELS-FR dataset, which are presented in more detail in Sect. 4. It shows that NSE scores are highly dependent on the datasets and that the relative gains from discharge assimilation methods tend to be greater when the benchmark models have lower NSE values.</p>
      <p id="d2e3109">Overall, the improvements achieved by the DA strategies developed here are consistent with those reported in <xref ref-type="bibr" rid="bib1.bibx42" id="text.66"/>. NSE gains range from 8 % to 12 % for the LSTM model and reach up to 22 % for the conceptual SAC-SMA model. As explained in Sect. 2, the remaining analysis is based on the persistence analysis (Fig. <xref ref-type="fig" rid="F8"/>), which provides more contrasted results than the NSE score.</p>
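      <p>The gains of Table <xref ref-type="table" rid="T3"/> can be recovered from the tabulated NSE values as a relative difference with respect to the benchmark NSE, as stated in the table footnote. The short check below is a sketch with a hypothetical function name (<monospace>relative_gain</monospace>); it uses the rounded NSE values from the table for this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def relative_gain(nse_da: float, nse_benchmark: float) -> float:
    """Gain of a DA approach relative to the benchmark NSE, in percent."""
    return 100.0 * (nse_da - nse_benchmark) / nse_benchmark


# NSE values at a 1 d lead time (this study, Table 3).
LSTM_US = {"benchmark": 0.74, "DA1": 0.80, "DA2": 0.83, "DA3": 0.82}
SAC_SMA_US = {"benchmark": 0.67, "DA1": 0.80, "DA2": 0.82, "DA3": 0.80}

lstm_gains = {name: round(relative_gain(nse, LSTM_US["benchmark"]))
              for name, nse in LSTM_US.items() if name != "benchmark"}
sac_gains = {name: round(relative_gain(nse, SAC_SMA_US["benchmark"]))
             for name, nse in SAC_SMA_US.items() if name != "benchmark"}
```
]]></preformat>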
      <p id="d2e3117">As expected, the PERS scores (Fig. <xref ref-type="fig" rid="F8"/>) are lower at short lead times. This is a common outcome in persistence analysis, as models generally struggle to outperform the persistent model at very short horizons: the smaller the discharge variations, the more difficult they are to predict. Furthermore, in agreement with previous studies, performances are significantly higher for the LSTM cases than for the SAC-SMA cases, and this trend persists even when DA procedures are implemented.</p>
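      <p>To make the behavior of the persistence criterion concrete, the sketch below implements the commonly used persistence index, in which the squared forecast errors are compared with those of a forecast equal to the last observed discharge. This formulation is assumed here for illustration; the definition adopted in this study is the one given in Sect. 2 and may differ in detail. The function name <monospace>persistence_score</monospace> is hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def persistence_score(observed: Sequence[float],
                      forecast: Sequence[float],
                      lead_time: int) -> float:
    """Persistence index: 1 is perfect, 0 equals the persistent model.

    forecast[t] is the forecast valid at time t, issued lead_time steps earlier.
    """
    model_error = 0.0
    persistence_error = 0.0
    for t in range(lead_time, len(observed)):
        model_error += (observed[t] - forecast[t]) ** 2
        # The persistent model forecasts the last discharge observed at issue time.
        persistence_error += (observed[t] - observed[t - lead_time]) ** 2
    if persistence_error == 0.0:
        raise ValueError("constant discharge: the score is undefined")
    return 1.0 - model_error / persistence_error
```
]]></preformat>
      <p>Because the denominator is small when discharge varies little over the lead time, even modest forecast errors produce low or negative scores at short horizons.</p>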
      <p id="d2e3122">The DA1 method appears to be more effective than the SAC-SMA benchmark at all lead times tested. However, it only clearly outperforms the benchmark LSTM model at the 1 d lead time, when the initial PERS scores of the LSTM are modest: the median PERS value is lower than 0.5, and PERS values are lower than 0 for 20 % of the basins.</p>
      <p id="d2e3126">For further clarity, Fig. <xref ref-type="fig" rid="F9"/> summarizes the gain in PERS scores achieved by the different DA procedures relative to their corresponding benchmark models. Unsurprisingly, these gains are highly dependent on the initial PERS value of the benchmark model. Three classes of initial benchmark PERS values are considered in the figure to illustrate this dependency: (<inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula>, 0], [0, 0.5], and [0.5, 1]. The two other DA strategies, DA2 and DA3, both based on the benchmark models (i.e., MLP-informed and error post-processing), prove to be effective, as they consistently improve the performance of the benchmark forecasting models on which they are based. DA2 outperforms DA1, while the DA3 approach generally enhances performance or, at least, preserves performance when it is already high.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e3143">Gain in persistence (PERS) score, where Gain <inline-formula><mml:math id="M82" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> DA <inline-formula><mml:math id="M83" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula> Benchmark. For brevity, the following names are used: simple MLP (MLP), MLP informed by the benchmark model (<inline-formula><mml:math id="M84" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-SAC or <inline-formula><mml:math id="M85" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>-LSTM), and error post-processed benchmarks (<inline-formula><mml:math id="M86" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>SAC or <inline-formula><mml:math id="M87" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula>LSTM).</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f09.png"/>

          </fig>

      <p id="d2e3195">Figures <xref ref-type="fig" rid="F8"/> and <xref ref-type="fig" rid="F9"/> show that the gains are larger for shorter lead times, more pronounced for SAC-SMA than for LSTM due to the overall lower scores of the SAC-SMA model, and lower for basins where the initial model already performs well. In general, the ranking of the approaches tested is strongly dependent on the initial skill of the benchmark models. DA2 appears to be the most effective approach overall, followed by DA3 (Fig. <xref ref-type="fig" rid="F8"/>).</p>
      <p id="d2e3204">These results demonstrate the effectiveness of the proposed DA strategies in improving forecast performance under a perfect meteorological forecast scenario. Gains are particularly significant for the 1 d lead time. However, the added value of the proposed DA strategies decreases rapidly with increasing lead times (Fig. <xref ref-type="fig" rid="F9"/>). This decline can be explained by both the increase in the PERS scores of the benchmark models with lead time and the overall short response times of the basins in the CAMELS-US dataset, which typically range from one to a few days. These short response times are depicted by the cross-correlation analysis (Fig. <xref ref-type="fig" rid="F5"/>) and explain the reduced influence of past discharges on future trajectories for horizons exceeding them.</p>
      <p id="d2e3211">The next step consists of assessing whether these conclusions remain valid when taking into account uncertainties in meteorological forecasts, a situation that corresponds to the operational implementation of rainfall–runoff forecasting models. To streamline the discussion, and considering the superiority of the LSTM model over SAC-SMA and other possible conceptual rainfall–runoff models, only the LSTM cases are considered in the remainder of the present manuscript.</p>
</sec>
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Performances of the DA approaches under ensemble-based forecast scenarios</title>
      <p id="d2e3223">As a reminder, for the examples based on the CAMELS-US dataset, the ensemble forecast scenario is implemented using the historical meteorological records (i.e., Climatology) and the BoM hindcast data. According to the adopted sampling strategy, the Climatology-ensemble consists of 18 members (1991–2008), whereas the hindcast ensemble consists of 32 members, as provided by the data source. The hindcast product is discontinuous, as only 6 predictions are issued within a month. To account for model uncertainties, these ensembles are further expanded through random model initialization: 8 realizations for the LSTM and 20 for the DA approaches. To limit computational costs, the evaluation is conducted in a representative subset of 56 basins, selected to cover the range of LSTM NSE (test) values observed in the 531 basins of the CAMELS-US dataset, as shown in Fig. <xref ref-type="fig" rid="F6"/>. Three key properties of the forecast ensembles are evaluated here: (1) their efficiency based on the CRPS score, (2) their reliability based on rank diagrams complemented with spread/skill ratios (SSR), and (3) their resolution using Brier and AUC scores.</p>

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e3230">Box-plots of the CRPS scores for the 56 tested basins for the 1989–1991 test period. Lead times of 1 to 7 d are shown on the <inline-formula><mml:math id="M88" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis, while scores are displayed on the <inline-formula><mml:math id="M89" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis, where 0 denotes perfect forecasts. Colors indicate the LSTM benchmark (red), DA1 (green), DA2 (dark-orange), DA3 (gold), Persistent Model (gray) and Past-Observed discharge (dark gray).</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f10.png"/>

        </fig>

<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>Forecast efficiency</title>
      <p id="d2e3260">Figure <xref ref-type="fig" rid="F10"/> presents the CRPS values for both ensemble forecast scenarios (climatology- and hindcast-based), for all DA approaches and lead times tested. Two baseline models are included in the analysis: the persistent model (forecast equal to the last observed discharge) and the past-observed (P.O.) discharge model, which consists of discharge observations from previous years on the considered date. Since the persistent model produces a single deterministic prediction, its CRPS reduces to the mean absolute error (MAE). In contrast, the P.O. model comprises 18 members and is therefore treated like all other ensemble forecasts.</p>
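      <p>The reduction of the CRPS to the MAE for a single-member forecast can be verified with the standard empirical ensemble estimator of the CRPS, sketched below. This estimator is assumed for illustration and is not necessarily the one implemented in the library used in this study; the function names <monospace>crps_ensemble</monospace> and <monospace>mean_crps</monospace> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import Sequence


def crps_ensemble(members: Sequence[float], observation: float) -> float:
    """Empirical CRPS: E|X - y| - 0.5 * E|X - X'| over the ensemble members."""
    m = len(members)
    error_term = sum(abs(x - observation) for x in members) / m
    spread_term = sum(abs(a - b) for a in members for b in members) / (2 * m * m)
    return error_term - spread_term


def mean_crps(ensembles: Sequence[Sequence[float]],
              observations: Sequence[float]) -> float:
    """Average CRPS over all forecast dates (0 denotes a perfect forecast)."""
    scores = [crps_ensemble(members, obs)
              for members, obs in zip(ensembles, observations)]
    return sum(scores) / len(scores)
```
]]></preformat>
      <p>With a single member, the spread term vanishes and the score equals the absolute error, so its average over the evaluation period is the MAE.</p>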
      <p id="d2e3265">According to Fig. <xref ref-type="fig" rid="F10"/>, all tested models appear more efficient than the persistent baseline model for all lead times, even when accounting for uncertainties in the ensemble forecasts. The performance gap between these models and the persistent model becomes larger as the forecast lead time increases.</p>
      <p id="d2e3270">In the climatology-based scenario (left-most), the models consistently outperform the past-observed (P.O.) baseline, signifying that all tested models and approaches remain informative even at the longer lead times. However, this pattern is not consistently observed in the hindcast-based scenario for lead times exceeding 3 d. The observed biases in the BoM hindcast products for the period 1989–1991, and for the estimated basin average daily PET and rainfall (Fig. <xref ref-type="fig" rid="F4"/>), clearly limit the efficiency of ensemble forecasts based on these hindcasts for the CAMELS-US basins. A detailed analysis of the structure of these biases, along with the development of an appropriate bias correction method <xref ref-type="bibr" rid="bib1.bibx73 bib1.bibx71" id="paren.67"/>, would be essential to fully exploit the potential of these hindcasts. However, this likely complex task was beyond the scope of the present study.</p>
      <p id="d2e3278">Finally, the observed trends in Fig. <xref ref-type="fig" rid="F10"/> (climatology) are consistent with those observed using the PERS criterion under the perfect meteorological forecast scenario (Fig. <xref ref-type="fig" rid="F8"/>) with some nuances. The DA approaches, including DA1 (simple MLP), remain effective, as they globally improve the performance of the LSTM model or, at least, do not degrade it for any of the tested lead times. Their added values are also more pronounced at shorter lead times.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>Forecast reliability</title>
      <p id="d2e3294">Figure <xref ref-type="fig" rid="F11"/> shows the rank diagrams for both the <italic>climatology-</italic> and <italic>hindcast-based</italic> scenarios with the CAMELS-US dataset. The ensemble members have been grouped into 10 classes for all models to facilitate comparisons. The figure is organized vertically, and the results of the LSTM benchmark model are shown in the first row, followed by the three tested DA approaches.</p>

      <fig id="F11" specific-use="star"><label>Figure 11</label><caption><p id="d2e3307">Rank diagrams for the LSTM benchmark model and the DA strategies. The <inline-formula><mml:math id="M90" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis shows the 10 rank classes and the <inline-formula><mml:math id="M91" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis the proportion of observed values in each class; median ratios and error bars indicate the distributions for the 56 basins. Colors indicate the lead times.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f11.png"/>

          </fig>

      <p id="d2e3330">Reliable forecasts are expected to yield uniformly distributed rank diagrams, indicating ensemble forecasts in which actual events are evenly distributed across all forecast member ranks. It should be noted first that the rank diagrams are similar across all lead times for a given model and meteorological ensemble product, but differ between models, indicating that the rainfall–runoff forecasting model, including the discharge assimilation procedure, has an impact on the spread and possible biases of the forecast ensembles. The rank diagrams indicate that the hindcast biases (Appendix <xref ref-type="fig" rid="FA6"/>) propagate in all models and methods tested, providing an explanation for the poorer CRPS scores compared to the climatology-based scenario. The U-shape of the hindcast-based forecast rank diagram suggests that the forecast ensembles may be, on average, under-dispersed. This pattern is not evident when looking at model outputs (Fig. <xref ref-type="fig" rid="F7"/>), but it seems to be confirmed by the spread-skill ratios, which are significantly lower for hindcast-based forecasts than for climatology-based forecasts (Fig. <xref ref-type="fig" rid="F12"/>). A slight deviation from the uniform distribution also appears in the rank diagram of the LSTM climatology-based ensemble forecasts. The spread-skill ratios of the LSTM model appear similar to those of the DA1 (MLP), DA2, and DA3 approaches, suggesting that the possible biases in the ensembles generated by the rainfall–runoff model are more complex than simple systematic under-dispersion.</p>
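      <p>The two reliability diagnostics discussed above can be sketched as follows. The grouping of ranks into 10 classes follows the text, whereas the handling of ties (ignored here) and the spread-skill ratio formula (square root of the mean ensemble variance divided by the RMSE of the ensemble mean, one common definition) are assumptions. The function names <monospace>rank_histogram</monospace> and <monospace>spread_skill_ratio</monospace> are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import mean, variance
from typing import List, Sequence


def rank_histogram(ensembles: Sequence[Sequence[float]],
                   observations: Sequence[float],
                   n_classes: int = 10) -> List[float]:
    """Proportion of observations falling in each rank class."""
    counts = [0] * n_classes
    for members, obs in zip(ensembles, observations):
        rank = sum(m < obs for m in members)  # between 0 and len(members)
        # Map the len(members) + 1 possible ranks onto n_classes classes.
        counts[rank * n_classes // (len(members) + 1)] += 1
    return [count / len(observations) for count in counts]


def spread_skill_ratio(ensembles: Sequence[Sequence[float]],
                       observations: Sequence[float]) -> float:
    """Ratio of the ensemble spread to the RMSE of the ensemble mean."""
    spread = math.sqrt(mean([variance(members) for members in ensembles]))
    rmse = math.sqrt(mean([(mean(members) - obs) ** 2
                           for members, obs in zip(ensembles, observations)]))
    return spread / rmse
```
]]></preformat>
      <p>In this sketch, observations that systematically fall outside the ensemble range populate only the two extreme classes (a U-shaped diagram) and give a ratio below 1, which is the signature of under-dispersion mentioned above.</p>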

      <fig id="F12" specific-use="star"><label>Figure 12</label><caption><p id="d2e3343">Box-plots of the spread-skill ratio for both climatology- (left panels) and hindcast-based (right panels) forecast scenarios for the subset of 56 basins. The LSTM cases (LSTM, DA1, DA2 or i-LSTM, DA3 or e-LSTM) are shown together with the Past-observed (P.O.) discharge model.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f12.png"/>

          </fig>


</sec>
<sec id="Ch1.S3.SS2.SSS3">
  <label>3.2.3</label><title>Forecast resolution</title>
      <p id="d2e3362">The Brier and AUC scores evaluate the ability of an ensemble forecast to anticipate events and non-events; for instance, the exceedance or non-exceedance of a selected discharge threshold. Their values are presented in Figs. <xref ref-type="fig" rid="F13"/> and <xref ref-type="fig" rid="F14"/>. As expected, the resolution of all forecasting approaches tested decreases with increasing lead times; i.e., for all thresholds, the computed Brier scores and AUC get closer to the values obtained based on past observations only as the lead time increases. The resolution analysis also confirms the poor skill of the ensemble forecasts based on the hindcast as used here, which is particularly noticeable in the Brier scores (Fig. <xref ref-type="fig" rid="F13"/>).</p>

      <fig id="F13" specific-use="star"><label>Figure 13</label><caption><p id="d2e3373">Brier scores for event detection based on thresholds defined by discharge quantiles (<inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) for both low flow (<inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>≤</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and high flow (<inline-formula><mml:math id="M94" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mi>x</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) values. Median scores and error bars are shown, the latter indicating the dispersion across the subset of 56 test basins. Past-observed discharge is also evaluated as a poor man's discharge forecast and is represented by the <inline-formula><mml:math id="M95" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> symbols.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f13.png"/>

          </fig>

      <p id="d2e3430">All the approaches tested outperform the random detection model (Brier <inline-formula><mml:math id="M96" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 0.25) and generally surpass the P.O model under <italic>climatology-based</italic> forecasts, with little to no improvement under the <italic>hindcast-based</italic> forecast. The earlier finding that climatology-based ensembles tend to outperform hindcast-based forecasts in terms of forecast resolution is also observed here.</p>
      <p id="d2e3447">The previous conclusions also hold for the model resolution: the proposed DA strategies prove effective. They either significantly improve the skill of the LSTM benchmark or, at least, do not degrade its initial performance.</p>
      <p id="d2e3450">This is particularly clear in the Brier scores (Fig. <xref ref-type="fig" rid="F13"/>), especially for short lead times and intermediate discharge thresholds. The AUC graph shows less pronounced contrasts (Fig. <xref ref-type="fig" rid="F14"/>). For a more in-depth comparison between methods, an example of the ROC curves obtained for the threshold quantile 0.95 is presented in Appendix <xref ref-type="fig" rid="FB2"/>. It illustrates the complexity of the comparison: the relative ranking of the models depends on the lead times, criteria, range of considered discharge values or thresholds, and also the target probability of detection in the ROC curve.</p>

      <fig id="F14" specific-use="star"><label>Figure 14</label><caption><p id="d2e3461">AUC scores for events based on flow quantiles [0.1, 0.25, 0.5, 0.75, 0.9], with drought/flood detection shifting at quantile 0.5. The values shown correspond to the median AUC values across 56 basins. Climatology and hindcast are shown in rows, and lead times of 1, 3, and 7 d are presented in columns. Past-observed discharge is also evaluated as a poor man's discharge forecast and is represented by the <inline-formula><mml:math id="M97" display="inline"><mml:mi>X</mml:mi></mml:math></inline-formula> symbols.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f14.png"/>

          </fig>

      <p id="d2e3477">Two additional observations can be drawn from the AUC figure (Fig. <xref ref-type="fig" rid="F14"/>), despite its limited contrast. First, the gap between AUC values based on past observed discharges and those of the tested forecasting approaches is particularly pronounced for high-threshold quantiles at a 1 d lead time. This suggests that the tested approaches are particularly well-suited to predicting the exceedance of high discharge values, which is consistent with the fact that the standard root mean square error criterion, known to place greater emphasis on large discharge values <xref ref-type="bibr" rid="bib1.bibx62" id="paren.68"/>, has been used to train all models and methods.</p>
      <p id="d2e3485">More surprisingly, for large discharge thresholds at 3 and 7 d lead times, the AUC scores obtained with hindcast-based approaches exceed those associated with climatology-based forecasts. This indicates that, despite their apparently lower overall skill, the hindcast products used contain valuable information compared to climatology for predicting intense rainfall-triggered events.</p>
      <p id="d2e3489">These observations further illustrate how conclusions drawn from model comparisons depend on the target variable used to train the model, the range of values considered, and the evaluation metric used. At this stage of the analysis, the following partial conclusions can be drawn: <list list-type="bullet"><list-item>
      <p id="d2e3494">The proposed discharge assimilation procedures, particularly DA2 and DA3, prove to be effective, as they either significantly improve or at least do not degrade the performance of the LSTM benchmark model across all considered lead times and evaluation criteria;</p></list-item><list-item>
      <p id="d2e3498">Evaluating rainfall–runoff forecasts based on meteorological ensembles is a necessary complement to analyses that are often conducted under the implicit assumption of perfect meteorological forecasts. In the present case, this approach reveals that the superiority of the LSTM and LSTM-based discharge assimilation methods over the proposed simpler MLP model, observed for lead times greater than two days, disappears once meteorological uncertainties are taken into account.</p></list-item></list></p>
      <p id="d2e3501">Nevertheless, the analysis is limited by the low skill of the available hindcast products for the selected test period (1989–1991) in the CAMELS-US dataset. It is therefore proposed in Sect. <xref ref-type="sec" rid="Ch1.S4"/> to implement some of the tested approaches on a more recent dataset (CAMELS-FR), for which additional ensemble meteorological forecast products are available. The objective of this extension is twofold: (1) to assess the robustness and generality of the conclusions drawn from the CAMELS-US case study, and (2) to evaluate ensemble forecasting skill using more recent and probably higher-quality meteorological ensemble forecasts produced by the European Centre for Medium-Range Weather Forecasts (ECMWF).</p>
      <p id="d2e3507">In line with the conclusions of this section, and for the sake of simplicity, the analysis in Sect. <xref ref-type="sec" rid="Ch1.S4"/> is restricted to the benchmark LSTM model and the DA1 (MLP) strategy, evaluated under the same framework as previously. The analysis relies on hindcast products as well as forecast archives, providing an evaluation of the predictive skill of these two ensemble rainfall–runoff forecasting models, had they been implemented in the past.</p>
</sec>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Extension to the CAMELS-FR dataset</title>
      <p id="d2e3522">To ensure consistency with previous studies, such as <xref ref-type="bibr" rid="bib1.bibx33" id="text.69"/> for the CAMELS-US dataset and <xref ref-type="bibr" rid="bib1.bibx22" id="text.70"/> for French basins, Fig. <xref ref-type="fig" rid="F15"/> illustrates the position of the NSE values for the LSTM and DA1 approaches implemented in this extended analysis using the CAMELS-FR dataset <xref ref-type="bibr" rid="bib1.bibx16" id="paren.71"/>. The results indicate that the trained LSTM achieves a high level of performance on the CAMELS-FR dataset, with median NSE values reaching 0.9.</p>

      <fig id="F15" specific-use="star"><label>Figure 15</label><caption><p id="d2e3538">Comparison of NSE scores: LSTM and SAC-SMA for 531 US basins from <xref ref-type="bibr" rid="bib1.bibx33" id="text.72"/>; LSTM and GR4J <xref ref-type="bibr" rid="bib1.bibx47" id="paren.73"/> for 365 French basins from <xref ref-type="bibr" rid="bib1.bibx22" id="text.74"/>; and LSTM, GR4J, and MLP (DA1) for 338 basins from the CAMELS-FR dataset in the present study.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f15.png"/>

      </fig>

      <p id="d2e3556">Furthermore, consistent with the comparison presented in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, the DA1 outperforms the LSTM at the 1 d lead time and exhibits NSE values comparable to those of the LSTM model at longer lead times. The NSE values obtained on the same datasets with the conceptual GR4J model <xref ref-type="bibr" rid="bib1.bibx47" id="paren.75"/>, a reference model in France, are also shown. These results confirm that AI-based rainfall–runoff forecasts outperform traditional conceptual rainfall–runoff models on the French dataset, although the performance gap is less pronounced than that reported in <xref ref-type="bibr" rid="bib1.bibx33" id="text.76"/> for US basins.</p>

      <fig id="F16" specific-use="star"><label>Figure 16</label><caption><p id="d2e3570">Example of forecast hydrographs at a 3 d lead time for the basin K132181010 from 16 January to 27 March 2020. LSTM and DA1 are displayed in columns, while the four forecast approaches (Perfect, Climatology, Hindcast and Forecast Archives) are in rows. Given the discontinuities in the last two forecast products, they are represented using box-plots.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f16.png"/>

      </fig>

      <p id="d2e3579">It can also be observed in Fig. <xref ref-type="fig" rid="F15"/> that the NSE values increase from left to right. Since the LSTM architectures and implementation strategies are similar across the considered studies <fn id="Ch1.Footn1"><p id="d2e3584">Regionally trained LSTM models with static attributes of basins, input sequence lengths of 270 d, a loss function of mean square error, and a hidden size of 256.</p></fn>, this increase may be partly explained by the improvement over time of the model training algorithms but is probably mainly attributable to the datasets; the recently published CAMELS-FR dataset consists of records from basins with limited anthropogenic influence and has undergone extensive quality control <xref ref-type="bibr" rid="bib1.bibx16" id="paren.77"/>.</p>
      <p id="d2e3591">Figure <xref ref-type="fig" rid="F16"/> illustrates, using an example of hydrographs from the CAMELS-FR experiment, what the outcomes of the various approaches tested look like. The 3 d lead time forecast is presented here, while the corresponding 1 and 7 d lead times are provided in Appendix <xref ref-type="fig" rid="FB5"/>. Nevertheless, no general conclusion can be drawn from this isolated example regarding the relative performance of the various methods. Furthermore, direct pairwise comparisons between <italic>hindcast</italic> and <italic>forecast</italic> archives are not possible, as the dates for which the hindcast and forecast archives are available are not strictly aligned. The aggregated evaluation metrics are presented hereafter.</p>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Model efficiency analysis</title>
      <p id="d2e3611">The PERS scores obtained by the LSTM and DA1 approaches for the CAMELS-FR dataset (Appendix <xref ref-type="fig" rid="FB6"/>) exhibit trends similar to those observed in the CAMELS-US analysis; however, the median PERS of the MLP (DA1) method remains higher than that of the LSTM up to the 5 d lead time. This can be partly explained by the difference in hydrological inertia of the basins between the two datasets, as shown in Appendix <xref ref-type="fig" rid="FA1"/> and previously discussed by <xref ref-type="bibr" rid="bib1.bibx46" id="text.78"/>. Along the same lines, the spread of PERS scores for DA1 remains more limited than that of the LSTM model across all the tested lead times. While these differences may partly originate from variations in dataset quality and initial model performance, they also reflect the contribution of the assimilated discharges, which certainly plays a key role.</p>
</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Ensemble forecast analysis (efficiency, reliability and resolution)</title>
      <p id="d2e3629">In this subsection, the complete ensemble analysis is provided, using CRPS scores (Fig. <xref ref-type="fig" rid="F17"/>) for the efficiency analysis, the Rank diagram (Fig. <xref ref-type="fig" rid="F18"/>) for reliability, and Brier's scores (Fig. <xref ref-type="fig" rid="F19"/>) for the resolution of the ensemble forecasts. The Spread-Skill ratio and the AUC scores are provided in Appendices <xref ref-type="fig" rid="FB7"/> and <xref ref-type="fig" rid="FB9"/>, respectively.</p>

      <fig id="F17" specific-use="star"><label>Figure 17</label><caption><p id="d2e3644">CRPS scores for the LSTM and the DA1 (MLP) with the CAMELS-FR dataset.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f17.png"/>

        </fig>

      <fig id="F18" specific-use="star"><label>Figure 18</label><caption><p id="d2e3655">Rank diagrams for the benchmark models and the DA strategies. <inline-formula><mml:math id="M98" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis (10 rank classes), <inline-formula><mml:math id="M99" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis (proportion of observed values in each class), median ratio and error-bars indicating the maximum and minimum ratios for the 56 test basins. Colors indicate the lead times.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f18.png"/>

        </fig>

      <fig id="F19" specific-use="star"><label>Figure 19</label><caption><p id="d2e3681">Brier's Scores for the LSTM and the DA1 (MLP) with the CAMELS-FR dataset.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f19.png"/>

        </fig>

      <p id="d2e3690">As shown in Fig. <xref ref-type="fig" rid="F17"/>, CRPS values are generally lower here than those reported previously, with most values falling below 0.5 across all forecasting approaches, including the Climatology-based method. All tested methods (LSTM and DA1) successfully outperform both the persistent model and the no-skill past observed (P.O) discharge ensembles.</p>
      <p id="d2e3695">Unlike in the CAMELS-US case, meteorological ensemble forecast products (hindcast and forecast archives) demonstrate better performance than the climatology-based ensemble, particularly for lead times exceeding 2 d. This is further supported by the CRPSS scores (Appendix <xref ref-type="fig" rid="FB8"/>), estimated using the climatology-based forecast as a reference, which indicate that both forecast products outperform this baseline.</p>
      <p id="d2e3700">Note that this result, counterbalancing the pessimistic conclusion drawn in Sect. <xref ref-type="sec" rid="Ch1.S3"/> regarding meteorological hindcasts, is obtained despite the significant biases observed in both the hindcast and forecast products used in this French experiment (see Figs. <xref ref-type="fig" rid="FA7"/> and <xref ref-type="fig" rid="FA8"/>).</p>
      <p id="d2e3709">Consistent with the persistence criterion, the CRPS values obtained with the DA1 (MLP) approach are, on average, lower (better) than those of the LSTM model across all meteorological ensembles and lead times, except for the hindcast at lead times exceeding 4 d.</p>
      <p id="d2e3712">The rank diagrams (Fig. <xref ref-type="fig" rid="F18"/>) reveal biases affecting all forecast ensembles. With the exception of the climatology-based MLP forecasts, an excessively high proportion of observed discharges falls outside the [0.1, 0.9] quantile range of the forecast ensembles. This is partly explained by the biases in the hindcast products and the forecast archives (Fig. <xref ref-type="fig" rid="FA7"/>). However, as these proportions are higher for the LSTM model, it is likely that this model also introduces additional biases when combined with weather forecast ensembles.</p>
      <p id="d2e3720">This issue certainly deserves further investigation to support a more efficient operational implementation of LSTM-based rainfall–runoff forecasting models. Biases in forecast ensembles reduce the resolution of the forecasts, as the probability of exceedance is less accurately represented.</p>
      <p id="d2e3723">Consistent with the analysis in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, all models and approaches outperform both the random detection and the past-observed discharge ensemble baselines. However, unlike in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, this statement clearly holds across all meteorological ensembles, lead times, and evaluation metrics (Brier in Fig. <xref ref-type="fig" rid="F19"/> and AUC in Fig. <xref ref-type="fig" rid="FB9"/>).</p>
      <p id="d2e3734">The resolution of the DA1 (MLP) strategy appears higher than that of the LSTM across most tested configurations, with the exception of the Brier scores computed for the low discharge threshold at a 7 d lead time (Fig. <xref ref-type="fig" rid="F19"/>). In this case, the CRPS of the LSTM models appears, on average, lower than that of the DA1, suggesting some consistency across metrics in capturing various properties of the forecasts.</p>
      <p id="d2e3739">Two specific patterns identified in the AUC analysis in Sect. <xref ref-type="sec" rid="Ch1.S3"/> are also visible in Fig. <xref ref-type="fig" rid="FB9"/>. First, the gap between AUC values based on past-observed (P.O) discharges and those of the forecasting models is particularly pronounced for high-threshold quantiles at a 1 d lead time. Second, for large-discharge thresholds at a 7 d lead time, the AUC scores obtained with hindcast- and archive-based ensemble forecasts clearly exceed those of the climatology-based forecast. This confirms the ability of the weather forecast products to predict significant rainfall events up to one week in advance.</p>
      <p id="d2e3746">Overall, this extended analysis, which incorporates forecast archives, yields satisfactory results. It confirms the findings of Sect. <xref ref-type="sec" rid="Ch1.S3"/> and reinforces the relevance of the forecasting and discharge assimilation (DA) approaches evaluated in this study. The main findings are as follows: (1) the gain of the DA1 strategy compared to the rainfall–runoff LSTM simulation model is consistently observed, although it is lower for the CAMELS-FR basins, partly due to the initial high performance of the LSTM; (2) the complementarity of the two forecast evaluation frameworks (deterministic vs. ensemble-based) further highlights the importance of ensemble-based evaluation in operational hydrometeorological forecasting. Ensemble-based forecasting also emphasizes the superiority of the DA1 approach over the rainfall–runoff LSTM across the tested lead times and evaluation metrics. Finally, this extended analysis suggests a higher quality of ensemble weather forecast products over the recent period (2018–2021) used to evaluate the DA approaches in the CAMELS-FR basins.</p>
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d2e3761">This work aimed to evaluate the added value of discharge assimilation (DA) procedures for rainfall–runoff forecasting, particularly in the context of AI-based operational hydrometeorological applications. Three DA strategies are compared against two benchmark models (LSTM and SAC-SMA) that do not incorporate DA. These DA strategies are evaluated under both a traditional perfect weather forecast (deterministic) framework and an ensemble-based forecast framework, using no-skill past observed forcing (climatology), hindcast products, and forecast archives. Additional emphasis is provided through comparisons with both a persistent model and past-observed (P.O) discharge ensembles. The experiments have been conducted on the widely used CAMELS-US dataset and extended to the recently published CAMELS-FR dataset.</p>
      <p id="d2e3764">While all tested approaches consistently outperform both the persistent model and the P.O baselines, the various DA procedures appear to be globally effective. They generally improve, or at least do not significantly degrade, the forecasting performance of the benchmark models on which they are based. Within the perfect meteorological forecast evaluation framework, DA approaches consistently improve the SAC-SMA forecasts, while improvements for the LSTM are mainly observed at short lead times and in basins where the benchmark LSTM model initially underperformed. These more limited gains further highlight the strong performance of the LSTM model in rainfall–runoff simulation and forecasting, as already demonstrated in numerous studies <xref ref-type="bibr" rid="bib1.bibx33 bib1.bibx18 bib1.bibx22 bib1.bibx42 bib1.bibx72" id="paren.79"/>. This behavior is consistently observed across both CAMELS-US and CAMELS-FR basins. Due to the higher hydrological inertia of the CAMELS-FR basins compared to those of the CAMELS-US, the added value of the tested DA strategies remains significant at longer forecasting lead times.</p>
      <p id="d2e3770">Several interesting insights emerge from the ensemble-based evaluation framework. The DA1 (MLP) approach, which incorporates past observed discharges, appears to outperform the LSTM model across all the tested lead times. This conclusion holds particularly for the assessment criteria characterizing the resolution of the forecasts (Brier's scores and AUC); i.e., the ability to detect in advance the exceedance of a discharge threshold. The LSTM model appears penalized by the limited reliability of its forecast ensembles (biases observed on the rank diagrams). This ensemble evaluation suggests that the performance of the LSTM forecasts could be improved in the future through the implementation of post-processing techniques such as ensemble bias correction.</p>
      <p id="d2e3773">The tested DA methods are implemented using a relatively simple MLP orchestrator, which already provides satisfactory results. Although this choice aligns with the objective of developing frugal AI solutions, there remains clear potential for improvement by exploring more advanced AI techniques and alternative data assimilation strategies, such as the Ensemble Kalman Filter <xref ref-type="bibr" rid="bib1.bibx12" id="paren.80"/> or an auto-regressive approach as in <xref ref-type="bibr" rid="bib1.bibx42" id="text.81"/>.</p>
      <p id="d2e3783">It is observed that model performances are globally higher for high observed discharge values than for low flows. This is likely related to the use of the mean squared error loss function during training <xref ref-type="bibr" rid="bib1.bibx62" id="paren.82"/>. The investigation of alternative loss functions tailored to different flow levels, therefore, represents a promising direction for future research, particularly for the development of AI-based low-flow forecasting models. Moreover, as ensemble discharge forecasts are becoming an operational standard, it may be beneficial to train models directly using ensemble-based metrics, for example, by optimizing the Brier's score for event detection purposes.</p>
      <p id="d2e3789">Further work could also focus on a more thorough analysis of meteorological and hydrological ensemble spreads, as well as on the application of ensemble bias correction methods to improve the resolution of forecast products.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Data and specificity</title>

      <fig id="FA1"><label>Figure A1</label><caption><p id="d2e3805">Cross- and auto-correlation analysis between the rainfall and the discharge for the CAMELS-FR dataset. Orange dots denote the position of the <inline-formula><mml:math id="M100" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M101" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> used on the CAMELS-US dataset, whereas the teal ones indicate the corresponding cross-correlation scores for the CAMELS-FR dataset. This means that, following the same approach to set up the size of the input sequences, larger values would have been used for the CAMELS-FR cases.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f20.png"/>

      </fig>

      <fig id="FA2"><label>Figure A2</label><caption><p id="d2e3832">Rank diagrams for Rainfall and PET on CAMELS-FR dataset comparing the test period to the remaining historical observations.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f21.png"/>

      </fig>

      <fig id="FA3"><label>Figure A3</label><caption><p id="d2e3846">Rank diagrams for the daily precipitation and PET for the climatological ensembles (left panel) and Hindcast (right panel) products for the CAMELS-FR dataset. Plots correspond to the period 1989–2017 and are evaluated for the test period 2017–2021. The error bars represent variability for the 56 tested basins, and the red line denotes the expected uniform distribution.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f22.png"/>

      </fig>

<fig id="FA4"><label>Figure A4</label><caption><p id="d2e3860">Rank diagrams of the test period against the remaining data for the discharge (discharge climatology) for CAMELS-US (left panel) and CAMELS-FR (right panel) datasets.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f23.png"/>

      </fig>

      <fig id="FA5"><label>Figure A5</label><caption><p id="d2e3873">Dispersion analysis of the climatology for all the features in both CAMELS-US (left panel) and CAMELS-FR (right panel) datasets. For easier visualization, the 18 and 29 members of the two datasets, respectively, have been grouped into 10 classes per graph.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f24.png"/>

      </fig>

      <fig id="FA6"><label>Figure A6</label><caption><p id="d2e3886">Dispersion analysis of the forecast products on the CAMELS-US case.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f25.png"/>

      </fig>

<fig id="FA7"><label>Figure A7</label><caption><p id="d2e3901">Dispersion analysis of the forecast products on the CAMELS-FR case.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f26.png"/>

      </fig>

      <fig id="FA8"><label>Figure A8</label><caption><p id="d2e3914">Rank diagrams for daily precipitation and PET for the Hindcast-based ensemble, for lead times 1, 3, and 7 d for both US-basins (top panels) and FR-basins (bottom panels). The plots correspond to the evaluation of the respective test-period within the respective forecast data. The error bars represent variability across the 56 basins considered, and the red line denotes the expected uniform distribution. For ease of comparison, the ensembles have been condensed into 10 classes from 32 and 10 members, respectively. The under-dispersion of the hindcast products appears to diminish with increasing lead times.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f27.png"/>

      </fig>


</app>

<app id="App1.Ch1.S2">
  <label>Appendix B</label><title>Hydrographs</title>
      <p id="d2e3935">Figure <xref ref-type="fig" rid="FB2"/> provides an illustration of the ROC curves based on which the AUC values have been calculated, as well as the variability of the ROC curve shapes across the 56 test basins. One ROC curve and one AUC value are computed for each basin and each forecasting method tested.</p>

      <fig id="FB1"><label>Figure B1</label><caption><p id="d2e3942">Example hydrographs for the 1 d lead time on the CAMELS-US dataset for both the SAC-SMA (left panels) and LSTM (right panels) cases for basin No. 01055000.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f28.png"/>

      </fig>

<fig id="FB2"><label>Figure B2</label><caption><p id="d2e3957">ROC curves for flood detection (<inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:mi>q</mml:mi><mml:mo>≥</mml:mo><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">95</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>) for 1, 3 and 7 d lead times. Results are style-coded: <italic>MLP Simple</italic> (green solid, DA-1), <italic>MLP informed by benchmark</italic> (dashed, DA-2), <italic>Benchmark ePP</italic> (dot-dashed, DA-3), <italic>initial Benchmark</italic> (dotted). Benchmark cases are color-coded: <italic>SACSMA</italic> (blue to pink, first row), <italic>LSTM</italic> (red to orange, second row). Halos show the variability across the 56 basins around the median values.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f29.png"/>

      </fig>

      <fig id="FB3"><label>Figure B3</label><caption><p id="d2e4004">Rank diagrams for the benchmark SACSMA-cases and the DA strategies. <inline-formula><mml:math id="M103" display="inline"><mml:mi>x</mml:mi></mml:math></inline-formula> axis (10 rank classes), <inline-formula><mml:math id="M104" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula> axis (proportion of observed values in each class), median ratio and error-bars indicating the distributions of the 56 basins. Colors indicate the lead times.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f30.png"/>

      </fig>

<fig id="FB4"><label>Figure B4</label><caption><p id="d2e4032">Example hydrographs for the 7 d lead time on the CAMELS-US dataset for both the SAC-SMA (left panels) and LSTM (right panels) cases for basin No. 01055000.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f31.png"/>

      </fig>

<fig id="FB5"><label>Figure B5</label><caption><p id="d2e4047">Example of hydrograph for 1 and 7 d lead times on the CAMELS-FR dataset for the basin K132181010.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f32.png"/>

      </fig>

      <fig id="FB6"><label>Figure B6</label><caption><p id="d2e4060">Persistence scores for LSTM and the MLP (DA1) on the CAMELS-FR dataset.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f33.png"/>

      </fig>

<fig id="FB7"><label>Figure B7</label><caption><p id="d2e4074">SSR for the LSTM and the DA1 (MLP) with the CAMELS-FR dataset.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f34.png"/>

      </fig>

      <fig id="FB8"><label>Figure B8</label><caption><p id="d2e4087">CRPSS of forecast products against the Climatology-based scenario for the CAMELS-FR dataset.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f35.png"/>

      </fig>

      <fig id="FB9"><label>Figure B9</label><caption><p id="d2e4101">AUC scores for the LSTM and the DA1 (MLP) with the CAMELS-FR dataset.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3497/2026/hess-30-3497-2026-f36.png"/>

      </fig>


</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e4118">All data used in this study are drawn from the CAMELS-US (<uri>https://gdex.ucar.edu/dataset/camels.html</uri>, last access: 17 April 2025) and CAMELS-FR (<uri>https://entrepot.recherche.data.gouv.fr/dataverse/CAMELS-FR</uri>, last access: 17 April 2025) datasets. The processed version of these datasets supporting this study is made available on Zenodo (<ext-link xlink:href="https://doi.org/10.5281/zenodo.19825677" ext-link-type="DOI">10.5281/zenodo.19825677</ext-link>), including the necessary instructions to ensure reproducibility. The benchmark models used in this study (LSTM and SAC-SMA) are described in their original publications, which should be consulted for methodological details. Their adapted code versions used in this work are publicly released on Zenodo (SACSMA: <ext-link xlink:href="https://doi.org/10.5281/zenodo.20379006" ext-link-type="DOI">10.5281/zenodo.20379006</ext-link>; LSTM: <ext-link xlink:href="https://doi.org/10.5281/zenodo.20379019" ext-link-type="DOI">10.5281/zenodo.20379019</ext-link>). The code for the MLP-based data assimilation (DA) implementation is available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.20415493" ext-link-type="DOI">10.5281/zenodo.20415493</ext-link>.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e4143">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e4149">All the indicated authors contributed to the realization and the discussions of this study. BSF and EG carried out the experiments and the analysis of the scientific relevance of the results. BSF developed the model code, performed the simulations and post-processed the results. FS participated in the deployment of the SAC-SMA model, including the post-processing of the results. NA and DT, as the aQuasys partners, contributed to the discussion on the operationalization of the models.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e4155">The paper is written in LaTeX using <bold>Overleaf</bold>. <bold>Writefull</bold> and <bold>ChatGPT</bold> have been used for rephrasing and minor corrections. The experiments are primarily based on the CAMELS-US and CAMELS-FR datasets and implemented using open-source software and programming languages, including Python 3.9, scikit-learn, PyTorch, NumPy, and pandas. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e4174">The authors would like to thank <italic>Gustave Eiffel University</italic> and <italic>aQuasys Company</italic> for initiating the <italic>Anticipation, Planification et Pilotage des Prélèvements Agricoles (A3P)</italic> project and the AiQua LabCom. We are grateful to the NeuralHydrology team for making their regional LSTM code publicly available, as well as to the authors of the SAC-SMA model. We also thank the contributors of the CAMELS-US and CAMELS-FR datasets for their significant contributions to the community. We acknowledge the <italic>Groupement Ligérien pour le Calcul Intensif Distribué (GLiCID)</italic> for providing the computing resources. Finally, we thank Michaël Savary, Pierre Nicolle, Reyhaneh Hashemi, Zoë Jack and Otis Cooper for their support, including preliminary proofreading and grammar checking.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e4191">This research has been supported by the Agence Nationale de la Recherche (ANR) under the AiQua LabCom (grant no. ANR-24-LCV2-0015-01) and Bpifrance under the A3P project (grant no. DOS0231020/00).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e4197">This paper was edited by Ralf Loritz and reviewed by two anonymous referees.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Addor et al.(2017)Addor, Newman, Mizukami, and Clark</label><mixed-citation>Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, <ext-link xlink:href="https://doi.org/10.5194/hess-21-5293-2017" ext-link-type="DOI">10.5194/hess-21-5293-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Anctil et al.(2004)Anctil, Michel, Perrin, and Andréassian</label><mixed-citation>Anctil, F., Michel, C., Perrin, C., and Andréassian, V.: A soil moisture index as an auxiliary ANN input for stream flow forecasting, J. Hydrol., 286, 155–167, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2003.09.006" ext-link-type="DOI">10.1016/j.jhydrol.2003.09.006</ext-link>, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Anon(2020)</label><mixed-citation>Anon: Anaconda Software Distribution, <uri>https://www.anaconda.com</uri> (last access: 31 July 2024), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Atmaja and Akagi(2020)</label><mixed-citation>Atmaja, B. T. and Akagi, M.: Deep Multilayer Perceptrons for Dimensional Speech Emotion Recognition, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.2004.02355" ext-link-type="DOI">10.48550/arXiv.2004.02355</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bell et al.(2021)Bell, Spring, Brady, Andrew, Squire, Blackwood, Sitter, and Chegini</label><mixed-citation>Bell, R., Spring, A., Brady, R., Andrew, Squire, D., Blackwood, Z., Sitter, M. C., and Chegini, T.: xarray-contrib/xskillscore: Release v0.0.23, Zenodo [data set], <ext-link xlink:href="https://doi.org/10.5281/zenodo.5173153" ext-link-type="DOI">10.5281/zenodo.5173153</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Boucher et al.(2020)Boucher, Quilty, and Adamowski</label><mixed-citation>Boucher, M.-A., Quilty, J., and Adamowski, J.: Data Assimilation for Streamflow Forecasting Using Extreme Learning Machines and Multilayer Perceptrons, Water Resour. Res., 56, e2019WR026226, <ext-link xlink:href="https://doi.org/10.1029/2019WR026226" ext-link-type="DOI">10.1029/2019WR026226</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Bourgin et al.(2014)Bourgin, Ramos, Thirel, and Andréassian</label><mixed-citation>Bourgin, F., Ramos, M. H., Thirel, G., and Andréassian, V.: Investigating the interactions between data assimilation and post-processing in hydrological ensemble forecasting, J. Hydrol., 519, 2775–2784, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2014.07.054" ext-link-type="DOI">10.1016/j.jhydrol.2014.07.054</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Bradley and Schwartz(2011)</label><mixed-citation>Bradley, A. A. and Schwartz, S. S.: Summary Verification Measures and Their Interpretation for Ensemble Forecasts, Mon. Weather Rev., 139, 3075–3089, <ext-link xlink:href="https://doi.org/10.1175/2010MWR3305.1" ext-link-type="DOI">10.1175/2010MWR3305.1</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Brier(1950)</label><mixed-citation>Brier, G. W.: Verification of forecasts expressed in terms of probability, Mon. Weather Rev., 78, 1–3, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1950)078&lt;0001:VOFEIT&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1950)078&lt;0001:VOFEIT&gt;2.0.CO;2</ext-link>, 1950.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Buizza et al.(2005)Buizza, Houtekamer, Pellerin, Toth, Zhu, and Wei</label><mixed-citation>Buizza, R., Houtekamer, P. L., Pellerin, G., Toth, Z., Zhu, Y., and Wei, M.: A Comparison of the ECMWF, MSC, and NCEP Global Ensemble Prediction Systems, Mon. Weather Rev., 133, 1076–1097, <ext-link xlink:href="https://doi.org/10.1175/MWR2905.1" ext-link-type="DOI">10.1175/MWR2905.1</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Chevillon(2007)</label><mixed-citation>Chevillon, G.: Direct multi-step estimation and forecasting, J. Econ. Surv., 21, 746–785, <ext-link xlink:href="https://doi.org/10.1111/j.1467-6419.2007.00518.x" ext-link-type="DOI">10.1111/j.1467-6419.2007.00518.x</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Clark et al.(2008)Clark, Rupp, Woods, Zheng, Ibbitt, Slater, Schmidt, and Uddstrom</label><mixed-citation>Clark, M. P., Rupp, D. E., Woods, R. A., Zheng, X., Ibbitt, R. P., Slater, A. G., Schmidt, J., and Uddstrom, M. J.: Hydrological data assimilation with the ensemble Kalman filter: Use of streamflow observations to update states in a distributed hydrological model, Adv. Water Resour., 31, 1309–1324, <ext-link xlink:href="https://doi.org/10.1016/j.advwatres.2008.06.005" ext-link-type="DOI">10.1016/j.advwatres.2008.06.005</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Corradini et al.(1986)Corradini, Melone, and Ubertini</label><mixed-citation> Corradini, C., Melone, F., and Ubertini, L.: A semi-distributed adaptive model for real-time flood forecasting, J. Am. Water Resour. Assoc., 22, 1031–1038, 1986.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Crochemore et al.(2017)Crochemore, Ramos, Pappenberger, and Perrin</label><mixed-citation>Crochemore, L., Ramos, M.-H., Pappenberger, F., and Perrin, C.: Seasonal streamflow forecasting by conditioning climatology with precipitation indices, Hydrol. Earth Syst. Sci., 21, 1573–1591, <ext-link xlink:href="https://doi.org/10.5194/hess-21-1573-2017" ext-link-type="DOI">10.5194/hess-21-1573-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Day(1985)</label><mixed-citation>Day, G. N.: Extended Streamflow Forecasting Using NWSRFS, J. Water Resour. Plan. Manage., 111, 157–170, <ext-link xlink:href="https://doi.org/10.1061/(ASCE)0733-9496(1985)111:2(157)" ext-link-type="DOI">10.1061/(ASCE)0733-9496(1985)111:2(157)</ext-link>, 1985.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Delaigue et al.(2025)Delaigue, Guimarães, Brigode, Génot, Perrin, Soubeyroux, Janet, Addor, and Andréassian</label><mixed-citation>Delaigue, O., Guimarães, G. M., Brigode, P., Génot, B., Perrin, C., Soubeyroux, J.-M., Janet, B., Addor, N., and Andréassian, V.: CAMELS-FR dataset: a large-sample hydroclimatic dataset for France to explore hydrological diversity and support model benchmarking, Earth Syst. Sci. Data, 17, 1461–1479, <ext-link xlink:href="https://doi.org/10.5194/essd-17-1461-2025" ext-link-type="DOI">10.5194/essd-17-1461-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Fang et al.(2021)Fang, Wang, Peng, and Hong</label><mixed-citation>Fang, Z., Wang, Y., Peng, L., and Hong, H.: Predicting flood susceptibility using LSTM neural networks, J. Hydrol., 594, 125734, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2020.125734" ext-link-type="DOI">10.1016/j.jhydrol.2020.125734</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Feng et al.(2020)Feng, Fang, and Shen</label><mixed-citation>Feng, D., Fang, K., and Shen, C.: Enhancing Streamflow Forecast and Extracting Insights Using Long-Short Term Memory Networks With Data Integration at Continental Scales, Water Resour. Res., 56, e2019WR026793, <ext-link xlink:href="https://doi.org/10.1029/2019WR026793" ext-link-type="DOI">10.1029/2019WR026793</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and Martinez</label><mixed-citation>Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, J. Hydrol., 377, 80–91, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2009.08.003" ext-link-type="DOI">10.1016/j.jhydrol.2009.08.003</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Hamill(2001)</label><mixed-citation>Hamill, T. M.: Interpretation of Rank Histograms for Verifying Ensemble Forecasts, Mon. Weather Rev., 129, 550–560, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(2001)129&lt;0550:IORHFV&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(2001)129&lt;0550:IORHFV&gt;2.0.CO;2</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Harold et al.(2015)Harold, Barb, Beth, Chris, Johannes, Ian, Tieh-Yong, Paul, and David</label><mixed-citation>Harold, B., Barb, B., Beth, E., Chris, F., Johannes, J., Ian, J., Tieh-Yong, K., Paul, R., and David, S.: WWRP/WGNE Joint Working Group on Forecast Verification Research, <uri>https://www.cawcr.gov.au/projects/verification/</uri> (last access: 13 December 2024), 2015.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Hashemi et al.(2022)Hashemi, Brigode, Garambois, and Javelle</label><mixed-citation>Hashemi, R., Brigode, P., Garambois, P.-A., and Javelle, P.: How can we benefit from regime information to make more effective use of long short-term memory (LSTM) runoff models?, Hydrol. Earth Syst. Sci., 26, 5793–5816, <ext-link xlink:href="https://doi.org/10.5194/hess-26-5793-2022" ext-link-type="DOI">10.5194/hess-26-5793-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Hersbach(2000)</label><mixed-citation>Hersbach, H.: Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems, Weather Forecast., 15, 559–570, <ext-link xlink:href="https://doi.org/10.1175/1520-0434(2000)015&lt;0559:DOTCRP&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0434(2000)015&lt;0559:DOTCRP&gt;2.0.CO;2</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Hidalgo and Jougla(2018)</label><mixed-citation>Hidalgo, J. and Jougla, R.: On the use of local weather types classification to improve climate understanding: An application on the urban climate of Toulouse, PLOS ONE, 13, e0208138, <ext-link xlink:href="https://doi.org/10.1371/journal.pone.0208138" ext-link-type="DOI">10.1371/journal.pone.0208138</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Hudson et al.(2020)Hudson, Alves, Hendon, Lim, Liu, Luo, MacLachlan, Marshall, Shi, Wang, Wedd, Young, Zhao, and Zhou</label><mixed-citation>Hudson, D., Alves, O., Hendon, H. H., Lim, E.-P., Liu, G., Luo, J.-J., MacLachlan, C., Marshall, A. G., Shi, L., Wang, G., Wedd, R., Young, G., Zhao, M., and Zhou, X.: Corrigendum to: ACCESS-S1: The new Bureau of Meteorology multi-week to seasonal prediction system, J. South. Hemis. Earth Syst. Sci., 70, 393, <ext-link xlink:href="https://doi.org/10.1071/ES17009_CO" ext-link-type="DOI">10.1071/ES17009_CO</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Hunter(2007)</label><mixed-citation>Hunter, J. D.: Matplotlib: A 2D Graphics Environment, Comput. Sci. Eng., 9, 90–95, <ext-link xlink:href="https://doi.org/10.1109/MCSE.2007.55" ext-link-type="DOI">10.1109/MCSE.2007.55</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Husic et al.(2022)Husic, Al-Aamery, and Fox</label><mixed-citation>Husic, A., Al-Aamery, N., and Fox, J. F.: Simulating hydrologic pathway contributions in fluvial and karst settings: An evaluation of conceptual, physically-based, and deep learning modeling approaches, J. Hydrol. X, 17, 100134, <ext-link xlink:href="https://doi.org/10.1016/j.hydroa.2022.100134" ext-link-type="DOI">10.1016/j.hydroa.2022.100134</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Jeannin et al.(2021)Jeannin, Artigue, Butscher, Chang, Charlier, Duran, Gill, Hartmann, Johannet, Jourde, Kavousi, Liesch, Liu, Lüthi, Malard, Mazzilli, Pardo-Igúzquiza, Thiéry, Reimann, Schuler, Wöhling, and Wunsch</label><mixed-citation>Jeannin, P.-Y., Artigue, G., Butscher, C., Chang, Y., Charlier, J.-B., Duran, L., Gill, L., Hartmann, A., Johannet, A., Jourde, H., Kavousi, A., Liesch, T., Liu, Y., Lüthi, M., Malard, A., Mazzilli, N., Pardo-Igúzquiza, E., Thiéry, D., Reimann, T., Schuler, P., Wöhling, T., and Wunsch, A.: Karst modelling challenge 1: Results of hydrological modelling, J. Hydrol., 600, 126508, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2021.126508" ext-link-type="DOI">10.1016/j.jhydrol.2021.126508</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>JetBrains(2024)</label><mixed-citation>JetBrains: PyCharm, <uri>https://www.jetbrains.com/pycharm/</uri> (last access: 20 March 2026), 2024.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Kitanidis and Bras(1980)</label><mixed-citation>Kitanidis, P. K. and Bras, R. L.: Real-time forecasting with a conceptual hydrologic model: 2. Applications and results, Water Resour. Res., 16, 1034–1044, <ext-link xlink:href="https://doi.org/10.1029/WR016i006p01034" ext-link-type="DOI">10.1029/WR016i006p01034</ext-link>, 1980.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Kluyver et al.(2016)Kluyver, Ragan-Kelley, Pérez, Granger, Bussonnier, Frederic, Kelley, Hamrick, Grout, Corlay, Ivanov, Avila, Abdalla, and Willing</label><mixed-citation>Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P., Avila, D., Abdalla, S., and Willing, C.: Jupyter Notebooks – a publishing format for reproducible computational workflows, in: Positioning and Power in Academic Publishing: Players, Agents and Agendas, IOS Press, 87–90, <ext-link xlink:href="https://doi.org/10.3233/978-1-61499-649-1-87" ext-link-type="DOI">10.3233/978-1-61499-649-1-87</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and Herrnegger</label><mixed-citation>Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <ext-link xlink:href="https://doi.org/10.5194/hess-22-6005-2018" ext-link-type="DOI">10.5194/hess-22-6005-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Kratzert et al.(2019)Kratzert, Klotz, Shalev, Klambauer, Hochreiter, and Nearing</label><mixed-citation>Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <ext-link xlink:href="https://doi.org/10.5194/hess-23-5089-2019" ext-link-type="DOI">10.5194/hess-23-5089-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Lai et al.(2011)Lai, Gross, and Shen</label><mixed-citation> Lai, T. L., Gross, S. T., and Shen, D. B.: Evaluating probability forecasts, Ann. Stat., 39, 2356–2382, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Li et al.(2024)Li, Zhang, Chu, Shen, and Li</label><mixed-citation>Li, H., Zhang, C., Chu, W., Shen, D., and Li, R.: A process-driven deep learning hydrological model for daily rainfall-runoff simulation, J. Hydrol., 637, 131434, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2024.131434" ext-link-type="DOI">10.1016/j.jhydrol.2024.131434</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Liu and Wang(2024)</label><mixed-citation>Liu, X. and Wang, W.: Deep Time Series Forecasting Models: A Comprehensive Survey, Mathematics, 12, 1504, <ext-link xlink:href="https://doi.org/10.3390/math12101504" ext-link-type="DOI">10.3390/math12101504</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Mangin(1984)</label><mixed-citation>Mangin, A.: Pour une meilleure connaissance des systèmes hydrologiques à partir des analyses corrélatoire et spectrale, J. Hydrol., 67, 25–43, <ext-link xlink:href="https://doi.org/10.1016/0022-1694(84)90230-0" ext-link-type="DOI">10.1016/0022-1694(84)90230-0</ext-link>, 1984.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Matheson and Winkler(1976)</label><mixed-citation>Matheson, J. E. and Winkler, R. L.: Scoring Rules for Continuous Probability Distributions, Manage. Sci., 22, 1087–1096, <ext-link xlink:href="https://doi.org/10.1287/mnsc.22.10.1087" ext-link-type="DOI">10.1287/mnsc.22.10.1087</ext-link>, 1976.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>McKinney(2010)</label><mixed-citation>McKinney, W.: Data Structures for Statistical Computing in Python, in: Proceedings of the 9th Python in Science Conference, edited by: van der Walt, S. and Millman, J., Austin, Texas, USA, 56–61, <ext-link xlink:href="https://doi.org/10.25080/Majora-92bf1922-00a" ext-link-type="DOI">10.25080/Majora-92bf1922-00a</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Murphy(1993)</label><mixed-citation>Murphy, A. H.: What Is a Good Forecast? An Essay on the Nature of Goodness in Weather Forecasting, Weather Forecast., 8, 281–293, <ext-link xlink:href="https://doi.org/10.1175/1520-0434(1993)008&lt;0281:WIAGFA&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0434(1993)008&lt;0281:WIAGFA&gt;2.0.CO;2</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Nash and Sutcliffe(1970)</label><mixed-citation>Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models part I – A discussion of principles, J. Hydrol., 10, 282–290, <ext-link xlink:href="https://doi.org/10.1016/0022-1694(70)90255-6" ext-link-type="DOI">10.1016/0022-1694(70)90255-6</ext-link>, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>Nearing et al.(2022)Nearing, Klotz, Frame, Gauch, Gilon, Kratzert, Sampson, Shalev, and Nevo</label><mixed-citation>Nearing, G. S., Klotz, D., Frame, J. M., Gauch, M., Gilon, O., Kratzert, F., Sampson, A. K., Shalev, G., and Nevo, S.: Technical note: Data assimilation and autoregression for using near-real-time streamflow observations in long short-term memory networks, Hydrol. Earth Syst. Sci., 26, 5493–5513, <ext-link xlink:href="https://doi.org/10.5194/hess-26-5493-2022" ext-link-type="DOI">10.5194/hess-26-5493-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Newman et al.(2017)Newman, Mizukami, Clark, Wood, Nijssen, and Nearing</label><mixed-citation>Newman, A. J., Mizukami, N., Clark, M. P., Wood, A. W., Nijssen, B., and Nearing, G.: Benchmarking of a Physically Based Hydrologic Model, J. Hydrometeorol., 18, 2215–2225, <ext-link xlink:href="https://doi.org/10.1175/JHM-D-16-0284.1" ext-link-type="DOI">10.1175/JHM-D-16-0284.1</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Oliveira et al.(2021)Oliveira, Rampinelli, Tozatto, Andreão, and Müller</label><mixed-citation>Oliveira, D. D., Rampinelli, M., Tozatto, G. Z., Andreão, R. V., and Müller, S. M. T.: Forecasting vehicular traffic flow using MLP and LSTM, Neural Comput. Appl., 33, 17245–17256, <ext-link xlink:href="https://doi.org/10.1007/s00521-021-06315-w" ext-link-type="DOI">10.1007/s00521-021-06315-w</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Pedregosa et al.(2012)Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, VanderPlas, Passos, Cournapeau, Brucher, Perrot, and Duchesnay</label><mixed-citation>Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.: Scikit-learn: Machine Learning in Python, CoRR, abs/1201.0490, arXiv [preprint], <ext-link xlink:href="https://doi.org/10.48550/arXiv.1201.0490" ext-link-type="DOI">10.48550/arXiv.1201.0490</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Pelletier and Andréassian(2024)</label><mixed-citation>Pelletier, A. and Andréassian, V.: An underground view of surface hydrology: what can piezometers tell us about river floods and droughts?, Comptes Rendus. Géoscience, 355, 271–280, <ext-link xlink:href="https://doi.org/10.5802/crgeos.195" ext-link-type="DOI">10.5802/crgeos.195</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Perrin et al.(2003)Perrin, Michel, and Andréassian</label><mixed-citation>Perrin, C., Michel, C., and Andréassian, V.: Improvement of a parsimonious model for streamflow simulation, J. Hydrol., 279, 275–289, <ext-link xlink:href="https://doi.org/10.1016/S0022-1694(03)00225-7" ext-link-type="DOI">10.1016/S0022-1694(03)00225-7</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Petropoulos et al.(2022)Petropoulos, Apiletti, Assimakopoulos, Babai, Barrow, Ben Taieb, Bergmeir, Bessa, Bijak, Boylan, Browell, Carnevale, Castle, Cirillo, Clements, Cordeiro, Cyrino Oliveira, De Baets, Dokumentov, Ellison, Fiszeder, Franses, Frazier, Gilliland, Gönül, Goodwin, Grossi, Grushka-Cockayne, Guidolin, Guidolin, Gunter, Guo, Guseo, Harvey, Hendry, Hollyman, Januschowski, Jeon, Jose, Kang, Koehler, Kolassa, Kourentzes, Leva, Li, Litsiou, Makridakis, Martin, Martinez, Meeran, Modis, Nikolopoulos, Önkal, Paccagnini, Panagiotelis, Panapakidis, Pavía, Pedio, Pedregal, Pinson, Ramos, Rapach, Reade, Rostami-Tabar, Rubaszek, Sermpinis, Shang, Spiliotis, Syntetos, Talagala, Talagala, Tashman, Thomakos, Thorarinsdottir, Todini, Trapero Arenas, Wang, Winkler, Yusupova, and Ziel</label><mixed-citation>Petropoulos, F., Apiletti, D., Assimakopoulos, V., Babai, M. Z., Barrow, D. K., Ben Taieb, S., Bergmeir, C., Bessa, R. J., Bijak, J., Boylan, J. E., Browell, J., Carnevale, C., Castle, J. L., Cirillo, P., Clements, M. P., Cordeiro, C., Cyrino Oliveira, F. L., De Baets, S., Dokumentov, A., Ellison, J., Fiszeder, P., Franses, P. H., Frazier, D. T., Gilliland, M., Gönül, M. S., Goodwin, P., Grossi, L., Grushka-Cockayne, Y., Guidolin, M., Guidolin, M., Gunter, U., Guo, X., Guseo, R., Harvey, N., Hendry, D. F., Hollyman, R., Januschowski, T., Jeon, J., Jose, V. R. R., Kang, Y., Koehler, A. B., Kolassa, S., Kourentzes, N., Leva, S., Li, F., Litsiou, K., Makridakis, S., Martin, G. M., Martinez, A. B., Meeran, S., Modis, T., Nikolopoulos, K., Önkal, D., Paccagnini, A., Panagiotelis, A., Panapakidis, I., Pavía, J. M., Pedio, M., Pedregal, D. J., Pinson, P., Ramos, P., Rapach, D. E., Reade, J. J., Rostami-Tabar, B., Rubaszek, M., Sermpinis, G., Shang, H. L., Spiliotis, E., Syntetos, A. A., Talagala, P. D., Talagala, T. S., Tashman, L., Thomakos, D., Thorarinsdottir, T., Todini, E., Trapero Arenas, J. R., Wang, X., Winkler, R. L., Yusupova, A., and Ziel, F.: Forecasting: theory and practice, Int. J. Forecast., 38, 705–871, <ext-link xlink:href="https://doi.org/10.1016/j.ijforecast.2021.11.001" ext-link-type="DOI">10.1016/j.ijforecast.2021.11.001</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Philip et al.(2020)Philip, Kew, van Oldenborgh, Otto, Vautard, van der Wiel, King, Lott, Arrighi, Singh, and van Aalst</label><mixed-citation>Philip, S., Kew, S., van Oldenborgh, G. J., Otto, F., Vautard, R., van der Wiel, K., King, A., Lott, F., Arrighi, J., Singh, R., and van Aalst, M.: A protocol for probabilistic extreme event attribution analyses, Adv. Stat. Climatol. Meteorol. Oceanogr., 6, 177–203, <ext-link xlink:href="https://doi.org/10.5194/ascmo-6-177-2020" ext-link-type="DOI">10.5194/ascmo-6-177-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Piazzi et al.(2021)Piazzi, Thirel, Perrin, and Delaigue</label><mixed-citation>Piazzi, G., Thirel, G., Perrin, C., and Delaigue, O.: Sequential Data Assimilation for Streamflow Forecasting: Assessing the Sensitivity to Uncertainties and Updated Variables of a Conceptual Hydrological Model at Basin Scale, Water Resour. Res., 57, e2020WR028390, <ext-link xlink:href="https://doi.org/10.1029/2020WR028390" ext-link-type="DOI">10.1029/2020WR028390</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Pölz et al.(2024)Pölz, Blaschke, Komma, Farnleitner, and Derx</label><mixed-citation>Pölz, A., Blaschke, A. P., Komma, J., Farnleitner, A. H., and Derx, J.: Transformer Versus LSTM: A Comparison of Deep Learning Models for Karst Spring Discharge Forecasting, Water Resour. Res., 60, e2022WR032602, <ext-link xlink:href="https://doi.org/10.1029/2022WR032602" ext-link-type="DOI">10.1029/2022WR032602</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Rahbar et al.(2022)Rahbar, Mirarabi, Nakhaei, Talkhabi, and Jamali</label><mixed-citation>Rahbar, A., Mirarabi, A., Nakhaei, M., Talkhabi, M., and Jamali, M.: A Comparative Analysis of Data-Driven Models (SVR, ANFIS, and ANNs) for Daily Karst Spring Discharge Prediction, Water Resour. Manage., 36, 589–609, <ext-link xlink:href="https://doi.org/10.1007/s11269-021-03041-9" ext-link-type="DOI">10.1007/s11269-021-03041-9</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Rentschler et al.(2023)Rentschler, Avner, Marconcini, Su, Strano, Vousdoukas, and Hallegatte</label><mixed-citation>Rentschler, J., Avner, P., Marconcini, M., Su, R., Strano, E., Vousdoukas, M., and Hallegatte, S.: Global evidence of rapid urban growth in flood zones since 1985, Nature, 622, 87–92, <ext-link xlink:href="https://doi.org/10.1038/s41586-023-06468-9" ext-link-type="DOI">10.1038/s41586-023-06468-9</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Rosenblatt(1958)</label><mixed-citation>Rosenblatt, F.: The perceptron: A probabilistic model for information storage and organization in the brain, Psychol. Rev., 65, 386–408, <ext-link xlink:href="https://doi.org/10.1037/h0042519" ext-link-type="DOI">10.1037/h0042519</ext-link>, 1958.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Saint Fleur et al.(2020)Saint Fleur, Artigue, Johannet, and Pistre</label><mixed-citation> Saint Fleur, B. E., Artigue, G., Johannet, A., and Pistre, S.: Deep Multilayer Perceptron for Knowledge Extraction: Understanding the Gardon de Mialet Flash Floods Modeling, in: Theory and Applications of Time Series Analysis, edited by: Valenzuela, O., Rojas, F., Herrera, L. J., Pomares, H., and Rojas, I., Springer International Publishing, Cham, 333–348, ISBN 978-3-030-56219-9, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Saint-Fleur et al.(2023)Saint-Fleur, Allier, Lassara, Rivet, Artigue, Pistre, and Johannet</label><mixed-citation>Saint-Fleur, B. E., Allier, S., Lassara, E., Rivet, A., Artigue, G., Pistre, S., and Johannet, A.: Towards a better consideration of rainfall and hydrological spatial features by a deep neural network model to improve flash floods forecasting: case study on the Gardon basin, France, Model. Earth Syst. Environ., 9, 3693–3708, <ext-link xlink:href="https://doi.org/10.1007/s40808-022-01650-w" ext-link-type="DOI">10.1007/s40808-022-01650-w</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Schiermeier(2018)</label><mixed-citation>Schiermeier, Q.: Droughts, heatwaves and floods: How to tell when climate change is to blame, Nature, 560, 20–22, <ext-link xlink:href="https://doi.org/10.1038/d41586-018-05849-9" ext-link-type="DOI">10.1038/d41586-018-05849-9</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Seillier-Moiseiwitsch and Dawid(1993)</label><mixed-citation>Seillier-Moiseiwitsch, F. and Dawid, A. P.: On Testing the Validity of Sequential Probability Forecasts, J. Am. Stat. Assoc., 88, 355–359, <ext-link xlink:href="https://doi.org/10.2307/2290731" ext-link-type="DOI">10.2307/2290731</ext-link>, 1993.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Slater et al.(2019)Slater, Villarini, and Bradley</label><mixed-citation>Slater, L. J., Villarini, G., and Bradley, A. A.: Evaluation of the skill of North-American Multi-Model Ensemble (NMME) Global Climate Models in predicting average and extreme precipitation and temperature over the continental USA, Clim. Dynam., 53, 7381–7396, <ext-link xlink:href="https://doi.org/10.1007/s00382-016-3286-1" ext-link-type="DOI">10.1007/s00382-016-3286-1</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Talagrand et al.(1997)Talagrand, Vautard, and Strauss</label><mixed-citation>Talagrand, O., Vautard, R., and Strauss, B.: Evaluation of probabilistic prediction systems, in: Proceedings of the ECMWF Workshop on Predictability, ECMWF, Shinfield Park, Reading, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>Teräsvirta et al.(2010)Teräsvirta, Tjøstheim, and Granger</label><mixed-citation>Teräsvirta, T., Tjøstheim, D., and Granger, C. W. J.: Modelling Nonlinear Economic Time Series, Oxford University Press, ISBN 9780199587148, <ext-link xlink:href="https://doi.org/10.1093/acprof:oso/9780199587148.001.0001" ext-link-type="DOI">10.1093/acprof:oso/9780199587148.001.0001</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx62"><label>Terven et al.(2025)Terven, Cordova-Esparza, Romero-González, Ramírez-Pedraza, and Chávez-Urbiola</label><mixed-citation>Terven, J., Cordova-Esparza, D.-M., Romero-González, J.-A., Ramírez-Pedraza, A., and Chávez-Urbiola, E. A.: A comprehensive survey of loss functions and metrics in deep learning, Artif. Intel. Rev., 58, 195, <ext-link xlink:href="https://doi.org/10.1007/s10462-025-11198-7" ext-link-type="DOI">10.1007/s10462-025-11198-7</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>van der Walt et al.(2011)van der Walt, Colbert, and Varoquaux</label><mixed-citation>van der Walt, S., Colbert, S. C., and Varoquaux, G.: The NumPy Array: A Structure for Efficient Numerical Computation, Comput. Sci. Eng., 13, 22–30, <ext-link xlink:href="https://doi.org/10.1109/MCSE.2011.37" ext-link-type="DOI">10.1109/MCSE.2011.37</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx64"><label>van Rossum(1995)</label><mixed-citation>van Rossum, G.: Python tutorial, CWI – Centrum voor Wiskunde en Informatica, Amsterdam, the Netherlands, 1995.</mixed-citation></ref>
      <ref id="bib1.bibx65"><label>Vitart et al.(2017)Vitart, Ardilouze, Bonet, Brookshaw, Chen, Codorean, Déqué, Ferranti, Fucile, Fuentes, Hendon, Hodgson, Kang, Kumar, Lin, Liu, Liu, Malguzzi, Mallas, Manoussakis, Mastrangelo, MacLachlan, McLean, Minami, Mladek, Nakazawa, Najm, Nie, Rixen, Robertson, Ruti, Sun, Takaya, Tolstykh, Venuti, Waliser, Woolnough, Wu, Won, Xiao, Zaripov, and Zhang</label><mixed-citation>Vitart, F., Ardilouze, C., Bonet, A., Brookshaw, A., Chen, M., Codorean, C., Déqué, M., Ferranti, L., Fucile, E., Fuentes, M., Hendon, H., Hodgson, J., Kang, H.-S., Kumar, A., Lin, H., Liu, G., Liu, X., Malguzzi, P., Mallas, I., Manoussakis, M., Mastrangelo, D., MacLachlan, C., McLean, P., Minami, A., Mladek, R., Nakazawa, T., Najm, S., Nie, Y., Rixen, M., Robertson, A. W., Ruti, P., Sun, C., Takaya, Y., Tolstykh, M., Venuti, F., Waliser, D., Woolnough, S., Wu, T., Won, D.-J., Xiao, H., Zaripov, R., and Zhang, L.: The Subseasonal to Seasonal (S2S) Prediction Project Database, B. Am. Meteorol. Soc., 98, 163–173, <ext-link xlink:href="https://doi.org/10.1175/BAMS-D-16-0017.1" ext-link-type="DOI">10.1175/BAMS-D-16-0017.1</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx66"><label>Waskom(2021)</label><mixed-citation>Waskom, M. L.: seaborn: statistical data visualization, J. Open Sour. Softw., 6, 3021, <ext-link xlink:href="https://doi.org/10.21105/joss.03021" ext-link-type="DOI">10.21105/joss.03021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx67"><label>Werbos(1974)</label><mixed-citation>Werbos, P.: Beyond regression: New tools for prediction and analysis in the behavioral sciences, PhD thesis, Committee on Applied Mathematics, Harvard University, Cambridge, MA, <uri>https://gwern.net/doc/ai/nn/1974-werbos.pdf</uri> (last access: 5 June 2026), 1974.</mixed-citation></ref>
      <ref id="bib1.bibx68"><label>Werbos(1988)</label><mixed-citation>Werbos, P.: Backpropagation: Past and future, in: IEEE 1988 International Conference on Neural Networks, 343–353, <ext-link xlink:href="https://doi.org/10.1109/ICNN.1988.23866" ext-link-type="DOI">10.1109/ICNN.1988.23866</ext-link>, 1988.</mixed-citation></ref>
      <ref id="bib1.bibx69"><label>Whitaker and Loughe(1998)</label><mixed-citation>Whitaker, J. S. and Loughe, A. F.: The Relationship between Ensemble Spread and Ensemble Mean Skill, Mon. Weather Rev., 126, 3292–3302, <ext-link xlink:href="https://doi.org/10.1175/1520-0493(1998)126&lt;3292:TRBESA&gt;2.0.CO;2" ext-link-type="DOI">10.1175/1520-0493(1998)126&lt;3292:TRBESA&gt;2.0.CO;2</ext-link>, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx70"><label>Wunsch et al.(2021)Wunsch, Liesch, and Broda</label><mixed-citation>Wunsch, A., Liesch, T., and Broda, S.: Groundwater level forecasting with artificial neural networks: a comparison of long short-term memory (LSTM), convolutional neural networks (CNNs), and non-linear autoregressive networks with exogenous input (NARX), Hydrol. Earth Syst. Sci., 25, 1671–1687, <ext-link xlink:href="https://doi.org/10.5194/hess-25-1671-2021" ext-link-type="DOI">10.5194/hess-25-1671-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx71"><label>Yang et al.(2020)Yang, Yuan, and Su</label><mixed-citation>Yang, C., Yuan, H., and Su, X.: Bias correction of ensemble precipitation forecasts in the improvement of summer streamflow prediction skill, J. Hydrol., 588, 124955, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2020.124955" ext-link-type="DOI">10.1016/j.jhydrol.2020.124955</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx72"><label>Yang et al.(2025)Yang, Pan, Feng, Xiao, Dixon, Hartman, Shen, Song, Sengupta, Delle Monache, and Ralph</label><mixed-citation>Yang, Y., Pan, M., Feng, D., Xiao, M., Dixon, T., Hartman, R., Shen, C., Song, Y., Sengupta, A., Delle Monache, L., and Ralph, F. M.: Improving streamflow simulation through machine learning-powered data integration and its potential for forecasting in the Western U.S., Hydrol. Earth Syst. Sci., 29, 5453–5476, <ext-link xlink:href="https://doi.org/10.5194/hess-29-5453-2025" ext-link-type="DOI">10.5194/hess-29-5453-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx73"><label>Zalachori et al.(2012)Zalachori, Ramos, Garçon, Mathevet, and Gailhard</label><mixed-citation>Zalachori, I., Ramos, M.-H., Garçon, R., Mathevet, T., and Gailhard, J.: Statistical processing of forecasts for hydrological ensemble prediction: a comparative study of different bias correction strategies, Adv. Sci. Res., 8, 135–141, <ext-link xlink:href="https://doi.org/10.5194/asr-8-135-2012" ext-link-type="DOI">10.5194/asr-8-135-2012</ext-link>, 2012.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Testing discharge assimilation strategies to enhance short-range AI-based operational rainfall–runoff forecasts</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Addor et al.(2017)Addor, Newman, Mizukami, and
Clark</label><mixed-citation>
      
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, <a href="https://doi.org/10.5194/hess-21-5293-2017" target="_blank">https://doi.org/10.5194/hess-21-5293-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Anctil et al.(2004)Anctil, Michel, Perrin, and
Andréassian</label><mixed-citation>
      
Anctil, F., Michel, C., Perrin, C., and Andréassian, V.: A soil moisture index as an auxiliary ANN input for stream flow forecasting, J. Hydrol., 286, 155–167, <a href="https://doi.org/10.1016/j.jhydrol.2003.09.006" target="_blank">https://doi.org/10.1016/j.jhydrol.2003.09.006</a>, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Anon(2020)</label><mixed-citation>
      
Anon: Anaconda Software Distribution, <a href="https://www.anaconda.com" target="_blank"/> (last access: 31 July 2024), 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Atmaja and Akagi(2020)</label><mixed-citation>
      
Atmaja, B. T. and Akagi, M.: Deep Multilayer Perceptrons for Dimensional
Speech Emotion Recognition, arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.2004.02355" target="_blank">https://doi.org/10.48550/arXiv.2004.02355</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bell et al.(2021)Bell, Spring, Brady, Andrew, Squire, Blackwood,
Sitter, and Chegini</label><mixed-citation>
      
Bell, R., Spring, A., Brady, R., Andrew, Squire, D., Blackwood, Z., Sitter,
M. C., and Chegini, T.: xarray-contrib/xskillscore: Release v0.0.23,
Zenodo [data set], <a href="https://doi.org/10.5281/zenodo.5173153" target="_blank">https://doi.org/10.5281/zenodo.5173153</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Boucher et al.(2020)Boucher, Quilty, and Adamowski</label><mixed-citation>
      
Boucher, M.-A., Quilty, J., and Adamowski, J.: Data Assimilation for
Streamflow Forecasting Using Extreme Learning Machines and Multilayer
Perceptrons, Water Resour. Res., 56, e2019WR026226, <a href="https://doi.org/10.1029/2019WR026226" target="_blank">https://doi.org/10.1029/2019WR026226</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Bourgin et al.(2014)Bourgin, Ramos, Thirel, and
Andréassian</label><mixed-citation>
      
Bourgin, F., Ramos, M. H., Thirel, G., and Andréassian, V.: Investigating the interactions between data assimilation and post-processing
in hydrological ensemble forecasting, J. Hydrol., 519, 2775–2784,
<a href="https://doi.org/10.1016/j.jhydrol.2014.07.054" target="_blank">https://doi.org/10.1016/j.jhydrol.2014.07.054</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Bradley and Schwartz(2011)</label><mixed-citation>
      
Bradley, A. A. and Schwartz, S. S.: Summary Verification Measures and Their
Interpretation for Ensemble Forecasts, Mon. Weather Rev., 139,
3075–3089, <a href="https://doi.org/10.1175/2010MWR3305.1" target="_blank">https://doi.org/10.1175/2010MWR3305.1</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Brier(1950)</label><mixed-citation>
      
Brier, G. W.: Verification of forecasts expressed in terms of probability,
Mon. Weather Rev., 78, 1–3, <a href="https://doi.org/10.1175/1520-0493(1950)078&lt;0001:VOFEIT&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1950)078&lt;0001:VOFEIT&gt;2.0.CO;2</a>, 1950.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Buizza et al.(2005)Buizza, Houtekamer, Pellerin, Toth, Zhu, and
Wei</label><mixed-citation>
      
Buizza, R., Houtekamer, P. L., Pellerin, G., Toth, Z., Zhu, Y., and Wei, M.: A Comparison of the ECMWF, MSC, and NCEP Global Ensemble Prediction Systems, Mon. Weather Rev., 133, 1076–1097, <a href="https://doi.org/10.1175/MWR2905.1" target="_blank">https://doi.org/10.1175/MWR2905.1</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Chevillon(2007)</label><mixed-citation>
      
Chevillon, G.: Direct multi-step estimation and forecasting, J. Econ. Surv., 21, 746–785, <a href="https://doi.org/10.1111/j.1467-6419.2007.00518.x" target="_blank">https://doi.org/10.1111/j.1467-6419.2007.00518.x</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Clark et al.(2008)Clark, Rupp, Woods, Zheng, Ibbitt, Slater, Schmidt, and Uddstrom</label><mixed-citation>
      
Clark, M. P., Rupp, D. E., Woods, R. A., Zheng, X., Ibbitt, R. P., Slater,
A. G., Schmidt, J., and Uddstrom, M. J.: Hydrological data assimilation with
the ensemble Kalman filter: Use of streamflow observations to update states
in a distributed hydrological model, Adv. Water Resour., 31, 1309–1324, <a href="https://doi.org/10.1016/j.advwatres.2008.06.005" target="_blank">https://doi.org/10.1016/j.advwatres.2008.06.005</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Corradini et al.(1986)Corradini, Melone, and
Ubertini</label><mixed-citation>
      
Corradini, C., Melone, F., and Ubertini, L.: A semi-distributed adaptive model for real-time flood forecasting, J. Am. Water Resour. Assoc., 22, 1031–1038, 1986.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Crochemore et al.(2017)Crochemore, Ramos, Pappenberger, and
Perrin</label><mixed-citation>
      
Crochemore, L., Ramos, M.-H., Pappenberger, F., and Perrin, C.: Seasonal
streamflow forecasting by conditioning climatology with precipitation
indices, Hydrol. Earth Syst. Sci., 21, 1573–1591,
<a href="https://doi.org/10.5194/hess-21-1573-2017" target="_blank">https://doi.org/10.5194/hess-21-1573-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Day(1985)</label><mixed-citation>
      
Day, G. N.: Extended Streamflow Forecasting Using NWSRFS, J. Water Resour. Plan. Manage., 111, 157–170, <a href="https://doi.org/10.1061/(ASCE)0733-9496(1985)111:2(157)" target="_blank">https://doi.org/10.1061/(ASCE)0733-9496(1985)111:2(157)</a>, 1985.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Delaigue et al.(2025)Delaigue, Guimarães, Brigode, Génot, Perrin, Soubeyroux, Janet, Addor, and Andréassian</label><mixed-citation>
      
Delaigue, O., Guimarães, G. M., Brigode, P., Génot, B., Perrin, C.,
Soubeyroux, J.-M., Janet, B., Addor, N., and Andréassian, V.:
CAMELS-FR dataset: a large-sample hydroclimatic dataset for France to
explore hydrological diversity and support model benchmarking, Earth Syst. Sci. Data, 17, 1461–1479, <a href="https://doi.org/10.5194/essd-17-1461-2025" target="_blank">https://doi.org/10.5194/essd-17-1461-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Fang et al.(2021)Fang, Wang, Peng, and Hong</label><mixed-citation>
      
Fang, Z., Wang, Y., Peng, L., and Hong, H.: Predicting flood susceptibility
using LSTM neural networks, J. Hydrol., 594, 125734, <a href="https://doi.org/10.1016/j.jhydrol.2020.125734" target="_blank">https://doi.org/10.1016/j.jhydrol.2020.125734</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Feng et al.(2020)Feng, Fang, and Shen</label><mixed-citation>
      
Feng, D., Fang, K., and Shen, C.: Enhancing Streamflow Forecast and Extracting Insights Using Long-Short Term Memory Networks With Data Integration at Continental Scales, Water Resour. Res., 56, e2019WR026793, <a href="https://doi.org/10.1029/2019WR026793" target="_blank">https://doi.org/10.1029/2019WR026793</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and Martinez</label><mixed-citation>
      
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: Implications for
improving hydrological modelling, J. Hydrol., 377, 80–91, <a href="https://doi.org/10.1016/j.jhydrol.2009.08.003" target="_blank">https://doi.org/10.1016/j.jhydrol.2009.08.003</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Hamill(2001)</label><mixed-citation>
      
Hamill, T. M.: Interpretation of Rank Histograms for Verifying Ensemble
Forecasts, Mon. Weather Rev., 129, 550–560, <a href="https://doi.org/10.1175/1520-0493(2001)129&lt;0550:IORHFV&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(2001)129&lt;0550:IORHFV&gt;2.0.CO;2</a>, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Harold et al.(2015)Harold, Barb, Beth, Chris, Johannes, Ian,
Tieh-Yong, Paul, and David</label><mixed-citation>
      
Harold, B., Barb, B., Beth, E., Chris, F., Johannes, J., Ian, J., Tieh-Yong,
K., Paul, R., and David, S.: WWRP/WGNE Joint Working Group on Forecast
Verification Research,
<a href="https://www.cawcr.gov.au/projects/verification/" target="_blank"/> (last access: 13 December 2024), 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Hashemi et al.(2022)Hashemi, Brigode, Garambois, and
Javelle</label><mixed-citation>
      
Hashemi, R., Brigode, P., Garambois, P.-A., and Javelle, P.: How can we
benefit from regime information to make more effective use of long short-term
memory (LSTM) runoff models?, Hydrol. Earth Syst. Sci., 26, 5793–5816, <a href="https://doi.org/10.5194/hess-26-5793-2022" target="_blank">https://doi.org/10.5194/hess-26-5793-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Hersbach(2000)</label><mixed-citation>
      
Hersbach, H.: Decomposition of the Continuous Ranked Probability Score for
Ensemble Prediction Systems, Weather Forecast., 15, 559–570,
<a href="https://doi.org/10.1175/1520-0434(2000)015&lt;0559:DOTCRP&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0434(2000)015&lt;0559:DOTCRP&gt;2.0.CO;2</a>, 2000.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Hidalgo and Jougla(2018)</label><mixed-citation>
      
Hidalgo, J. and Jougla, R.: On the use of local weather types classification
to improve climate understanding: An application on the urban climate of
Toulouse, PLOS ONE, 13, e0208138, <a href="https://doi.org/10.1371/journal.pone.0208138" target="_blank">https://doi.org/10.1371/journal.pone.0208138</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Hudson et al.(2020)Hudson, Alves, Hendon, Lim, Liu, Luo, MacLachlan, Marshall, Shi, Wang, Wedd, Young, Zhao, and Zhou</label><mixed-citation>
      
Hudson, D., Alves, O., Hendon, H. H., Lim, E.-P., Liu, G., Luo, J.-J.,
MacLachlan, C., Marshall, A. G., Shi, L., Wang, G., Wedd, R., Young, G.,
Zhao, M., and Zhou, X.: Corrigendum to: ACCESS-S1: The new Bureau of
Meteorology multi-week to seasonal prediction system, J. South. Hemis. Earth Syst. Sci., 70, 393, <a href="https://doi.org/10.1071/ES17009_CO" target="_blank">https://doi.org/10.1071/ES17009_CO</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Hunter(2007)</label><mixed-citation>
      
Hunter, J. D.: Matplotlib: A 2D Graphics Environment, Comput. Sci. Eng., 9, 90–95, <a href="https://doi.org/10.1109/MCSE.2007.55" target="_blank">https://doi.org/10.1109/MCSE.2007.55</a>, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Husic et al.(2022)Husic, Al-Aamery, and Fox</label><mixed-citation>
      
Husic, A., Al-Aamery, N., and Fox, J. F.: Simulating hydrologic pathway
contributions in fluvial and karst settings: An evaluation of conceptual,
physically-based, and deep learning modeling approaches, J. Hydrol. X, 17, 100134, <a href="https://doi.org/10.1016/j.hydroa.2022.100134" target="_blank">https://doi.org/10.1016/j.hydroa.2022.100134</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Jeannin et al.(2021)Jeannin, Artigue, Butscher, Chang, Charlier,
Duran, Gill, Hartmann, Johannet, Jourde, Kavousi, Liesch, Liu, Lüthi,
Malard, Mazzilli, Pardo-Igúzquiza, Thiéry, Reimann, Schuler,
Wöhling, and Wunsch</label><mixed-citation>
      
Jeannin, P.-Y., Artigue, G., Butscher, C., Chang, Y., Charlier, J.-B., Duran,
L., Gill, L., Hartmann, A., Johannet, A., Jourde, H., Kavousi, A., Liesch,
T., Liu, Y., Lüthi, M., Malard, A., Mazzilli, N., Pardo-Igúzquiza, E., Thiéry, D., Reimann, T., Schuler, P., Wöhling, T., and Wunsch, A.: Karst modelling challenge 1: Results of hydrological modelling, J. Hydrol., 600, 126508, <a href="https://doi.org/10.1016/j.jhydrol.2021.126508" target="_blank">https://doi.org/10.1016/j.jhydrol.2021.126508</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>JetBrains(2024)</label><mixed-citation>
      
JetBrains: PyCharm, <a href="https://www.jetbrains.com/pycharm/" target="_blank"/> (last access: 20 March 2026), 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Kitanidis and Bras(1980)</label><mixed-citation>
      
Kitanidis, P. K. and Bras, R. L.: Real-time forecasting with a conceptual
hydrologic model: 2. Applications and results, Water Resour. Res., 16,
1034–1044, <a href="https://doi.org/10.1029/WR016i006p01034" target="_blank">https://doi.org/10.1029/WR016i006p01034</a>, 1980.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Kluyver et al.(2016)Kluyver, Ragan-Kelley, Pérez, Granger,
Bussonnier, Frederic, Kelley, Hamrick, Grout, Corlay, Ivanov, Avila, Abdalla, and Willing</label><mixed-citation>
      
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M.,
Frederic, J., Kelley, K., Hamrick, J., Grout, J., Corlay, S., Ivanov, P.,
Avila, D., Abdalla, S., and Willing, C.: Jupyter Notebooks – a publishing
format for reproducible computational workflows, in: Positioning and Power
in Academic Publishing: Players, Agents and Agendas, IOS Press, 87–90, <a href="https://doi.org/10.3233/978-1-61499-649-1-87" target="_blank">https://doi.org/10.3233/978-1-61499-649-1-87</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and
Herrnegger</label><mixed-citation>
      
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.:
Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks,
Hydrol. Earth Syst. Sci., 22, 6005–6022, <a href="https://doi.org/10.5194/hess-22-6005-2018" target="_blank">https://doi.org/10.5194/hess-22-6005-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Kratzert et al.(2019)Kratzert, Klotz, Shalev, Klambauer, Hochreiter, and Nearing</label><mixed-citation>
      
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and
Nearing, G.: Towards learning universal, regional, and local hydrological
behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <a href="https://doi.org/10.5194/hess-23-5089-2019" target="_blank">https://doi.org/10.5194/hess-23-5089-2019</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Lai et al.(2011)Lai, Gross, and Shen</label><mixed-citation>
      
Lai, T. L., Gross, S. T., and Shen, D. B.: Evaluating probability forecasts, Ann. Stat., 39, 2356–2382, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Li et al.(2024)Li, Zhang, Chu, Shen, and Li</label><mixed-citation>
      
Li, H., Zhang, C., Chu, W., Shen, D., and Li, R.: A process-driven deep
learning hydrological model for daily rainfall-runoff simulation, J. Hydrol., 637, 131434, <a href="https://doi.org/10.1016/j.jhydrol.2024.131434" target="_blank">https://doi.org/10.1016/j.jhydrol.2024.131434</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Liu and Wang(2024)</label><mixed-citation>
      
Liu, X. and Wang, W.: Deep Time Series Forecasting Models: A Comprehensive
Survey, Mathematics, 12, <a href="https://doi.org/10.3390/math12101504" target="_blank">https://doi.org/10.3390/math12101504</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Mangin(1984)</label><mixed-citation>
      
Mangin, A.: Pour une meilleure connaissance des systèmes hydrologiques
à partir des analyses corrélatoire et spectrale, J. Hydrol., 67, 25–43, <a href="https://doi.org/10.1016/0022-1694(84)90230-0" target="_blank">https://doi.org/10.1016/0022-1694(84)90230-0</a>, 1984.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Matheson and Winkler(1976)</label><mixed-citation>
      
Matheson, J. E. and Winkler, R. L.: Scoring Rules for Continuous Probability
Distributions, Manage. Sci., 22, 1087–1096, <a href="https://doi.org/10.1287/mnsc.22.10.1087" target="_blank">https://doi.org/10.1287/mnsc.22.10.1087</a>, 1976.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>McKinney(2010)</label><mixed-citation>
      
McKinney, W.: Data Structures for Statistical Computing in Python, in:
Proceedings of the 9th Python in Science Conference, edited by: v.
D. W. Stefan and Jarrod, M., Austin, Texas, USA, 56–61,
<a href="https://doi.org/10.25080/Majora-92bf1922-00a" target="_blank">https://doi.org/10.25080/Majora-92bf1922-00a</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Murphy(1993)</label><mixed-citation>
      
Murphy, A. H.: What Is a Good Forecast? An Essay on the Nature of Goodness in Weather Forecasting, Weather Forecast., 8, 281–293,
<a href="https://doi.org/10.1175/1520-0434(1993)008&lt;0281:WIAGFA&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0434(1993)008&lt;0281:WIAGFA&gt;2.0.CO;2</a>, 1993.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Nash and Sutcliffe(1970)</label><mixed-citation>
      
Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual
models part I – A discussion of principles, J. Hydrol., 10, 282–290, <a href="https://doi.org/10.1016/0022-1694(70)90255-6" target="_blank">https://doi.org/10.1016/0022-1694(70)90255-6</a>, 1970.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Nearing et al.(2022)Nearing, Klotz, Frame, Gauch, Gilon, Kratzert,
Sampson, Shalev, and Nevo</label><mixed-citation>
      
Nearing, G. S., Klotz, D., Frame, J. M., Gauch, M., Gilon, O., Kratzert, F.,
Sampson, A. K., Shalev, G., and Nevo, S.: Technical note: Data assimilation
and autoregression for using near-real-time streamflow observations in long
short-term memory networks, Hydrol. Earth Syst. Sci., 26, 5493–5513, <a href="https://doi.org/10.5194/hess-26-5493-2022" target="_blank">https://doi.org/10.5194/hess-26-5493-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Newman et al.(2017)Newman, Mizukami, Clark, Wood, Nijssen, and
Nearing</label><mixed-citation>
      
Newman, A. J., Mizukami, N., Clark, M. P., Wood, A. W., Nijssen, B., and
Nearing, G.: Benchmarking of a Physically Based Hydrologic Model, J. Hydrometeorol, 18, 2215–2225, <a href="https://doi.org/10.1175/JHM-D-16-0284.1" target="_blank">https://doi.org/10.1175/JHM-D-16-0284.1</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Oliveira et al.(2021)Oliveira, Rampinelli, Tozatto, Andreão,
and Müller</label><mixed-citation>
      
Oliveira, D. D., Rampinelli, M., Tozatto, G. Z., Andreão, R. V., and
Müller, S. M. T.: Forecasting vehicular traffic flow using MLP and
LSTM, Neural Comput. Appl., 33, 17245–17256, <a href="https://doi.org/10.1007/s00521-021-06315-w" target="_blank">https://doi.org/10.1007/s00521-021-06315-w</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Pedregosa et al.(2012)Pedregosa, Varoquaux, Gramfort, Michel,
Thirion, Grisel, Blondel, Prettenhofer, Weiss, Dubourg, VanderPlas, Passos,
Cournapeau, Brucher, Perrot, and Duchesnay</label><mixed-citation>
      
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel,
O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., VanderPlas, J.,
Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E.:
Scikit-learn: Machine Learning in Python, CoRR, abs/1201.0490, arXiv [preprint], <a href="https://doi.org/10.48550/arXiv.1201.0490" target="_blank">https://doi.org/10.48550/arXiv.1201.0490</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Pelletier and
Andréassian(2024)</label><mixed-citation>
      
Pelletier, A. and Andréassian, V.: An underground view of surface
hydrology: what can piezometers tell us about river floods and droughts?,
Comptes Rendus. Géoscience, 355, 271–280, <a href="https://doi.org/10.5802/crgeos.195" target="_blank">https://doi.org/10.5802/crgeos.195</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Perrin et al.(2003)Perrin, Michel, and
Andréassian</label><mixed-citation>
      
Perrin, C., Michel, C., and Andréassian, V.: Improvement of a
parsimonious model for streamflow simulation, J. Hydrol., 279, 275–289, <a href="https://doi.org/10.1016/S0022-1694(03)00225-7" target="_blank">https://doi.org/10.1016/S0022-1694(03)00225-7</a>, 2003.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Petropoulos et al.(2022)Petropoulos, Apiletti, Assimakopoulos, Babai, Barrow, Ben Taieb, Bergmeir, Bessa, Bijak, Boylan, Browell, Carnevale, Castle, Cirillo, Clements, Cordeiro, Cyrino Oliveira, De Baets, Dokumentov, Ellison, Fiszeder, Franses, Frazier, Gilliland, Gönül, Goodwin, Grossi, Grushka-Cockayne, Guidolin, Guidolin, Gunter, Guo, Guseo, Harvey, Hendry, Hollyman, Januschowski, Jeon, Jose, Kang, Koehler, Kolassa, Kourentzes, Leva, Li, Litsiou, Makridakis, Martin, Martinez, Meeran, Modis, Nikolopoulos, Önkal, Paccagnini, Panagiotelis, Panapakidis, Pavía, Pedio, Pedregal, Pinson, Ramos, Rapach, Reade, Rostami-Tabar, Rubaszek, Sermpinis, Shang, Spiliotis, Syntetos, Talagala, Talagala, Tashman, Thomakos, Thorarinsdottir, Todini, Trapero Arenas, Wang, Winkler, Yusupova, and Ziel</label><mixed-citation>
      
Petropoulos, F., Apiletti, D., Assimakopoulos, V., Babai, M. Z., Barrow, D. K., Ben Taieb, S., Bergmeir, C., Bessa, R. J., Bijak, J., Boylan, J. E., Browell, J., Carnevale, C., Castle, J. L., Cirillo, P., Clements, M. P., Cordeiro, C., Cyrino Oliveira, F. L., De Baets, S., Dokumentov, A., Ellison, J., Fiszeder, P., Franses, P. H., Frazier, D. T., Gilliland, M., Gönül, M. S., Goodwin, P., Grossi, L., Grushka-Cockayne, Y., Guidolin, M., Guidolin, M., Gunter, U., Guo, X., Guseo, R., Harvey, N., Hendry, D. F., Hollyman, R., Januschowski, T., Jeon, J., Jose, V. R. R., Kang, Y., Koehler, A. B., Kolassa, S., Kourentzes, N., Leva, S., Li, F., Litsiou, K., Makridakis, S., Martin, G. M., Martinez, A. B., Meeran, S., Modis, T., Nikolopoulos, K., Önkal, D., Paccagnini, A., Panagiotelis, A., Panapakidis, I., Pavía, J. M., Pedio, M., Pedregal, D. J., Pinson, P., Ramos, P., Rapach, D. E., Reade, J. J., Rostami-Tabar, B., Rubaszek, M., Sermpinis, G., Shang, H. L., Spiliotis, E., Syntetos, A. A., Talagala, P. D., Talagala, T. S., Tashman, L., Thomakos, D., Thorarinsdottir, T., Todini, E.,
Trapero Arenas, J. R., Wang, X., Winkler, R. L., Yusupova, A., and Ziel, F.:
Forecasting: theory and practice, Int. J. Forecast., 38, 705–871, <a href="https://doi.org/10.1016/j.ijforecast.2021.11.001" target="_blank">https://doi.org/10.1016/j.ijforecast.2021.11.001</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Philip et al.(2020)Philip, Kew, van Oldenborgh, Otto, Vautard,
van der Wiel, King, Lott, Arrighi, Singh, and van Aalst</label><mixed-citation>
      
Philip, S., Kew, S., van Oldenborgh, G. J., Otto, F., Vautard, R., van der
Wiel, K., King, A., Lott, F., Arrighi, J., Singh, R., and van Aalst, M.: A
protocol for probabilistic extreme event attribution analyses, Adv.
Stat. Climatol. Meteorol. Oceanogr., 6, 177–203, <a href="https://doi.org/10.5194/ascmo-6-177-2020" target="_blank">https://doi.org/10.5194/ascmo-6-177-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Piazzi et al.(2021)Piazzi, Thirel, Perrin, and Delaigue</label><mixed-citation>
      
Piazzi, G., Thirel, G., Perrin, C., and Delaigue, O.: Sequential Data
Assimilation for Streamflow Forecasting: Assessing the Sensitivity to
Uncertainties and Updated Variables of a Conceptual Hydrological Model at
Basin Scale, Water Resour. Res., 57, <a href="https://doi.org/10.1029/2020WR028390" target="_blank">https://doi.org/10.1029/2020WR028390</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Pölz et al.(2024)Pölz, Blaschke, Komma, Farnleitner, and Derx</label><mixed-citation>
      
Pölz, A., Blaschke, A. P., Komma, J., Farnleitner, A. H., and Derx, J.:
Transformer Versus LSTM: A Comparison of Deep Learning Models for Karst
Spring Discharge Forecasting, Water Resour. Res., 60, e2022WR032602,
<a href="https://doi.org/10.1029/2022WR032602" target="_blank">https://doi.org/10.1029/2022WR032602</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Rahbar et al.(2022)Rahbar, Mirarabi, Nakhaei, Talkhabi, and
Jamali</label><mixed-citation>
      
Rahbar, A., Mirarabi, A., Nakhaei, M., Talkhabi, M., and Jamali, M.: A
Comparative Analysis of Data-Driven Models (SVR, ANFIS, and ANNs) for Daily
Karst Spring Discharge Prediction, Water Resour. Manag., 36, 589–609,
<a href="https://doi.org/10.1007/s11269-021-03041-9" target="_blank">https://doi.org/10.1007/s11269-021-03041-9</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Rentschler et al.(2023)Rentschler, Avner, Marconcini, Su, Strano,
Vousdoukas, and Hallegatte</label><mixed-citation>
      
Rentschler, J., Avner, P., Marconcini, M., Su, R., Strano, E., Vousdoukas, M., and Hallegatte, S.: Global evidence of rapid urban growth in flood zones
since 1985, Nature, 622, 87–92, <a href="https://doi.org/10.1038/s41586-023-06468-9" target="_blank">https://doi.org/10.1038/s41586-023-06468-9</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Rosenblatt(1958)</label><mixed-citation>
      
Rosenblatt, F.: The perceptron: A probabilistic model for information storage and organization in the brain, Psycholog. Rev., 65, 386–408,
<a href="https://doi.org/10.1037/h0042519" target="_blank">https://doi.org/10.1037/h0042519</a>, 1958.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Saint Fleur et al.(2020)Saint Fleur, Artigue, Johannet, and
Pistre</label><mixed-citation>
      
Saint Fleur, B. E., Artigue, G., Johannet, A., and Pistre, S.: Deep Multilayer Perceptron for Knowledge Extraction: Understanding the Gardon de Mialet Flash Floods Modeling, in: Theory and Applications of Time Series Analysis, edited by: Valenzuela, O., Rojas, F., Herrera, L. J., Pomares, H., and Rojas, I., Springer International Publishing, Cham, 333–348, ISBN 978-3-030-56219-9, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Saint-Fleur et al.(2023)Saint-Fleur, Allier, Lassara, Rivet, Artigue, Pistre, and Johannet</label><mixed-citation>
      
Saint-Fleur, B. E., Allier, S., Lassara, E., Rivet, A., Artigue, G., Pistre,
S., and Johannet, A.: Towards a better consideration of rainfall and
hydrological spatial features by a deep neural network model to improve flash
floods forecasting: case study on the Gardon basin, France, Model. Earth
Syst. Environ., 9, 3693–3708, <a href="https://doi.org/10.1007/s40808-022-01650-w" target="_blank">https://doi.org/10.1007/s40808-022-01650-w</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Schiermeier(2018)</label><mixed-citation>
      
Schiermeier, Q.: Droughts, heatwaves and floods: How to tell when climate
change is to blame, Nature, 560, 20–22, <a href="https://doi.org/10.1038/d41586-018-05849-9" target="_blank">https://doi.org/10.1038/d41586-018-05849-9</a>,
2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Seillier-Moiseiwitsch and Dawid(1993)</label><mixed-citation>
      
Seillier-Moiseiwitsch, F. and Dawid, A. P.: On Testing the Validity of
Sequential Probability Forecasts, J. Am. Stat. Assoc., 88, 355–359, <a href="https://doi.org/10.2307/2290731" target="_blank">https://doi.org/10.2307/2290731</a>, 1993.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Slater et al.(2019)Slater, Villarini, and
Bradley</label><mixed-citation>
      
Slater, L. J., Villarini, G., and Bradley, A. A.: Evaluation of the skill of
North-American Multi-Model Ensemble (NMME) Global Climate Models in
predicting average and extreme precipitation and temperature over the
continental USA, Clim. Dynam., 53, 7381–7396, <a href="https://doi.org/10.1007/s00382-016-3286-1" target="_blank">https://doi.org/10.1007/s00382-016-3286-1</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Talagrand et al.(1997)Talagrand, Vautard, and
Strauss</label><mixed-citation>
      
Talagrand, O., Vautard, R., and Strauss, B.: Evaluation of probabilistic
prediction systems, in: Proceedings of the ECMWF Workshop on Predictability, ECMWF, Shinfield Park, Reading, UK, 1997.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Teräsvirta et al.(2010)Teräsvirta, Tjøstheim, and
Granger</label><mixed-citation>
      
Teräsvirta, T., Tjøstheim, D., and Granger, C. W. J.: Modelling
Nonlinear Economic Time Series, Oxford University Press, ISBN 9780199587148,
<a href="https://doi.org/10.1093/acprof:oso/9780199587148.001.0001" target="_blank">https://doi.org/10.1093/acprof:oso/9780199587148.001.0001</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Terven et al.(2025)Terven, Cordova-Esparza, Romero-González,
Ramírez-Pedraza, and
Chávez-Urbiola</label><mixed-citation>
      
Terven, J., Cordova-Esparza, D.-M., Romero-González, J.-A.,
Ramírez-Pedraza, A., and Chávez-Urbiola, E. A.: A comprehensive
survey of loss functions and metrics in deep learning, Artif. Intel. Rev., 58, 195, <a href="https://doi.org/10.1007/s10462-025-11198-7" target="_blank">https://doi.org/10.1007/s10462-025-11198-7</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>van der Walt et al.(2011)van der Walt, Colbert, and Varoquaux</label><mixed-citation>
      
van der Walt, S., Colbert, S. C., and Varoquaux, G.: The NumPy Array: A Structure for Efficient Numerical Computation, Comput. Sci. Eng., 13, 22–30, <a href="https://doi.org/10.1109/MCSE.2011.37" target="_blank">https://doi.org/10.1109/MCSE.2011.37</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>van Rossum(1995)</label><mixed-citation>
      
van Rossum, G.: Python tutorial, CWI – Centrum voor Wiskunde en Informatica, Amsterdam, the Netherlands, 1995.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Vitart et al.(2017)Vitart, Ardilouze, Bonet, Brookshaw, Chen,
Codorean, Déqué, Ferranti, Fucile, Fuentes, Hendon, Hodgson,
Kang, Kumar, Lin, Liu, Liu, Malguzzi, Mallas, Manoussakis, Mastrangelo,
MacLachlan, McLean, Minami, Mladek, Nakazawa, Najm, Nie, Rixen, Robertson,
Ruti, Sun, Takaya, Tolstykh, Venuti, Waliser, Woolnough, Wu, Won, Xiao,
Zaripov, and Zhang</label><mixed-citation>
      
Vitart, F., Ardilouze, C., Bonet, A., Brookshaw, A., Chen, M., Codorean, C.,
Déqué, M., Ferranti, L., Fucile, E., Fuentes, M., Hendon, H.,
Hodgson, J., Kang, H.-S., Kumar, A., Lin, H., Liu, G., Liu, X., Malguzzi, P.,
Mallas, I., Manoussakis, M., Mastrangelo, D., MacLachlan, C., McLean, P.,
Minami, A., Mladek, R., Nakazawa, T., Najm, S., Nie, Y., Rixen, M.,
Robertson, A. W., Ruti, P., Sun, C., Takaya, Y., Tolstykh, M., Venuti, F.,
Waliser, D., Woolnough, S., Wu, T., Won, D.-J., Xiao, H., Zaripov, R., and
Zhang, L.: The Subseasonal to Seasonal (S2S) Prediction Project Database, B. Am. Meteorol. Soc., 98, 163–173, <a href="https://doi.org/10.1175/BAMS-D-16-0017.1" target="_blank">https://doi.org/10.1175/BAMS-D-16-0017.1</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Waskom(2021)</label><mixed-citation>
      
Waskom, M. L.: seaborn: statistical data visualization, J. Open Sour. Softw., 6, 3021, <a href="https://doi.org/10.21105/joss.03021" target="_blank">https://doi.org/10.21105/joss.03021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Werbos(1974)</label><mixed-citation>
      
Werbos, P.: Beyond regression: New tools for prediction and analysis in the
behavioral sciences, PhD thesis, Committee on Applied Mathematics, Harvard
University, Cambridge, MA, <a href="https://gwern.net/doc/ai/nn/1974-werbos.pdf" target="_blank">https://gwern.net/doc/ai/nn/1974-werbos.pdf</a>
(last access: 5 June 2026), 1974.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Werbos(1988)</label><mixed-citation>
      
Werbos, P.: Backpropagation: Past and future, in: IEEE 1988 International
Conference on Neural Networks, 343–353, <a href="https://doi.org/10.1109/ICNN.1988.23866" target="_blank">https://doi.org/10.1109/ICNN.1988.23866</a>, 1988.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Whitaker and Loughe(1998)</label><mixed-citation>
      
Whitaker, J. S. and Loughe, A. F.: The Relationship between Ensemble Spread
and Ensemble Mean Skill, Mon. Weather Rev., 126, 3292–3302,
<a href="https://doi.org/10.1175/1520-0493(1998)126&lt;3292:TRBESA&gt;2.0.CO;2" target="_blank">https://doi.org/10.1175/1520-0493(1998)126&lt;3292:TRBESA&gt;2.0.CO;2</a>, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Wunsch et al.(2021)Wunsch, Liesch, and Broda</label><mixed-citation>
      
Wunsch, A., Liesch, T., and Broda, S.: Groundwater level forecasting with
artificial neural networks: a comparison of long short-term memory (LSTM),
convolutional neural networks (CNNs), and non-linear autoregressive networks
with exogenous input (NARX), Hydrol. Earth Syst. Sci., 25, 1671–1687, <a href="https://doi.org/10.5194/hess-25-1671-2021" target="_blank">https://doi.org/10.5194/hess-25-1671-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Yang et al.(2020)Yang, Yuan, and Su</label><mixed-citation>
      
Yang, C., Yuan, H., and Su, X.: Bias correction of ensemble precipitation forecasts in the improvement of summer streamflow prediction skill, J. Hydrol., 588, 124955, <a href="https://doi.org/10.1016/j.jhydrol.2020.124955" target="_blank">https://doi.org/10.1016/j.jhydrol.2020.124955</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>Yang et al.(2025)Yang, Pan, Feng, Xiao, Dixon, Hartman, Shen, Song,
Sengupta, Delle Monache, and Ralph</label><mixed-citation>
      
Yang, Y., Pan, M., Feng, D., Xiao, M., Dixon, T., Hartman, R., Shen, C., Song, Y., Sengupta, A., Delle Monache, L., and Ralph, F. M.: Improving streamflow simulation through machine learning-powered data integration and its potential for forecasting in the Western U.S., Hydrol. Earth Syst. Sci., 29, 5453–5476, <a href="https://doi.org/10.5194/hess-29-5453-2025" target="_blank">https://doi.org/10.5194/hess-29-5453-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Zalachori et al.(2012)Zalachori, Ramos, Garçon, Mathevet, and
Gailhard</label><mixed-citation>
      
Zalachori, I., Ramos, M.-H., Garçon, R., Mathevet, T., and Gailhard, J.: Statistical processing of forecasts for hydrological ensemble prediction: a
comparative study of different bias correction strategies, Adv. Sci. Res., 8, 135–141, <a href="https://doi.org/10.5194/asr-8-135-2012" target="_blank">https://doi.org/10.5194/asr-8-135-2012</a>, 2012.

    </mixed-citation></ref-html>--></article>
