<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0" article-type="research-article">
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-30-3549-2026</article-id><title-group><article-title>Multi-site learning for hydrological uncertainty prediction: the case of quantile random forests</article-title><alt-title>Multi-site learning for hydrological uncertainty prediction</alt-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>El Ouahabi</surname><given-names>Taha-Abderrahman</given-names></name>
          <email>taha.elouahabi@gmail.com</email>
        <ext-link>https://orcid.org/0009-0000-7613-3086</ext-link></contrib>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Bourgin</surname><given-names>François</given-names></name>
          <email>francois.bourgin@inrae.com</email>
        <ext-link>https://orcid.org/0000-0002-2820-7260</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Perrin</surname><given-names>Charles</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-8552-1881</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Andréassian</surname><given-names>Vazken</given-names></name>
          
        <ext-link>https://orcid.org/0000-0001-7124-9303</ext-link></contrib>
        <aff id="aff1"><label>1</label><institution>Université Paris-Saclay, INRAE, HYCAR, Antony, France</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Taha-Abderrahman El Ouahabi (taha.elouahabi@gmail.com) and François Bourgin (francois.bourgin@inrae.com)</corresp></author-notes><pub-date><day>12</day><month>June</month><year>2026</year></pub-date>
      
      <volume>30</volume>
      <issue>11</issue>
      <fpage>3549</fpage><lpage>3574</lpage>
      <history>
        <date date-type="received"><day>24</day><month>July</month><year>2025</year></date>
           <date date-type="rev-request"><day>30</day><month>July</month><year>2025</year></date>
           <date date-type="rev-recd"><day>28</day><month>May</month><year>2026</year></date>
           <date date-type="accepted"><day>28</day><month>May</month><year>2026</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2026 Taha-Abderrahman El Ouahabi et al.</copyright-statement>
        <copyright-year>2026</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026.html">This article is available from https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026.pdf</self-uri>
      <abstract><title>Abstract</title>

      <p id="d2e108">To improve hydrological uncertainty estimation, recent studies have explored machine learning (ML)-based post-processing approaches that enable both enhanced predictive performance and hydrologically informed probabilistic streamflow predictions. Among these, random forests (RF) and their probabilistic extension, quantile random forests (QRF), are increasingly used for their balance between interpretability and performance. However, the application of QRF in regional post-processing settings remains unexplored. In this study, we develop a hydrologically informed QRF post-processor trained in a multi-site setting and compare its performance against a locally (at-site) trained QRF using probabilistic evaluation metrics. The QRF framework leverages simulations and state variables from the GR6J process-based hydrological model, along with readily available catchment descriptors, to predict daily streamflow uncertainty. Our results show that the regional QRF approach is beneficial for hydrological uncertainty estimation, particularly in catchments where local information is insufficient. The findings highlight that multi-site learning enables effective information transfer across hydrologically similar catchments and is especially advantageous for high-flow events. However, the selection of appropriate catchment descriptors is critical to achieving these benefits.</p>
  </abstract>
    
<funding-group>
<award-group id="gs1">
<funding-source>Agence Nationale de la Recherche</funding-source>
<award-id>ANR-20-CE04-0009</award-id>
</award-group>
</funding-group>
</article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
<sec id="Ch1.S1.SS1">
  <label>1.1</label><title>On the need for quality uncertainty estimates</title>
      <p id="d2e127">Providing quality uncertainty estimates for streamflow predictions is critically important, particularly in applications such as operational drought simulation, water resource management, and flood mitigation where significant stakes are involved <xref ref-type="bibr" rid="bib1.bibx29 bib1.bibx72" id="paren.1"/>. Poorly quantified or overly confident predictions can lead to misinformed decisions, potentially resulting in economic losses, infrastructure damage, or even threats to public safety. To address this, various approaches have been proposed in the hydrological community for streamflow uncertainty quantification, including multi-model ensembles <xref ref-type="bibr" rid="bib1.bibx19 bib1.bibx66" id="paren.2"/>, Bayesian inference <xref ref-type="bibr" rid="bib1.bibx37 bib1.bibx2" id="paren.3"/>, and hydrological error modeling <xref ref-type="bibr" rid="bib1.bibx36 bib1.bibx65 bib1.bibx58 bib1.bibx4" id="paren.4"/>, referred to as post-processing. Post-processing involves a process-based model followed by a statistical approach to correct errors and quantify the associated uncertainties. With the post-processing procedure, uncertainties can be quantified by modeling the error patterns based on an archive of past data.</p>
      <p id="d2e142">Hydrological post-processing techniques were adopted early through methods such as the hydrological uncertainty processor (HUP) <xref ref-type="bibr" rid="bib1.bibx36" id="paren.5"/> and model conditional processor (MCP) <xref ref-type="bibr" rid="bib1.bibx65" id="paren.6"/>, but recent machine learning (ML)-based approaches have emerged as powerful tools for hydrological post-processing. Although less interpretable, ML-based approaches can potentially produce reliable and more informative uncertainty estimates <xref ref-type="bibr" rid="bib1.bibx49 bib1.bibx67" id="paren.7"/>. Methods such as quantile regression (QR) <xref ref-type="bibr" rid="bib1.bibx68 bib1.bibx49" id="paren.8"/>, conformal prediction <xref ref-type="bibr" rid="bib1.bibx1" id="paren.9"/>, and random forests <xref ref-type="bibr" rid="bib1.bibx73" id="paren.10"/> have been used for streamflow post-processing with promising results. However, ML algorithms can produce different uncertainty estimates depending on how they are trained – particularly on which catchments are included in the training dataset. Since hydrological conditions vary significantly across catchments, the selection of catchments used for training can influence the uncertainty estimates of the ML model. In our study, we aim to explore whether including different catchments, referred to as regional or multi-site learning, may improve ML-based post-processing for error-correction and uncertainty prediction, and specifically for the quantile random forests (QRF) model. A QRF model for error-correction and uncertainty prediction trained in a regional setting can leverage information across several catchments. In contrast, local at-site approaches train models independently for each catchment and cannot benefit from error patterns shared in hydrologically similar catchments.</p>
</sec>
<sec id="Ch1.S1.SS2">
  <label>1.2</label><title>Machine learning-based post-processors</title>
      <p id="d2e172">Random forest (RF) <xref ref-type="bibr" rid="bib1.bibx8" id="paren.11"/> and its probabilistic variant, quantile random forest (QRF) <xref ref-type="bibr" rid="bib1.bibx43" id="paren.12"/>, are extensively used and are considered state-of-the-art in many hydrological applications. Recently, <xref ref-type="bibr" rid="bib1.bibx73" id="text.13"/> compared the QRF model and the countable mixtures of asymmetric Laplacians long short-term memory (CMAL-LSTM) model to probabilistically post-process streamflow simulations across 522 catchments. The QRF and CMAL-LSTM models were comparable in terms of uncertainty estimates, but the CMAL-LSTM deep learning (DL) model performed better in catchments with large flow accumulation areas. QRF has also been applied in hydrologically informed post-processing approaches. <xref ref-type="bibr" rid="bib1.bibx57" id="text.14"/> used an RF framework and leveraged internal state variables to correct PCR-GLOBWB (PCRaster Global Water Balance, a global hydrological model) simulations at three stations in the Rhine Basin. They found that the use of hydrological model states as input features of RFs provides additional information that may not be included in the model simulations. However, challenges remain, particularly in modeling errors during high streamflow periods. <xref ref-type="bibr" rid="bib1.bibx41" id="text.15"/> expanded the same approach to a global scale, using PCR-GLOBWB model simulations and internal states, in conjunction with static catchment attributes, to train a single RF model on a global database of streamflow simulations and measurements. They found that improvements were independent of the availability of streamflow data, indicating the power of regional learning methods in poorly gauged and ungauged catchments.</p>
      <p id="d2e190">Prediction in ungauged basins is not the only benefit of training a single ML model on data from multiple catchments. <xref ref-type="bibr" rid="bib1.bibx35" id="text.16"/> advocated the use of regional approaches to fit a deterministic long short-term memory (LSTM) DL model for streamflow simulations. They found that larger LSTM models trained on all available basins outperform smaller models trained on a limited set of catchments. This is because some ML approaches can properly use the additional information contained in larger training sets to perform better than models specialized for individual catchments <xref ref-type="bibr" rid="bib1.bibx44" id="paren.17"/>. Furthermore, <xref ref-type="bibr" rid="bib1.bibx31" id="text.18"/> found that hydrological model performance depends on catchment characteristics, indicating the presence of regional bias, where hydrological errors exhibit similar properties for neighbouring catchments. This can be harnessed to improve uncertainty estimation using a post-processing model with a regional parametrisation, trained on hydrological errors from multiple sites. We intend to consider these catchment characteristics as additional input features within the proposed post-processing model.</p>
</sec>
<sec id="Ch1.S1.SS3">
  <label>1.3</label><title>Scope of this study</title>
      <p id="d2e210">This study addresses uncertainty estimation of process-based hydrological model simulations, with a focus on streamflow reconstruction and future projection scenarios. We investigate the added value of multi-site post-processing applied to a quantile random forests (QRF) model. The main contributions of this work are: (i) to understand the impact of including different catchments in the training process of QRF (multi-site) on the quality of its uncertainty estimates and (ii) to investigate the importance of catchment characteristics for these multi-site QRFs. Although multi-site approaches can benefit the modelling of ungauged catchments, our work specifically addresses improvements in uncertainty estimation for catchments with available streamflow measurements.</p>
      <p id="d2e213">Accordingly, we use temporally varying information (predicted streamflows and model states) in addition to catchment-dependent characteristics to estimate uncertainties in hydrological model predictions. We chose to focus on QRF due to its balance of performance, interpretability, and popularity in the hydrological community. To the best of our knowledge, no prior study has explored the impact of multi-site learning with the QRF algorithm for uncertainty estimation, particularly when post-processing a hydrological model calibrated separately for each catchment. To that end, we fit different QRF variants on the internal states of a hydrological model, on meteorological variables, and on readily available catchment characteristics. The proposed regional QRF variants are evaluated across a large sample of 564 French catchments to identify when multi-site learning may be beneficial and to offer practical considerations for multi-site QRF applications.</p>
      <p id="d2e216">The paper is organized as follows: We first introduce the dataset and describe the QRF algorithm, its variants, and the probabilistic evaluation framework. Then, we present and discuss the results before summarizing the key findings along with implications for future work.</p>

      <fig id="F1" specific-use="star"><label>Figure 1</label><caption><p id="d2e222">Location of the 564 catchment outlets. Plotted regions represent the hydroclimatological catchment groups used in the study.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f01.png"/>

        </fig>

<table-wrap id="T1" specific-use="star"><label>Table 1</label><caption><p id="d2e234">Features used in the study.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Features</oasis:entry>
         <oasis:entry colname="col2">Unit</oasis:entry>
         <oasis:entry colname="col3">Description</oasis:entry>
         <oasis:entry colname="col4">Type</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">PotEvap</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M1" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">potential evapotranspiration</oasis:entry>
         <oasis:entry colname="col4"><italic>Dynamic features</italic></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Precip</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M2" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">precipitation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">AE</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">actual evapotranspiration</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Prod</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M4" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">production store</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Rout</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M5" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">routing store</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">AExch</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M6" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">intercatchment exchange</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Qsim</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M7" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">simulated flows</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Delta_sim7</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M8" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">7 <inline-formula><mml:math id="M9" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> difference in simulated flows</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Delta_sim1</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M10" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">1 <inline-formula><mml:math id="M11" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> difference in simulated flows</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Delta_rout7</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M12" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">7 <inline-formula><mml:math id="M13" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> difference in routing store</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Delta_rout1</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M14" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">1 <inline-formula><mml:math id="M15" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> difference in routing store</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Delta_prod1</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M16" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">1 <inline-formula><mml:math id="M17" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula> difference in production store</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Prec_sold_frac</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">fraction of solid precipitation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Temp</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M18" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">temperature</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">SWI_ISBA</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">soil wetness index</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Rolling_temp</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M19" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">moving average of temperature</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Rolling_precip</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M20" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">moving average of precipitation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Rolling_sold_frac</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M21" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">moving average of solid precipitation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Month_of_year</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">annual cycle (cosine term)</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">top_drainage_density</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">drainage density</oasis:entry>
         <oasis:entry colname="col4"><italic>Static features</italic></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">sit_area_topo</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M22" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">topographic catchment area</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">hyd_bfi_pelletier_pet_ou</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">baseflow index</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_prec_mean</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M23" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">mean daily precipitation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_pet_ou_mean</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M24" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">mean daily potential evapotranspiration</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_temp_mean</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M25" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">mean daily temperature</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_aridity_ou</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">aridity index</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_psol_frac_safran</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">mean fraction of solid precipitation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_prec_freq_high</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">frequency of high-precipitation days</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_prec_freq_low</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">frequency of dry days</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">top_altitude_mean</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M26" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">mean catchment elevation</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">cli_prec_season_pet_ou</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">seasonality index</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Response_Time</oasis:entry>
         <oasis:entry colname="col2">days</oasis:entry>
         <oasis:entry colname="col3">Response time based on the X4 parameter from GR6J</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">mean_Qsim</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M27" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">mean Qsim</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">std_Qsim</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M28" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">standard deviation of Qsim</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">mean_Qobs</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M29" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">mean Qobs</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">std_Qobs</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M30" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">standard deviation of Qobs</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">mean_Error_log</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">mean of the log-transformed error</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">std_Error_log</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">standard deviation of the log-transformed error</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Region indicator</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">Hydroclimatological region of the catchment</oasis:entry>
         <oasis:entry colname="col4"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>


</sec>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Dataset</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>A dataset of 564 French catchments</title>
      <p id="d2e1180">We used a set of 564 catchments distributed throughout France (Fig. <xref ref-type="fig" rid="F1"/>). These catchments represent a wide range of hydrological regimes and simulation contexts. We selected these catchments from the CAMELS-FR hydroclimatic dataset <xref ref-type="bibr" rid="bib1.bibx14" id="paren.19"/>. The criteria for selecting these catchments were as follows: (i) low anthropogenic influence, (ii) good data quality for all flow regimes, and (iii) an available time series longer than 21 years. Streamflow data were obtained from the national HydroPortail archive <xref ref-type="bibr" rid="bib1.bibx38 bib1.bibx16" id="paren.20"/> at a daily time step for the period 1977–2021. Meteorological forcings (precipitation and temperature) were provided by Météo-France's daily SAFRAN grid reanalysis <xref ref-type="bibr" rid="bib1.bibx70" id="paren.21"/>. Potential evaporation (PET) is calculated using the formula proposed by <xref ref-type="bibr" rid="bib1.bibx47" id="text.22"/> and requires two inputs: extraterrestrial radiation (<inline-formula><mml:math id="M31" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">MJ</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) and mean daily air temperature (<inline-formula><mml:math id="M32" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">°</mml:mi><mml:mi mathvariant="normal">C</mml:mi></mml:mrow></mml:math></inline-formula>). 
Extraterrestrial radiation is computed from the catchment location and the Julian day, so temperature is the only dynamic meteorological input used to estimate PET. Because our aim is to develop a multi-site QRF post-processor, we used several static, catchment-averaged attributes describing climate, topography, and geology. All of these attributes are included in the CAMELS-FR dataset and are listed in Table <xref ref-type="table" rid="T1"/>.</p>
      <p id="d2e1236">Hydrological calibration was performed independently for each catchment over the 1977–2021 period. Subsequently, the QRF variants were implemented using a standard train–validation–test split over the 1990–2021 period: <list list-type="bullet"><list-item>
      <p id="d2e1241"><inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>: training period from 1990 to 2004, used to train the QRF post-processor.</p></list-item><list-item>
      <p id="d2e1254"><inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:math></inline-formula>: validation period from 2005 to 2009, used to select the hyperparameters of the QRF post-processor.</p></list-item><list-item>
      <p id="d2e1267"><inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>: testing period from 2010 to 2021, used to test the performance of the QRF variants on new data.</p></list-item></list></p>
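      <p>The three periods listed above can be sketched as a simple date-based partition. This is only an illustration: the boundaries are read as inclusive calendar years, which is an assumption (the text does not say whether calendar or hydrological years are used), and the names <monospace>PERIODS</monospace>, <monospace>assign_period</monospace> and <monospace>split_by_period</monospace> are hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from datetime import date

# Period boundaries taken from the text; reading them as inclusive
# calendar years is an assumption of this sketch.
PERIODS = {
    "P1_train": (1990, 2004),
    "P2_validation": (2005, 2009),
    "P3_test": (2010, 2021),
}


def assign_period(day):
    """Return the period label for a date, or None outside 1990-2021."""
    for label, (first_year, last_year) in PERIODS.items():
        if first_year <= day.year <= last_year:
            return label
    return None


def split_by_period(records):
    """Group (date, value) records by period; other records are dropped."""
    groups = {label: [] for label in PERIODS}
    for day, value in records:
        label = assign_period(day)
        if label is not None:
            groups[label].append((day, value))
    return groups
```
]]></preformat>
      <p>Records dated before 1990 fall outside the three periods and are ignored by this sketch, although they belong to the 1977–2021 period used for hydrological calibration.</p>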
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Methods</title>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Hydrological model</title>
      <p id="d2e1295">We used discharge simulations obtained with the GR6J rainfall–runoff model <xref ref-type="bibr" rid="bib1.bibx52" id="paren.23"/>, a daily, six-parameter, lumped conceptual model. GR6J has been applied in several studies across a large number of catchments and hydroclimatic contexts <xref ref-type="bibr" rid="bib1.bibx51 bib1.bibx22 bib1.bibx61" id="paren.24"><named-content content-type="pre">e.g.</named-content></xref>. GR6J simulations are controlled by several state variables, in particular the production and routing store levels and the intercatchment exchange fluxes. We use these state variables as predictors in the QRF algorithm. <xref ref-type="bibr" rid="bib1.bibx57" id="text.25"/> successfully used internal state variables as predictors in an RF framework to correct hydrological model errors. They found that internal state variables provided valuable information for the RF, enabling it to detect and correct systematic hydrological model errors. To account for the influence of snow in some catchments, we coupled GR6J with Cemaneige <xref ref-type="bibr" rid="bib1.bibx69" id="paren.26"/>, a snow accumulation and melt model, with constant parameters for all catchments.</p>
      <p id="d2e1312">The Cemaneige-GR6J model was calibrated with the airGR R package <xref ref-type="bibr" rid="bib1.bibx10 bib1.bibx11" id="paren.27"/> and its built-in calibration algorithm. To ensure good performance across a wide range of streamflow conditions, the target optimization criterion was a combination of KGE-based criteria <xref ref-type="bibr" rid="bib1.bibx24 bib1.bibx33" id="paren.28"/>: an equal weighting of KGE criteria with power transformations of 0.5 and <inline-formula><mml:math id="M36" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.5 applied to streamflow, as detailed in Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>. This calibration yields the six parameters of GR6J: production store capacity (<inline-formula><mml:math id="M37" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>), groundwater exchange coefficient (<inline-formula><mml:math id="M38" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), routing store capacity (<inline-formula><mml:math id="M39" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>), time constant of unit hydrograph (days), groundwater exchange threshold (–), and exponential store coefficient (<inline-formula><mml:math id="M40" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi></mml:mrow></mml:math></inline-formula>).</p>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Feature selection and data transformations</title>
<sec id="Ch1.S2.SS4.SSS1">
  <label>2.4.1</label><title>Target variable</title>
      <p id="d2e1387">In this study, we model the probability distribution of hydrological model errors. Since these errors are skewed and non-Gaussian <xref ref-type="bibr" rid="bib1.bibx17" id="paren.29"><named-content content-type="pre">e.g.</named-content></xref>, we applied a logarithmic transformation to improve the training process:

              <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M41" display="block"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mtext>log</mml:mtext><mml:mfenced open="(" close=")"><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mtext>obs</mml:mtext></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">δ</mml:mi></mml:mrow><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mtext>sim</mml:mtext></mml:msubsup><mml:mo>+</mml:mo><mml:mi mathvariant="italic">δ</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced></mml:mrow></mml:math></disp-formula>

            where <inline-formula><mml:math id="M42" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the target variable of our study and represents the prediction error, <inline-formula><mml:math id="M43" display="inline"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mtext>obs</mml:mtext></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msubsup><mml:mi>Q</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mtext>sim</mml:mtext></mml:msubsup></mml:mrow></mml:math></inline-formula> indicate observed and simulated streamflows, respectively (<inline-formula><mml:math id="M45" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>), and <inline-formula><mml:math id="M46" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> represents the time index with a temporal step of 1 <inline-formula><mml:math id="M47" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">d</mml:mi></mml:mrow></mml:math></inline-formula>. <inline-formula><mml:math id="M48" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> is an offset parameter to avoid zero streamflow values and is unique for each catchment. It was calculated following the recommendation in <xref ref-type="bibr" rid="bib1.bibx53" id="text.30"/>. The use of <inline-formula><mml:math id="M49" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> is especially relevant in this study due to the application of the logarithmic transformation.</p>
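      <p>As a minimal sketch of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), the target can be computed as follows. The offset is passed in as an argument because its calculation follows the cited recommendation and is not reproduced here. The function names are hypothetical, the positivity check on the offset is a choice of this sketch, and the second function is simply the algebraic inverse of Eq. (<xref ref-type="disp-formula" rid="Ch1.E1"/>), shown to illustrate how an error value maps back to streamflow units.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def log_error(q_obs, q_sim, delta):
    """Eq. (1): log((q_obs + delta) / (q_sim + delta)), flows in mm/d."""
    # Requiring a strictly positive offset is a choice of this sketch:
    # it keeps the logarithm finite when a flow is exactly zero.
    if delta <= 0:
        raise ValueError("delta must be strictly positive")
    return math.log((q_obs + delta) / (q_sim + delta))


def observed_from_error(eps, q_sim, delta):
    """Algebraic inverse of Eq. (1): streamflow implied by an error value."""
    return (q_sim + delta) * math.exp(eps) - delta
```
]]></preformat>
      <p>A perfect simulation gives an error of zero, and a positive error means that the model underestimates the observed streamflow.</p>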
      <p id="d2e1524">The input predictors (or features) in the QRF models are listed in Table <xref ref-type="table" rid="T1"/>. These features can be broadly categorized into two groups of approximately equal size: (i) time series data (dynamic features) that capture temporal variability, and (ii) catchment descriptors (static features) that enable spatial identification of catchments.</p>
</sec>
<sec id="Ch1.S2.SS4.SSS2">
  <label>2.4.2</label><title>Dynamic features</title>
      <p id="d2e1537">The proposed QRF framework post-processes GR6J simulations and uses hydrological model outputs and state variables along with meteorological inputs (precipitation and temperature). Streamflow uncertainties are known to be autocorrelated <xref ref-type="bibr" rid="bib1.bibx17" id="paren.31"/>, with strong autoregressive (AR) and memory effects. Consequently, lagged observed streamflow <xref ref-type="bibr" rid="bib1.bibx73 bib1.bibx50" id="paren.32"/> is a popular input feature for RF-based post-processing. However, in the simulation context of this study (streamflow reconstruction and projection scenarios), streamflow observations are not available, so we instead use state features from the GR6J model to provide additional information to QRF. Although some of the features in Table <xref ref-type="table" rid="T1"/>, such as simulated flows and the production store level, are strongly autocorrelated, we assume that the additional information still leads to improved uncertainty estimates compared to using model simulations alone. Similarly to <xref ref-type="bibr" rid="bib1.bibx57" id="text.33"/>, we include other temporal information in QRF through transformed features: (i) increments of the simulated streamflow and model states, to help capture the dynamics of the hydrograph (e.g. rising and falling limbs), and (ii) moving averages of meteorological features, to highlight general trends. This feature engineering step can be relevant for RF-based algorithms in a time series context because, unlike AR models and LSTM neural networks <xref ref-type="bibr" rid="bib1.bibx17 bib1.bibx39 bib1.bibx34" id="paren.34"/>, QRF does not build temporal memory or embeddings. The moving-average window is set equal to the catchment response time, which was obtained from the unit hydrograph time constant of the GR6J model.</p>
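      <p>The two transformed features described above can be sketched as follows. Several details are assumptions of this illustration and are not stated in the text: the increment is taken as a one-day first difference, the moving average is trailing (it uses the current and preceding days only), and the response time is rounded to a whole number of days. All function names are hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def increments(series):
    """First differences x[t] - x[t-1]; the first value has no predecessor."""
    return [None] + [series[t] - series[t - 1] for t in range(1, len(series))]


def trailing_moving_average(series, window):
    """Mean of the current and previous (window - 1) values.

    Returns None until a full window is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running_sum = 0.0
    for t, value in enumerate(series):
        running_sum += value
        if t >= window:
            # Drop the value that has just left the window.
            running_sum -= series[t - window]
        out.append(running_sum / window if t >= window - 1 else None)
    return out


def window_from_response_time(response_time_days):
    """Assumed rounding of the response time to a whole number of days."""
    return max(1, round(response_time_days))
```
]]></preformat>
      <p>In such a sketch, the increments would be applied to simulated streamflow and model states, and the moving average to precipitation and temperature, with the window derived from the Response_Time attribute of each catchment.</p>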
</sec>
<sec id="Ch1.S2.SS4.SSS3">
  <label>2.4.3</label><title>Static features</title>
      <p id="d2e1562">To account for spatial heterogeneity, catchment descriptors are used in the multi-site QRF variants. The static features include (i) catchment-averaged attributes such as catchment area and aridity index, of which we kept thirteen following the recommendations of <xref ref-type="bibr" rid="bib1.bibx30" id="text.35"/>, and (ii) scale features (means and standard deviations; see Table <xref ref-type="table" rid="T1"/>) of errors, simulated flows, and observed streamflows. These scale features provide additional unique indicators of catchment characteristics. <xref ref-type="bibr" rid="bib1.bibx44" id="text.36"/> found that combining individual time series features such as catchment attributes with scale features can improve the performance of ML models in a deterministic setting. Similar improvements are expected for QRF in the setting of hydrological uncertainty estimation. Note that these scale features are not available for prediction in ungauged catchments, as they are calculated from observed streamflows.</p>

      <fig id="F2" specific-use="star"><label>Figure 2</label><caption><p id="d2e1573">Schematic overview of the implementation of the QRF-based post-processors. Configurations 1, 2, and 3 represent the regional information used in QRF post-processing. A description of the features used is provided in Table <xref ref-type="table" rid="T2"/>.</p></caption>
            <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f02.png"/>

          </fig>

</sec>
</sec>
<sec id="Ch1.S2.SS5">
  <label>2.5</label><title>QRF: how to fit the algorithm?</title>
      <p id="d2e1594">Random forest <xref ref-type="bibr" rid="bib1.bibx8" id="text.37"/> is a non-parametric, tree-based ensemble model that offers good performance and a degree of interpretability through its feature importance estimates <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx9" id="paren.38"/>. RF and its probabilistic version, QRF, are used extensively in the hydrometeorological domain. An important advantage of QRF is that it provides full distributional estimates without the need to estimate each quantile separately, as is required in quantile regression <xref ref-type="bibr" rid="bib1.bibx68 bib1.bibx49" id="paren.39"/>. QRF has been applied to complex and heteroscedastic cases, including hydrometeorological ensemble forecasts <xref ref-type="bibr" rid="bib1.bibx60 bib1.bibx64 bib1.bibx62" id="paren.40"/>, post-processing of streamflow simulations <xref ref-type="bibr" rid="bib1.bibx73" id="paren.41"/>, and estimation of the limits of acceptability for hydrological models <xref ref-type="bibr" rid="bib1.bibx23" id="paren.42"/>. Further details on the construction of RF and QRF can be found in <xref ref-type="bibr" rid="bib1.bibx40 bib1.bibx43" id="text.43"/>. In short, QRF can be viewed as an analog method <xref ref-type="bibr" rid="bib1.bibx15 bib1.bibx28" id="paren.44"/> that performs a weighted nearest-neighbor search for analogous events. Like a classic RF, QRF grows a number of trees <inline-formula><mml:math id="M50" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, with each tree trained on a bootstrapped subsample of the original training data.</p>
      <p id="d2e1629">Individual trees are trained according to the <xref ref-type="bibr" rid="bib1.bibx9" id="text.45"/> algorithm by minimizing a loss function through successive splits, each of which considers a predefined number of features <inline-formula><mml:math id="M51" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>. This tree-building process enables QRF to account for strongly correlated features, which is important given the strong correlation of some of the features used. In this study, we use the mean squared error (MSE) as the loss function to measure the homogeneity of each group. The procedure continues recursively, with each resulting group split further until the minimum number of data points <inline-formula><mml:math id="M52" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula> in child nodes is reached.</p>
      <p id="d2e1649">In the classic Random Forest (RF) algorithm, predictions from individual trees are averaged to produce a single deterministic output. In contrast, Quantile Regression Forest (QRF) leverages the leaf nodes of trees to compute proximity measures between a test input and training instances. For a prediction at time <inline-formula><mml:math id="M53" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> and given input <inline-formula><mml:math id="M54" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, each QRF tree is traversed using binary splits to reach a corresponding leaf node. A proximity weight <inline-formula><mml:math id="M55" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ω</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is then defined for each training instance <inline-formula><mml:math id="M56" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula> <xref ref-type="bibr" rid="bib1.bibx43" id="paren.46"/>, and these weights are used to estimate the cumulative distribution function (CDF) of the prediction uncertainty:

            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M57" display="block"><mml:mrow><mml:mover accent="true"><mml:mi>F</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>(</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msub><mml:mi mathvariant="italic">ω</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:msub><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ω</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>≥</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msub><mml:mi mathvariant="italic">ω</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> denotes the hydrological error of training instance <inline-formula><mml:math id="M61" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M62" display="inline"><mml:mover accent="true"><mml:mi>F</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:math></inline-formula> is the estimated CDF of uncertainty for <inline-formula><mml:math id="M63" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. Note that the sum in Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>) runs over the training instances, so its upper limit is the number of training instances rather than the number of trees.</p>
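      <p>To make the role of the proximity weights concrete, the following sketch computes them from leaf memberships and evaluates Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>). It assumes the usual construction in which each tree shares a unit weight equally among the training instances that fall in the same leaf as the query point, and it assumes that every such leaf holds at least one training instance. The function names and the list-based data layout are hypothetical, and the library used in the study may organize this computation differently.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def qrf_weights(train_leaves, query_leaves):
    """Proximity weights of each training instance for one query point.

    train_leaves[k][i] is the leaf index of training instance i in tree k;
    query_leaves[k] is the leaf reached by the query point in tree k.
    """
    n_trees = len(train_leaves)
    n_train = len(train_leaves[0])
    weights = [0.0] * n_train
    for leaves, query_leaf in zip(train_leaves, query_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == query_leaf]
        for i in members:
            # Each tree shares a unit weight equally among co-located
            # instances; dividing by n_trees averages over the forest.
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def estimated_cdf(eps, train_errors, weights):
    """Eq. (2): total weight of the training errors not exceeding eps."""
    return sum(w for e, w in zip(train_errors, weights) if e <= eps)
```
]]></preformat>
      <p>Training instances that never share a leaf with the query point receive a weight of zero, which is why the method behaves like a weighted nearest-neighbor search.</p>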
      <p id="d2e1871">To provide reliable and sharp uncertainty estimates, we considered three hyperparameters for optimization: (i) the minimum number of samples at child nodes <inline-formula><mml:math id="M64" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>, which affects tree depth and strongly impacts reliability and sharpness. High values might yield high reliability but can lead to poor performance, as the trees become too general and information is lost. Low values result in overfitting and unreliable uncertainty estimates. (ii) The number of features per split, <inline-formula><mml:math id="M65" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>, which also shapes the QRF uncertainty estimates. Higher values can lead to under-dispersed uncertainties, while lower values may reduce sharpness. (iii) The number of trees <inline-formula><mml:math id="M66" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, which controls precision and stability. A larger number of trees improves the quality of uncertainty estimates, but the gains diminish while the computational cost increases, especially for larger models. Further details on hyperparameter values and selection are provided in Sect. <xref ref-type="sec" rid="Ch1.S2.SS7"/>. We also use a <inline-formula><mml:math id="M67" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-nearest neighbor (K-NN) <xref ref-type="bibr" rid="bib1.bibx71" id="paren.47"/> approach as a benchmark for the QRF methods used in the study. Like QRF, K-NN searches for analogous events, but it relies on the Euclidean distance between features. Here, K-NN is fitted locally on the same variables as QRF. Further details on the fitting process and the hyperparameters used are provided in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>.</p>
      <p id="d2e1911">Given Eq. (<xref ref-type="disp-formula" rid="Ch1.E2"/>), the estimated CDF is bounded by the training sample. QRF cannot predict a quantile above the maximum (or below the minimum) error observed in the training sample, which implies that a QRF trained on a single basin is constrained by the range of errors in its training data. A more hydrologically diverse training dataset would alleviate this problem and enable QRF to adapt to more extreme events, provided that QRF is able to use the additional information properly.</p>
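      <p>The bound discussed above follows directly from inverting the weighted empirical CDF. The sketch below uses a step-function inversion, which is an assumption of this illustration (implementations may interpolate between neighboring values, but the result stays within the same range). The function name is hypothetical.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def weighted_quantile(train_errors, weights, level):
    """Smallest training error whose cumulative weight reaches the level."""
    if not 0.0 < level <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    cumulative = 0.0
    ordered = sorted(zip(train_errors, weights))
    for error, weight in ordered:
        cumulative += weight
        if cumulative >= level:
            return error
    # Guard against rounding when the weights sum to slightly less than 1.
    return ordered[-1][0]
```
]]></preformat>
      <p>Whatever the level requested, the returned value is always one of the training errors, so even the most extreme quantile cannot leave the range spanned by the training sample.</p>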

<table-wrap id="T2" specific-use="star"><label>Table 2</label><caption><p id="d2e1919">QRF variants of the study.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="center"/>
     <oasis:colspec colnum="3" colname="col3" align="center"/>
     <oasis:colspec colnum="4" colname="col4" align="center"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Configurations</oasis:entry>
         <oasis:entry colname="col2">Dynamic features</oasis:entry>
         <oasis:entry colname="col3">Static features</oasis:entry>
         <oasis:entry colname="col4">Hydroclimatological group</oasis:entry>
         <oasis:entry colname="col5">Number of models</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">QRF-local</oasis:entry>
         <oasis:entry colname="col2">✓</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">564</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">QRF-region</oasis:entry>
         <oasis:entry colname="col2">✓</oasis:entry>
         <oasis:entry colname="col3">✓</oasis:entry>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">15</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">QRF-national</oasis:entry>
         <oasis:entry colname="col2">✓</oasis:entry>
         <oasis:entry colname="col3">✓</oasis:entry>
         <oasis:entry colname="col4">✓</oasis:entry>
         <oasis:entry colname="col5">1</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">QRF-basic</oasis:entry>
         <oasis:entry colname="col2">✓</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5">1</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S2.SS6">
  <label>2.6</label><title>QRF variants</title>
      <p id="d2e2037">Figure <xref ref-type="fig" rid="F2"/> presents the framework employed for the QRF configurations analyzed in this study. The local approach (QRF-local) refers to training the QRF algorithm on data from a single basin. By construction, static features are constant for each individual catchment, so only time-series features can be fed to QRF in the local setup. QRF-local yields 564 independently trained QRF models, each specific to its respective catchment. Next, spatial variability is added as we extend QRF to a multi-site setting. The objective here is twofold: (i) to examine whether spatial diversity can improve the uncertainty estimates of QRF, which can be challenging because the GR6J hydrological model is calibrated separately for each catchment; and (ii) to determine the optimal number of catchments to include in the training set for effectively capturing hydrological diversity and improving QRF predictions. We test QRF with two spatial settings: (i) a regional approach (QRF-region), where QRF is trained on data from catchments that are geographically close and thus potentially have similar error dynamics; in total, 15 regional QRF models are developed for the hydrological regions of the study (see Fig. <xref ref-type="fig" rid="F1"/>), based on hydroclimatological groupings of French catchments; and (ii) a global approach (QRF-national), in which a single QRF is trained on data from all catchments in the dataset. Both static and dynamic descriptors are used in the training process of QRF-region and QRF-national. However, QRF-national uses the catchment's hydrological region as an additional input feature; this feature cannot be used for QRF-region, because it is constant within each regional model. Intuitively, in cases where QRF is unable to transfer information between basins or when there are no useful analogs in similar catchments, QRF-local would yield better performance, as no information from other catchments is used to build the model.
To assess the usefulness of static features for the multi-site QRF setup, we included QRF-basic, a global QRF approach fitted on all catchments of the study but only with dynamic features. This experiment is expected to show whether dynamic time series features are sufficient to improve multi-site QRF predictions or whether static features are essential for multi-site post-processing. Table <xref ref-type="table" rid="T2"/> presents the features used in each configuration. The QRF models used in this study were fitted using the quantile-forest Python library <xref ref-type="bibr" rid="bib1.bibx32" id="paren.48"/>.</p>
      <p id="d2e2049">In multi-site setups, standardization is an essential step that enables QRF to identify adequate analogs across a diverse set of catchments, because the scales of streamflows and dynamic features (GR6J states and transformed variables) vary significantly between catchments. Initially, we standardized the input data with the popular standard scaling method <xref ref-type="bibr" rid="bib1.bibx27" id="paren.49"/>, which transforms the dynamic features of each catchment so that their mean and standard deviation equal 0 and 1, respectively. However, this method produced inconsistencies for catchments with outliers, as the standard deviation is sensitive to extreme values. To solve this issue, we opted for robust standardization <xref ref-type="bibr" rid="bib1.bibx27" id="paren.50"/>, which removes the median values of the dynamic features and of the target errors <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> defined in Sect. <xref ref-type="sec" rid="Ch1.S2.SS4.SSS1"/>.</p>
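      <p>A per-catchment robust standardization can be sketched as follows. The text only states that the median is removed. Dividing by the interquartile range, as common robust scalers do, is an assumption of this sketch and can be switched off. The function name is hypothetical, and the period over which the statistics are computed is not specified here.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import statistics


def robust_standardize(values, scale_by_iqr=True):
    """Centre one catchment's series on its median.

    Optionally divide by the interquartile range (an assumption of this
    sketch; the text only mentions the removal of the median).
    """
    median = statistics.median(values)
    scale = 1.0
    if scale_by_iqr:
        q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        # A constant series has zero IQR; leave it unscaled in that case.
        scale = iqr if iqr > 0 else 1.0
    return [(v - median) / scale for v in values]
```
]]></preformat>
      <p>Unlike the mean and the standard deviation, the median and the interquartile range are barely affected by a single extreme value, so an outlier remains large after transformation without distorting the scaling of the other values.</p>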
</sec>
<sec id="Ch1.S2.SS7">
  <label>2.7</label><title>QRF hyperparameters</title>
      <p id="d2e2079">Since we use QRF for probabilistic predictions, hyperparameter selection was based on the mean of the alpha score and the CRPSS. This enables a selection based on the overall quality of the uncertainty estimates, with an emphasis on reliability. For QRF-local, hyperparameters were tuned independently for each catchment, and the set maximizing this criterion over the validation period was selected. For QRF-region and QRF-national, hyperparameters were selected based on the median of the criterion across the catchments of each model. Because the selection for QRF-local is specific to each catchment, it is expected to favor the local model compared with approaches that use fixed hyperparameters across multiple catchments. Table <xref ref-type="table" rid="T3"/> presents the hyperparameter values explored.</p>

<table-wrap id="T3"><label>Table 3</label><caption><p id="d2e2087">Hyperparameter values explored for QRF.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Hyperparameter</oasis:entry>
         <oasis:entry colname="col2">Values</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Min samples leaf</oasis:entry>
         <oasis:entry colname="col2">5, 10, 25, 50, 75, 100, 150, 200, 400, 600</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of estimators</oasis:entry>
         <oasis:entry colname="col2">200, 400, 600</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Max features</oasis:entry>
         <oasis:entry colname="col2">sqrt, 8, 16</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Seeds</oasis:entry>
         <oasis:entry colname="col2">0, 1, 2</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>
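      <p>As an illustration of the selection rule for a multi-site variant, the sketch below enumerates the grid of Table 3 and picks the candidate with the highest median, across catchments, of the mean of the alpha score and CRPSS. The names <monospace>candidate_sets</monospace> and <monospace>select_by_median</monospace> and the score values are hypothetical, and the seeds listed in Table 3 are left out because their role in the selection is not detailed here.</p>
<preformat preformat-type="code"><![CDATA[
```python
from itertools import product
from statistics import median

# Grid taken from Table 3 (seeds omitted).
GRID = {
    "min_samples_leaf": [5, 10, 25, 50, 75, 100, 150, 200, 400, 600],
    "n_estimators": [200, 400, 600],
    "max_features": ["sqrt", 8, 16],
}


def candidate_sets(grid):
    """Return every combination of hyperparameter values as a dict."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def select_by_median(scores):
    """scores maps a candidate index to a list of (alpha, crpss) pairs, one per catchment."""
    criterion = {
        key: median((alpha + crpss) / 2.0 for alpha, crpss in pairs)
        for key, pairs in scores.items()
    }
    best = max(criterion, key=criterion.get)
    return best, criterion[best]


candidates = candidate_sets(GRID)

# Hypothetical validation scores for two candidates over three catchments.
demo_scores = {
    0: [(0.8, 0.6), (0.7, 0.5), (0.9, 0.7)],
    1: [(0.9, 0.7), (0.9, 0.7), (0.6, 0.2)],
}
best_index, best_value = select_by_median(demo_scores)
```
]]></preformat>
      <p>For QRF-local, the same combined score would be maximized separately for each catchment instead of taking the median across catchments.</p>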

</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Assessment criteria</title>
      <p id="d2e2159">In this section, we present the probabilistic metrics used to evaluate the three variants of QRF. The criterion followed for probabilistic predictions conforms to <xref ref-type="bibr" rid="bib1.bibx21" id="text.51"/>'s objective of maximizing sharpness subject to reliability. In this context, reliability refers to the statistical consistency between the predicted uncertainty distributions and the observed streamflow values, while sharpness is a property of predictions exclusively and refers to the magnitude of the uncertainty distributions. In practice, uncertainty estimates that closely align with observed streamflows are more accurate and reliable, while predictions with smaller magnitudes are considered sharper. To assess these properties, we used two types of metrics. Distributional metrics evaluate the full predictive distribution, while interval metrics focus on a pre-specified predictive interval. As presented below, reliability is measured by the alpha score and the coverage ratio. For assessing sharpness, we used the dispersion score and the average interval width. Additionally, deterministic evaluation criteria were used to provide a more holistic assessment of the proposed QRF variants. At each time step, each variant predicts 200 quantile members equally spaced between 0.005 and 0.995. The scores are calculated using the EvalHyd <xref ref-type="bibr" rid="bib1.bibx25" id="paren.52"/> Python library.</p>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Distributional metrics: alpha score, dispersion score, and CRPSS</title>
      <p id="d2e2175">The alpha score <xref ref-type="bibr" rid="bib1.bibx56" id="paren.53"/> targets reliability. It calculates the closeness of predicted uncertainty distributions to the statistical distribution of observed streamflows. If the uncertainty distributions of streamflows are reliably quantified, the observations correspond to realisations from the uncertainty distributions of streamflows. In practice, the alpha score compares the empirical distribution of the probability integral transform (PIT) values to the uniform distribution on <inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:mn mathvariant="normal">0</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. A perfect alpha score corresponds to uniform PIT distributions while deviations of PIT values from the uniform distribution indicate lower reliability. The values of the metric range from 0 (worst reliability) to 1 (perfect reliability).</p>
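      <p>The comparison between PIT values and the uniform distribution can be sketched as below. This is an illustrative formulation, not the EvalHyd implementation: the PIT is approximated as the fraction of quantile members not exceeding the observation, and the plotting positions <monospace>i / (n + 1)</monospace> used for the uniform reference are an assumption.</p>
<preformat preformat-type="code"><![CDATA[
```python
def pit_value(members, obs):
    """Approximate PIT: fraction of predicted quantile members not exceeding the observation."""
    return sum(m <= obs for m in members) / len(members)


def alpha_score(pit):
    """One minus twice the mean absolute gap between sorted PIT values and uniform positions."""
    n = len(pit)
    gaps = [abs(p - i / (n + 1)) for i, p in enumerate(sorted(pit), start=1)]
    return 1.0 - 2.0 * sum(gaps) / n


# PIT values lying exactly on the uniform positions give a perfect score,
# whereas PIT values concentrated at 0 give the worst score.
n_steps = 99
perfect = alpha_score([i / (n_steps + 1) for i in range(1, n_steps + 1)])
worst = alpha_score([0.0] * n_steps)
```
]]></preformat>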
      <p id="d2e2197">For sharpness, we used the dispersion score calculated as a skill score following <xref ref-type="bibr" rid="bib1.bibx6" id="text.54"/>. The method consists in computing the continuous ranked probability score (CRPS; <xref ref-type="bibr" rid="bib1.bibx20" id="altparen.55"/>) by comparing uncertainty distributions with their medians, as detailed in Appendix <xref ref-type="sec" rid="App1.Ch1.S5"/>. This formulation targets the magnitude of the distributions instead of the agreement with the observations. To obtain a positively oriented score, the dispersion score is expressed as a skill score by dividing by the same quantity computed for the climatological distribution. The resulting metric scores 1 for a perfect point prediction; positive values indicate better sharpness compared to the climatological distribution, while negative values indicate worse performance.</p>
      <p id="d2e2208">We also compute CRPS by comparing uncertainty distributions to observed streamflow values, which allows both reliability and sharpness to be assessed. We express CRPS as a skill score (CRPSS) relative to the climatological distribution. As with the dispersion score, CRPSS equals 1 for a perfect point prediction; positive values indicate better performance than the climatological distribution, while negative values indicate worse performance. Throughout the study, the empirical distribution of observed streamflows is defined over the training period, as detailed in Appendix <xref ref-type="sec" rid="App1.Ch1.S5"/>.</p>
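      <p>A minimal sketch of the two CRPS-based skill scores is given below. It treats the quantile members as an equally weighted ensemble and uses the standard empirical CRPS estimator, which is an assumption and may differ from the estimator used by EvalHyd; all function names are hypothetical. The skill score is written as one minus the ratio to the climatological reference, so that 1 corresponds to a perfect point prediction and 0 to the climatological distribution.</p>
<preformat preformat-type="code"><![CDATA[
```python
from statistics import mean, median


def crps_ensemble(members, y):
    """Empirical CRPS at one time step: E|X - y| - 0.5 * E|X - X'|."""
    term_target = mean(abs(m - y) for m in members)
    term_spread = 0.5 * mean(abs(a - b) for a in members for b in members)
    return term_target - term_spread


def mean_crps(ensembles, targets):
    return mean(crps_ensemble(m, y) for m, y in zip(ensembles, targets))


def skill_score(score, reference):
    return 1.0 - score / reference


def crpss(ensembles, obs, clim):
    """CRPS against the observations, relative to the climatological members."""
    return skill_score(mean_crps(ensembles, obs), mean_crps([clim] * len(obs), obs))


def dispersion_skill(ensembles, clim):
    """CRPS of each distribution against its own median, relative to climatology."""
    medians = [median(m) for m in ensembles]
    return skill_score(mean_crps(ensembles, medians), crps_ensemble(clim, median(clim)))


# Hypothetical climatology and two degenerate (point) predictions.
clim = [0.0, 1.0, 2.0, 3.0, 4.0]
point_predictions = [[1.0] * 5, [3.0] * 5]
observations = [1.0, 3.0]
```
]]></preformat>
      <p>With these definitions, the point predictions above obtain a CRPSS and a dispersion score of 1, while predictions identical to the climatological members obtain a dispersion score of 0.</p>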
</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Coverage ratio, average interval width, and Winkler score</title>
      <p id="d2e2221">To provide a more comprehensive assessment of predictive uncertainty, evaluation metrics were calculated for prediction intervals at the 90 % and 95 % confidence levels. The coverage ratio (CR) is a measure of reliability that counts the proportion of observations that lie within the prediction intervals. Values closest to the desired coverage level (i.e., 90 % or 95 %) are best. Scoring lower values diminishes the utility of the model (under-coverage), while scoring higher values than the desired coverage is less problematic, but indicates that the model can provide sharper intervals (over-coverage). Moreover, it is worth highlighting that the two metrics used to assess reliability are closely related: the coverage ratio reflects reliability at a specific confidence level, while the alpha score aggregates coverage across all confidence levels. To assess the sharpness, we employ the average width metric (AW), which corresponds to the average width of the prediction interval during the evaluation period. We also evaluate the Winkler score (WS), which simultaneously includes both criteria, and enables an easy comparison between the variants of the study. Both AW and WS are presented as skill scores – AW skill score (AWSS) and Winkler skill score (WSS) – relative to the climatological distribution.</p>
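      <p>The three interval metrics can be illustrated with the short sketch below, which uses the usual definition of the Winkler score (interval width plus a penalty of <monospace>2 / (1 - level)</monospace> times the distance by which an observation falls outside the interval). The function name and the example values are hypothetical; the skill-score versions (AWSS and WSS) would follow by taking one minus the ratio to the same quantities computed for the climatological distribution.</p>
<preformat preformat-type="code"><![CDATA[
```python
def interval_metrics(lower, upper, obs, level=0.90):
    """Coverage ratio (CR), average width (AW) and Winkler score (WS) of prediction intervals."""
    miss_weight = 2.0 / (1.0 - level)
    n = len(obs)
    widths = [u - l for l, u in zip(lower, upper)]
    covered = sum(l <= y <= u for l, u, y in zip(lower, upper, obs))
    # Penalty is zero when the observation lies inside the interval.
    penalties = [
        miss_weight * (max(l - y, 0.0) + max(y - u, 0.0))
        for l, u, y in zip(lower, upper, obs)
    ]
    return {
        "CR": covered / n,
        "AW": sum(widths) / n,
        "WS": (sum(widths) + sum(penalties)) / n,
    }


# Hypothetical 90 % intervals: two observations inside, one above, one below.
metrics = interval_metrics(
    lower=[0.0, 0.0, 0.0, 0.0],
    upper=[10.0, 10.0, 10.0, 10.0],
    obs=[5.0, 5.0, 12.0, -1.0],
)
```
]]></preformat>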
</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Deterministic metrics</title>
      <p id="d2e2232">Although the main focus of this study is probabilistic post-processing, some decision-makers may require deterministic predictions. Therefore, we also evaluate mean predictions to compare the different post-processing variants of the study. We use the popular Nash–Sutcliffe efficiency (NSE) <xref ref-type="bibr" rid="bib1.bibx45" id="paren.56"/> and Kling–Gupta efficiency (KGE) <xref ref-type="bibr" rid="bib1.bibx24" id="paren.57"/> metrics to gauge the quality of deterministic predictions in multi-site learning setups.</p>
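      <p>For reference, the two deterministic criteria can be written as in the sketch below, where <monospace>sim</monospace> would hold the mean predictions. The KGE is given in its original three-component form (correlation, variability ratio, and bias ratio); whether the study uses this form or a modified variant is not specified here, so this choice is an assumption.</p>
<preformat preformat-type="code"><![CDATA[
```python
from math import sqrt
from statistics import mean, pstdev


def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus squared errors over the variance of observations."""
    obs_mean = mean(obs)
    squared_errors = sum((s - o) ** 2 for s, o in zip(sim, obs))
    obs_variability = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - squared_errors / obs_variability


def kge(sim, obs):
    """Kling-Gupta efficiency from correlation r, variability ratio and bias ratio."""
    mean_sim, mean_obs = mean(sim), mean(obs)
    std_sim, std_obs = pstdev(sim), pstdev(obs)
    covariance = mean((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    r = covariance / (std_sim * std_obs)
    variability = std_sim / std_obs
    bias = mean_sim / mean_obs
    return 1.0 - sqrt((r - 1.0) ** 2 + (variability - 1.0) ** 2 + (bias - 1.0) ** 2)


# Hypothetical observed flows and a simulation that doubles every value.
observed = [1.0, 2.0, 3.0, 4.0]
doubled = [2.0, 4.0, 6.0, 8.0]
```
]]></preformat>
      <p>A simulation equal to the observations scores 1 on both criteria, a constant simulation equal to the observed mean scores an NSE of 0, and the doubled simulation above is penalized through the variability and bias terms of the KGE even though its correlation is perfect.</p>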

      <fig id="F3" specific-use="star"><label>Figure 3</label><caption><p id="d2e2243">Example for a randomly selected catchment (station code K287191001 at Giroux, 756 <inline-formula><mml:math id="M70" display="inline"><mml:mrow class="unit"><mml:msup><mml:mi mathvariant="normal">km</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>) between 1 January 2016 and 1 January 2017, a period comprising both high- and low-flow events. Uncertainty estimates from QRF-local (orange) are plotted with the observed flows (blue). The orange line represents the median of uncertainty estimates, while darker orange shades indicate regions of higher probability (25 % and 75 % quantiles). Lighter regions indicate low probability quantiles (5 % and 95 % in addition to 2.5 % and 97.5 % quantiles).</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f03.png"/>

        </fig>

</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results</title>
      <p id="d2e2272">In this section, we compare each QRF variant according to its performance during the testing period. We investigate flow ranges in which multi-site learning is preferable, and we explore the importance of including catchment descriptors for regional QRFs. The results for the K-NN approach can be found in Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/>. Figure <xref ref-type="fig" rid="F3"/> illustrates the uncertainty estimates of QRF-local for a randomly selected catchment.</p>

      <fig id="F4" specific-use="star"><label>Figure 4</label><caption><p id="d2e2281">Hyperparameter optimization results for QRF-national. The selection criterion is the median, across the catchments of the study, of the mean of the alpha score and CRPSS (equal contribution of both scores).</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f04.png"/>

      </fig>

      <fig id="F5" specific-use="star"><label>Figure 5</label><caption><p id="d2e2292">CDF of distributional metrics across the 564 catchments for the QRF variants in the study. Curves that are closer to the right of the plots indicate better performance. The blue line represents the performance of QRF-local, orange represents QRF-region, and green represents QRF-national.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f05.png"/>

      </fig>

<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Hyperparameter tuning</title>
      <p id="d2e2309">We conducted a hyperparameter grid search for each QRF variant using an equal contribution of the alpha score and CRPSS <inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>+</mml:mo><mml:mrow class="chem"><mml:mi mathvariant="normal">CRPSS</mml:mi></mml:mrow></mml:mrow><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as mentioned in Sect. <xref ref-type="sec" rid="Ch1.S2.SS7"/>. Figure <xref ref-type="fig" rid="F4"/> shows the tuning results for the three hyperparameters of QRF-national: the minimum number of samples at child nodes, the maximum number of candidate variables to use for splitting at each tree node, and the size of the forest (number of trees). We aim to obtain a single set of hyperparameters, and the selection criterion is the median of this combined score across all catchments of the study. Each subplot illustrates how the selection criterion varies with the hyperparameter under consideration, while the variability reflects the influence of the remaining hyperparameters. This allows the sensitivity of QRF-national to each hyperparameter to be assessed during the optimisation process. Overall, the performance of QRF was most sensitive to the minimum number of samples at child nodes, which ranged from 5 to 600 data points.</p>
      <p id="d2e2338">Notably, the best results were recorded for lower values of the minimum number of samples at leaf nodes, although the improvement slows for values lower than 25. As such, a minimum of 10 samples at leaf nodes is selected. Overall, QRF was found to be fairly insensitive to the number of candidate predictors used for splitting at each node. By default, the quantile-forest library uses the integer value of the square root (sqrt) of the total number of predictors for this parameter. With 31 total predictors for QRF-national, 6 would be the default. Figure <xref ref-type="fig" rid="F4"/> shows that using the default value of the square root was slightly better. For the number of trees, a forest with more trees will generally be more skillful than one with fewer trees, as it can capture more of the nuances of the training set, but beyond a certain point the rate of improvement with additional trees becomes negligible, as noted in <xref ref-type="bibr" rid="bib1.bibx46" id="text.58"/> and <xref ref-type="bibr" rid="bib1.bibx8" id="text.59"/>. Most of the boxplot ranges overlap, and the results appear to be relatively insensitive to this QRF parameter over the range considered. For the tested values, Fig. <xref ref-type="fig" rid="F4"/> shows that 400 trees yield slightly better performance. We selected hyperparameters for the other QRF variants on a more specific basis: per catchment for QRF-local and per region for QRF-region. One can expect that catchment-specific hyperparameter tuning might help model performance. The distribution of the selected hyperparameters for QRF-local and QRF-region is provided in Appendix <xref ref-type="sec" rid="App1.Ch1.S4"/>, as mentioned in Sect. 2.7.</p>

      <fig id="F6" specific-use="star"><label>Figure 6</label><caption><p id="d2e2355">Comparative plots between QRF variants in the study during the testing period. The first row shows the alpha score, the second row shows dispersion, and the third row shows CRPSS. The first column compares metric values for QRF-local vs. QRF-region; the remaining columns show the other pairwise comparisons between the variants (QRF-local vs. QRF-national and QRF-national vs. QRF-region).</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f06.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><title>Alpha score, dispersion score, and CRPSS</title>
      <p id="d2e2372">We first present our results with distributional metrics in Fig. <xref ref-type="fig" rid="F5"/>, which shows the cumulative distribution of reliability, sharpness, and CRPSS for the 564 catchments of the study. QRF-region and QRF-national slightly improve reliability compared to QRF-local. However, multi-site learning does not yield better alpha scores for stations that are already well calibrated with QRF-local. Figure <xref ref-type="fig" rid="F6"/> shows a direct comparison of the proposed variants and indicates that improvements were most noticeable for catchments where QRF-local provided low reliability, i.e., the 25 % of the basins with the lowest alpha scores (the 25 % quantile of the alpha score was 0.742 for QRF-local and 0.76 for both QRF-region and QRF-national). In terms of sharpness, the different QRF variants performed similarly, while multi-site setups significantly improved CRPSS values. Among the QRF variants, QRF-national generally outperformed QRF-local, improving CRPSS by approximately 2 %, except in the case of four catchments, where QRF-local performed significantly better. Additionally, QRF-region improved CRPSS for 69 % of the catchments compared to QRF-local, while QRF-national showed improvements in 88 % of the catchments. Overall, the improvements are less apparent for reliability, but multi-site QRFs seem to improve performance for catchments with initially limited reliability in the local setup. As highlighted in Sect. <xref ref-type="sec" rid="Ch1.S3"/>, CRPSS combines both reliability and sharpness. Given that the sharpness metric was nearly identical across the QRF variants in the study, we suspect that the CRPSS improvements are mainly due to improvements in reliability. This is noteworthy, as the objective of probabilistic predictions is to improve reliability prior to enhancing sharpness <xref ref-type="bibr" rid="bib1.bibx21" id="paren.60"/>.</p>

      <fig id="F7" specific-use="star"><label>Figure 7</label><caption><p id="d2e2386">Box plots of 90 % interval-based metrics across the 564 catchments of the study. Blue indicates the performance of QRF-local, orange indicates QRF-region, and green indicates QRF-national.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f07.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Interval metrics</title>
      <p id="d2e2403">We now consider the 90 % interval metrics. Figure <xref ref-type="fig" rid="F7"/> shows box plots across the 564 catchments of the study for coverage ratio, average interval width, and Winkler skill scores. The multi-site learning setup was beneficial for QRF and improved the reliability of the predictive intervals. For instance, the median coverage ratios were 0.87, 0.89, and 0.89 for QRF-local, QRF-region, and QRF-national, respectively. Similar to the alpha score, the improvements in the multi-site QRF variants are most noticeable for stations with low reliability in the local approach. As shown in Fig. <xref ref-type="fig" rid="F7"/>, prediction intervals from multi-site QRFs over-cover observed streamflows (coverage ratio greater than 0.9) for certain catchments. While not optimal, this is a preferable outcome compared to the local approach, where uncertainty intervals more frequently miss the observed streamflows and tend to underestimate uncertainty in some catchments. The improvements are also observed for the Winkler skill score, where QRF-national provided the best results. The average interval width was similar for all the variants in the study, further indicating that improvements in multi-site learning in the case of QRF mainly relate to reliability. For the sake of completeness, we include interval metrics for the 95 % predictive uncertainty interval in Appendix <xref ref-type="sec" rid="App1.Ch1.S6"/>; the conclusions remain the same.</p>

      <fig id="F8" specific-use="star"><label>Figure 8</label><caption><p id="d2e2414">CDF of deterministic metrics across the 564 catchments for the QRF variants during the testing period. Curves that are closer to the right of the plots indicate better performance. The blue line represents the performance of mean uncertainty estimates for QRF-local, orange for QRF-region, green for QRF-national, while red represents the performance of raw GR6J predictions.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f08.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>Deterministic metrics</title>
      <p id="d2e2431">Figure <xref ref-type="fig" rid="F8"/> shows the cumulative distribution functions of the Nash–Sutcliffe efficiency (NSE) and Kling–Gupta efficiency (KGE) scores for mean predictions. Multi-site learning improves NSE for most catchments, but for KGE, improvements are most apparent when the local approach yields low KGE values. For example, for catchments in the lowest 25 % of KGE scores, QRF-region and QRF-national improved the median KGE by 6 %. However, for catchments where QRF-local provided decent KGE scores (top 25 % performers), multi-site setups yielded scores similar to those of the single-basin approach. This highlights the equalizing effect of multi-site learning for QRF, as it is most impactful for catchments with limited performance in single-basin post-processing.</p>
      <p id="d2e2436">Figure <xref ref-type="fig" rid="F8"/> also provides deterministic metrics for the uncorrected GR6J model predictions. The figure highlights that the proposed QRF methods can improve hydrological deterministic predictions, especially for NSE. For example, QRF-national achieved a higher median NSE than the GR6J predictions (0.87 vs. 0.86) and a higher NSE for 75 % of the study’s catchments. Overall, QRF variants had better NSE for the majority of the catchments. For KGE, the uncorrected GR6J estimates outperform all tested QRF approaches.</p>
      <p id="d2e2441">We argue that these results show (i) the ability of QRF in its multi-site setup to identify and transfer useful information from neighbouring catchments, and (ii) that, although the improvements relate to both deterministic and uncertainty predictions, they are most significant for coverage ratio, CRPSS, WSS, NSE, and KGE. Building on these findings, we investigated whether these benefits were more pronounced under specific hydrological conditions.</p>

<table-wrap id="T4" specific-use="star"><label>Table 4</label><caption><p id="d2e2448">Summary of average metrics for different QRF methods across the 564 catchments of the study. Three flow ranges are included: high, medium, and low simulated flows. Bold numbers indicate the best performance in each group.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="8">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3">Alpha score</oasis:entry>
         <oasis:entry colname="col4">Dispersion score</oasis:entry>
         <oasis:entry colname="col5">CRPSS</oasis:entry>
         <oasis:entry colname="col6">CR (90 %)</oasis:entry>
         <oasis:entry colname="col7">AWSS</oasis:entry>
         <oasis:entry colname="col8">WSS</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Regime</oasis:entry>
         <oasis:entry colname="col2">QRF variant</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"/>
         <oasis:entry colname="col5"/>
         <oasis:entry colname="col6"/>
         <oasis:entry colname="col7"/>
         <oasis:entry colname="col8"/>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Low flows (<inline-formula><mml:math id="M72" display="inline"><mml:mo lspace="0mm">&lt;</mml:mo></mml:math></inline-formula> 33 % <inline-formula><mml:math id="M73" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col2">QRF_local</oasis:entry>
         <oasis:entry colname="col3"><bold>0.769</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.905</bold></oasis:entry>
         <oasis:entry colname="col5">0.914</oasis:entry>
         <oasis:entry colname="col6">0.848</oasis:entry>
         <oasis:entry colname="col7"><bold>0.921</bold></oasis:entry>
         <oasis:entry colname="col8">0.919</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QRF_region</oasis:entry>
         <oasis:entry colname="col3">0.767</oasis:entry>
         <oasis:entry colname="col4">0.901</oasis:entry>
         <oasis:entry colname="col5">0.919</oasis:entry>
         <oasis:entry colname="col6"><bold>0.874</bold></oasis:entry>
         <oasis:entry colname="col7">0.918</oasis:entry>
         <oasis:entry colname="col8">0.925</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QRF_national</oasis:entry>
         <oasis:entry colname="col3">0.765</oasis:entry>
         <oasis:entry colname="col4"><bold>0.905</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.921</bold></oasis:entry>
         <oasis:entry colname="col6">0.871</oasis:entry>
         <oasis:entry colname="col7">0.920</oasis:entry>
         <oasis:entry colname="col8"><bold>0.927</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Medium flows (<inline-formula><mml:math id="M74" display="inline"><mml:mo lspace="0mm">&gt;</mml:mo></mml:math></inline-formula> 34 % <inline-formula><mml:math id="M75" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M76" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 66 % <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col2">QRF_local</oasis:entry>
         <oasis:entry colname="col3">0.819</oasis:entry>
         <oasis:entry colname="col4"><bold>0.764</bold></oasis:entry>
         <oasis:entry colname="col5">0.792</oasis:entry>
         <oasis:entry colname="col6">0.862</oasis:entry>
         <oasis:entry colname="col7"><bold>0.808</bold></oasis:entry>
         <oasis:entry colname="col8">0.812</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QRF_region</oasis:entry>
         <oasis:entry colname="col3"><bold>0.833</bold></oasis:entry>
         <oasis:entry colname="col4">0.752</oasis:entry>
         <oasis:entry colname="col5">0.797</oasis:entry>
         <oasis:entry colname="col6"><bold>0.882</bold></oasis:entry>
         <oasis:entry colname="col7">0.798</oasis:entry>
         <oasis:entry colname="col8">0.821</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QRF_national</oasis:entry>
         <oasis:entry colname="col3">0.831</oasis:entry>
         <oasis:entry colname="col4">0.758</oasis:entry>
         <oasis:entry colname="col5"><bold>0.802</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.882</bold></oasis:entry>
         <oasis:entry colname="col7">0.802</oasis:entry>
         <oasis:entry colname="col8"><bold>0.825</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">High flows (<inline-formula><mml:math id="M78" display="inline"><mml:mo lspace="0mm">&gt;</mml:mo></mml:math></inline-formula> 67 % <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col2">QRF_local</oasis:entry>
         <oasis:entry colname="col3">0.809</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M80" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.216</oasis:entry>
         <oasis:entry colname="col5">0.225</oasis:entry>
         <oasis:entry colname="col6">0.870</oasis:entry>
         <oasis:entry colname="col7"><bold>0.269</bold></oasis:entry>
         <oasis:entry colname="col8">0.381</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QRF_region</oasis:entry>
         <oasis:entry colname="col3"><bold>0.827</bold></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M81" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.342</oasis:entry>
         <oasis:entry colname="col5">0.247</oasis:entry>
         <oasis:entry colname="col6">0.889</oasis:entry>
         <oasis:entry colname="col7">0.154</oasis:entry>
         <oasis:entry colname="col8">0.391</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">QRF_national</oasis:entry>
         <oasis:entry colname="col3">0.826</oasis:entry>
         <oasis:entry colname="col4"><bold>-0.213</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.264</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.890</bold></oasis:entry>
         <oasis:entry colname="col7">0.249</oasis:entry>
         <oasis:entry colname="col8"><bold>0.427</bold></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>How do QRF uncertainty estimates perform for different flow ranges?</title>
      <p id="d2e2886">Here, we aim to understand how the proposed QRF approaches perform across different flow ranges. Table <xref ref-type="table" rid="T4"/> summarizes the average values of the alpha, dispersion, CRPSS, and interval scores for three flow groups: high (<inline-formula><mml:math id="M82" display="inline"><mml:mo lspace="0mm">&gt;</mml:mo></mml:math></inline-formula> 67 % <inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>), medium (<inline-formula><mml:math id="M84" display="inline"><mml:mo lspace="0mm">&gt;</mml:mo></mml:math></inline-formula> 34 % <inline-formula><mml:math id="M85" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M86" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 66 % <inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>), and low flows (<inline-formula><mml:math id="M88" display="inline"><mml:mo lspace="0mm">&lt;</mml:mo></mml:math></inline-formula> 33 % <inline-formula><mml:math id="M89" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mtext>med</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>).</p>
      <p id="d2e2964">Performances were stratified based on the median values of the uncertainty distributions. Under low-flow conditions, the scores are similar, especially the alpha and dispersion scores. For higher simulated flows, however, multi-site QRFs provide better reliability (alpha score and coverage ratio) and better overall performance (CRPSS and WSS). Although QRF-local was able to provide narrower interval widths, especially for higher flows, it had lower reliability compared to multi-site QRFs. QRF-region and QRF-national adapt to higher-flow ranges by providing wider uncertainty estimates, which enables better reliability and conditionality, as reflected in the improved CRPSS and Winkler scores.</p>

      <fig id="F9" specific-use="star"><label>Figure 9</label><caption><p id="d2e2969">Distributional metrics across the 564 catchments for the QRF variants in the study. Curves that are closer to the right of the plots indicate better performance. The red line represents the performance of QRF-basic and green represents QRF-national.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f09.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS6">
  <label>4.6</label><title>Impact of static descriptors</title>
      <p id="d2e2986">To understand the impact of static catchment descriptors, Fig. <xref ref-type="fig" rid="F9"/> illustrates the distributional metrics for QRF-national and QRF-basic. QRF-basic is a multi-site QRF trained across all catchments of the study using the same features as QRF-local (no static features). Notable differences between the two variants are observed in terms of reliability (median 0.827 vs. 0.806 across all catchments), sharpness (0.637 vs. 0.614), and CRPSS (0.706 vs. 0.691). The differences are most consistent for CRPSS, for which QRF-national was better for 80 % of the stations of the study. Furthermore, Fig. <xref ref-type="fig" rid="FF2"/> in Appendix <xref ref-type="sec" rid="App1.Ch1.S6"/> shows that QRF-basic had CRPSS values very similar to those of QRF-local. These results suggest that the performance improvements in multi-site QRF models are not solely due to the inclusion of hydrological diversity in the training data. Static catchment descriptors play a significant role, and the selection of informative static features appears to be critical for effective multi-site QRF implementations.</p>

      <fig id="F10" specific-use="star"><label>Figure 10</label><caption><p id="d2e2997">Example uncertainty estimates for a Mediterranean catchment (Solenzara River at Sari-Solenzara [Canniciu]; station Y960000102) from QRF-local (left) and QRF-national (right), covering the period from July 2017 to January 2020. The estimated 5 % quantile is shown in blue, and the 95 % quantile in green. The QRF-national model noticeably overestimates the upper (95 %) uncertainty quantile.</p></caption>
          <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f10.png"/>

        </fig>

<table-wrap id="T5" specific-use="star"><label>Table 5</label><caption><p id="d2e3009">Average and median (shown in parentheses) differences between QRF-local and QRF-national models across distributional metrics, evaluated for different error-scale groups. For catchments with extreme error variability (Group 1), the QRF-national model degrades the quality of uncertainty estimates.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Error scale</oasis:entry>
         <oasis:entry colname="col2">Alpha score</oasis:entry>
         <oasis:entry colname="col3">Dispersion score</oasis:entry>
         <oasis:entry colname="col4">CRPSS</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Group 1</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M90" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.26 (<inline-formula><mml:math id="M91" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0.241)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M92" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.786 (<inline-formula><mml:math id="M93" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0.028)</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M94" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.297 (<inline-formula><mml:math id="M95" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0.084)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Group 2</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M96" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.023 (<inline-formula><mml:math id="M97" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0.032)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M98" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.091 (<inline-formula><mml:math id="M99" display="inline"><mml:mo lspace="0mm">-</mml:mo></mml:math></inline-formula>0.021)</oasis:entry>
         <oasis:entry colname="col4">0.009 (0.011)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Group 3</oasis:entry>
         <oasis:entry colname="col2">0.015 (0.007)</oasis:entry>
         <oasis:entry colname="col3">0.01 (0.008)</oasis:entry>
         <oasis:entry colname="col4">0.021 (0.022)</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S4.SS7">
  <label>4.7</label><title>Sensitivity to scale (potential for improving the performance of QRF-national)</title>
      <p id="d2e3166">Following the results of the previous section, we found that multi-site learning can significantly degrade performance in four cases across distinct hydrological regions; an example of such a catchment is presented in Fig. <xref ref-type="fig" rid="F10"/>. To investigate this, Table <xref ref-type="table" rid="T5"/> presents the differences in performance (alpha score, dispersion score, and CRPSS) between QRF-national and QRF-local according to the variability of the errors <inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, measured by the interquartile range (IQR) used to standardize them. Three groups are considered: group 1 is characterized by high error variability (IQR <inline-formula><mml:math id="M101" display="inline"><mml:mo lspace="0mm">&gt;</mml:mo></mml:math></inline-formula> 1.5), group 2 also has high error variability but to a lesser degree (0.77 <inline-formula><mml:math id="M102" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> IQR <inline-formula><mml:math id="M103" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 1.5), and group 3 (IQR <inline-formula><mml:math id="M104" display="inline"><mml:mo lspace="0mm">&lt;</mml:mo></mml:math></inline-formula> 0.77) can be seen as having normal variability. The values 1.5 and 0.77 are the 99 % and 90 % quantiles of the interquartile range used to standardize the errors <inline-formula><mml:math id="M105" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">ϵ</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>. QRF-national performed poorly for catchments with significant variations in the target variable, with notable decreases in reliability and CRPSS compared to a single-basin approach. 
These results highlight that robust standardization of input variables and errors is key to delivering meaningful multi-site QRFs, since this enables the algorithm to find analogs from locally calibrated hydrological model inputs. Furthermore, the aforementioned scale discrepancies occurred specifically for catchments characterized by frequent zero values in simulated and observed streamflows, which can be problematic when using logarithmic relative hydrological errors. Figure <xref ref-type="fig" rid="F10"/> illustrates QRF-local and QRF-national predictions for the 5 % and 95 % quantiles alongside observed flows for the Y960000102 catchment. Clearly, QRF-national overestimates the upper quantile. The local approach thus yields better results, since the error dynamics of this catchment are atypical compared to the other catchments of the study.</p>
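      <p>The grouping by error scale described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names <monospace>iqr</monospace> and <monospace>error_scale_groups</monospace> are hypothetical, the thresholds are taken as percentiles of the per-catchment IQRs across catchments, and the inclusive quantile method and the handling of values equal to a threshold are choices of this sketch rather than details reported in the study.</p>
      <preformat><![CDATA[
```python
from statistics import quantiles


def iqr(values):
    """Interquartile range (75 % quantile minus 25 % quantile)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1


def error_scale_groups(errors_by_catchment, lower_pct=90, upper_pct=99):
    """Map each catchment id to error-scale group 1, 2 or 3.

    errors_by_catchment: dict {catchment id: sequence of errors}.
    Thresholds are percentiles of the per-catchment IQRs, taken across
    catchments (an assumption of this sketch, not the authors' code).
    """
    iqrs = {cid: iqr(err) for cid, err in errors_by_catchment.items()}
    cuts = quantiles(iqrs.values(), n=100, method="inclusive")  # 99 cut points
    t_low, t_high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    groups = {}
    for cid, value in iqrs.items():
        if value > t_high:
            groups[cid] = 1  # extreme error variability
        elif value > t_low:
            groups[cid] = 2  # high variability, to a lesser degree
        else:
            groups[cid] = 3  # normal variability
    return groups
```
]]></preformat>
      <p>With 564 catchments, such a rule would place only a handful of catchments in group 1, which is in line with the small number of degraded cases discussed in this section.</p>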
</sec>
</sec>
<sec id="Ch1.S5">
  <label>5</label><title>Discussion</title>
<sec id="Ch1.S5.SS1">
  <label>5.1</label><title>When and where is it preferable to use a multi-site learning setup?</title>
      <p id="d2e3243">Although training QRF on local data yields good uncertainty estimates, as discussed in previous studies <xref ref-type="bibr" rid="bib1.bibx60 bib1.bibx73" id="paren.61"/>, using a multi-site setup can slightly improve QRF performance. Our results indicate that the best improvements are achieved with QRF-national, which includes all 564 catchments of the study. The improvements mainly concern (i) catchments where local information is not sufficient (i.e., limited QRF-local performance), (ii) CRPSS and WSS scores for nearly all catchments, and (iii) periods of higher flows.</p>
      <p id="d2e3249">The results presented in Sect. <xref ref-type="sec" rid="Ch1.S4"/> indicate that multi-site learning improves the performance of QRF models, and that larger models yield better uncertainty estimates (QRF-national <inline-formula><mml:math id="M106" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> QRF-region <inline-formula><mml:math id="M107" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> QRF-local). We find this result interesting, since one might argue that regional models in the QRF-region approach provide an equilibrium in spatial variability by aggregating similar catchments and, possibly, similar error dynamics <xref ref-type="bibr" rid="bib1.bibx31" id="paren.62"/>. Such an approach confines the QRF analog search to specific hydrological regions, without extending the search to the entire study area. QRF-national, however, is not constrained by the predefined hydro-climatological regions. Region indicators are included as input features for QRF-national, and the algorithm is trained to find analogous events using these indicators, but is not strictly limited by them. Spatial variability appears to be beneficial for QRF, provided that appropriate catchment descriptors are used, and incorporating explicit measures of catchment similarity could further improve multi-site learning with QRF <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx35" id="paren.63"/>.</p>
      <p id="d2e3274">Most improvements are noted for high and medium flows. QRF-national provided a better alpha score (i.e., reliability) than QRF-local for 60 % of the catchments during high and medium flows, but performance was identical for low flows. As highlighted by <xref ref-type="bibr" rid="bib1.bibx5" id="text.64"/> and <xref ref-type="bibr" rid="bib1.bibx1" id="text.65"/>, high flows are generally more difficult to predict and some high-flow events cannot be predicted exclusively from local historical data. Multi-site QRF makes use of information from neighboring catchments to provide uncertainty estimates for these more challenging events. Local information seems to be sufficient to characterize low-flow events.</p>
      <p id="d2e3283">Our analyses highlighted a limitation of multi-site QRFs for catchments characterized by frequent zero-flow values. Modelling hydrological model errors of ephemeral catchments is generally challenging <xref ref-type="bibr" rid="bib1.bibx42 bib1.bibx39" id="paren.66"/>, particularly with the use of a log-based error transformation. Figure <xref ref-type="fig" rid="F10"/> shows that multi-site learning can degrade the predictions for ephemeral catchments, as uncertainties are overestimated. Although we have added the <inline-formula><mml:math id="M108" display="inline"><mml:mi mathvariant="italic">δ</mml:mi></mml:math></inline-formula> offset parameter, an alternative transformation that is better suited to zero-flow values (e.g., a Box–Cox transformation) could better stabilize the hydrological errors used to train the QRF models for ephemeral catchments. Another solution could be to treat such catchments separately when training multi-site QRFs. As shown in Fig. <xref ref-type="fig" rid="F10"/>, QRF-local handled zero-flow values better. In the literature, other approaches <xref ref-type="bibr" rid="bib1.bibx26 bib1.bibx18" id="paren.67"/> group homogeneous catchments using clustering. Using the number of zero-flow values or the cluster labels as QRF inputs could help QRF to better distinguish catchments affected by this issue.</p>
      <p id="d2e3304">We also expected the proposed variants to improve KGE, as observed in previous studies <xref ref-type="bibr" rid="bib1.bibx73 bib1.bibx57 bib1.bibx41" id="paren.68"/>. For example, <xref ref-type="bibr" rid="bib1.bibx41" id="text.69"/> used a closely related deterministic RF framework for hydrological error correction and found that the hybrid RF approach boosted streamflow predictions from a median of <inline-formula><mml:math id="M109" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.03 to 0.51. Here, the post-processing was not beneficial for KGE performance, and we suggest that this can be attributed to how the QRF hyperparameters were selected. The selection criterion used in this study aims to maximize the reliability and sharpness of the uncertainty estimates, whereas in the aforementioned studies the RF post-processor was optimized specifically for deterministic error correction.</p>
</sec>
<sec id="Ch1.S5.SS2">
  <label>5.2</label><title>What is the importance of meaningful catchment attributes?</title>
      <p id="d2e3328">We showed in Fig. <xref ref-type="fig" rid="F9"/> that the improvements in QRF-national depend on the use of static descriptors. A nation-wide QRF variant with no catchment attributes (identical input descriptors to the local variant) performed worse than a classic single-catchment QRF. This indicates that increasing hydrological diversity and lumping more catchment data are not the primary drivers of performance improvements. The information shared within a multi-site setup is best used by the QRF algorithm in conjunction with high-quality catchment descriptors, which enable better uncertainty characterization and improved analog searches in similar catchments. The catchment descriptors used are readily available in CAMELS-FR and other CAMELS datasets, making the use of such descriptors straightforward. Furthermore, a globally parametrized QRF post-processor is able to extend its uncertainty estimates to ungauged catchments. <xref ref-type="bibr" rid="bib1.bibx41" id="text.70"/> found that RF is able to learn global mappings and improve deterministic estimates in poorly gauged and ungauged basins. Similar improvements could be obtained for uncertainty estimation at ungauged catchments, if appropriate catchment descriptors that do not rely on observed streamflows are selected.</p>
</sec>
<sec id="Ch1.S5.SS3">
  <label>5.3</label><title>Multi-site QRF for extrapolation of hydrological uncertainties</title>
      <p id="d2e3344">We used a large-sample dataset (CAMELS-FR) to train the QRF model, and many practical hydrological applications may call for applying QRF post-processing to catchments not included in the training set. Our study demonstrates the ability of the QRF model to make use of hydrological information from neighbouring catchments to improve uncertainty estimates. Building on this finding, we suggest that applying a multi-site QRF, supported by appropriate catchment descriptors, to catchments outside the training set is likely to yield improved uncertainty estimates compared to a QRF model trained with single-catchment data. However, the quality of these estimates remains unexplored, as the generalizability of multi-site QRF variants would depend on the similarity of hydrological model errors between the training catchments and the target regions to which uncertainty estimates are extrapolated. A spatio-temporal cross-validation experiment <xref ref-type="bibr" rid="bib1.bibx18" id="paren.71"/> that splits the training and testing data by catchments and time periods could be used to assess the flexibility of multi-site QRF post-processing. The proposed framework can also be adapted for prediction at ungauged basins. For this extension, the main practical difficulty lies in obtaining consistent hydrological model states for ungauged catchments and in accounting for the significant additional uncertainty usually associated with such settings <xref ref-type="bibr" rid="bib1.bibx55 bib1.bibx48 bib1.bibx7" id="paren.72"/>. If this can be properly handled, the proposed multi-site QRF variants could provide meaningful uncertainty estimates for ungauged catchments. A comparison with LSTM-based post-processing techniques would also be interesting, as LSTMs perform well in ungauged-basin settings <xref ref-type="bibr" rid="bib1.bibx34" id="paren.73"/>.</p>
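      <p>As an illustration of a split by catchments and time periods, the following Python sketch keeps for testing only the records that are held out in both dimensions. The function name <monospace>spatiotemporal_split</monospace>, the record layout and the rule of discarding records held out in a single dimension are assumptions of this example; the protocol of the cited study may differ.</p>
      <preformat><![CDATA[
```python
def spatiotemporal_split(records, test_catchments, split_year):
    """Split records so that test data are unseen in both space and time.

    records: iterable of (catchment_id, year, payload) tuples.
    Training keeps catchments outside `test_catchments` before `split_year`;
    testing keeps catchments inside `test_catchments` from `split_year` on.
    Records held out in only one dimension are dropped (sketch assumption).
    """
    train, test = [], []
    for rec in records:
        cid, year = rec[0], rec[1]
        held_out_space = cid in test_catchments
        held_out_time = year >= split_year
        if held_out_space and held_out_time:
            test.append(rec)
        elif not held_out_space and not held_out_time:
            train.append(rec)
    return train, test
```
]]></preformat>
      <p>Repeating such a split over several catchment folds would give one possible estimate of how a multi-site QRF behaves on catchments absent from its training set.</p>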
</sec>
<sec id="Ch1.S5.SS4">
  <label>5.4</label><title>On model complexity and computational time</title>
      <p id="d2e3364">Table <xref ref-type="table" rid="T6"/> presents the number of parameters for each QRF variant, calculated as the product of the number of trees and the number of parameters of each tree (e.g., split thresholds and the input features used). QRF-region and QRF-local exhibit a similar number of parameters, despite QRF-region providing better uncertainty estimates. In contrast, QRF-national shows a 47 % increase in model complexity. This suggests that the performance gains of QRF-national come at the expense of increased computational cost, which can be a drawback, especially since QRF stores not only the tree parameters but also the samples used for training.</p>

<table-wrap id="T6"><label>Table 6</label><caption><p id="d2e3372">Cumulative number of parameters across all models.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry colname="col2">Number of parameters</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">QRF-local</oasis:entry>
         <oasis:entry colname="col2">379M</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">QRF-region</oasis:entry>
         <oasis:entry colname="col2">364M</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">QRF-national</oasis:entry>
         <oasis:entry colname="col2">551M</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d2e3426">RF-based algorithms are CPU- and memory-intensive, especially for larger datasets <xref ref-type="bibr" rid="bib1.bibx59" id="text.74"/>. In our study, we had no difficulty fitting QRF-local and QRF-region with an Intel(R) Core(TM) i7-4770 CPU (3.40 <inline-formula><mml:math id="M110" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">GHz</mml:mi></mml:mrow></mml:math></inline-formula>) and 16 <inline-formula><mml:math id="M111" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">GB</mml:mi></mml:mrow></mml:math></inline-formula> of memory. However, because of memory issues, we trained QRF-national on the Jean-Zay HPC, where a single node with two CPUs (at 2.5 <inline-formula><mml:math id="M112" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">GHz</mml:mi></mml:mrow></mml:math></inline-formula>) and 128 <inline-formula><mml:math id="M113" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">GB</mml:mi></mml:mrow></mml:math></inline-formula> of memory was sufficient. With this configuration, it takes on average 25 <inline-formula><mml:math id="M114" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">min</mml:mi></mml:mrow></mml:math></inline-formula> to fit QRF-national with a single parameter. While training time is longer for multi-site settings, inference/prediction times are very similar to those of QRF-local.</p>
</sec>
</sec>
<sec id="Ch1.S6" sec-type="conclusions">
  <label>6</label><title>Conclusions</title>
      <p id="d2e3483">In this study, we investigated the added value of multi-site learning with a hydrologically informed quantile random forest (QRF) post-processor across a large set of 564 French catchments. Three training setups were proposed – local, regional, and national – which we evaluated with different probabilistic metrics and across various hydrometeorological conditions. Based on reliability, sharpness, and overall metrics, our results indicate that multi-site learning improves QRF uncertainty estimates, with notable enhancements (i) for overall metrics (CRPSS and WSS) and deterministic metrics (NSE and KGE); (ii) at stations where the local approach provided unreliable uncertainty estimates; and (iii) for high and medium flows, where predictions can be more challenging. These findings corroborate previous studies <xref ref-type="bibr" rid="bib1.bibx18 bib1.bibx5 bib1.bibx1" id="paren.75"/> that found that high-flow events can have similar characteristics in neighbouring catchments. These results suggest that the QRF algorithm in its regional extensions can leverage data from neighbouring catchments to improve its uncertainty estimates; this is particularly advantageous given the off-the-shelf use of available catchment descriptors and the similarity of the learning process between local and regional variants. Additionally, the selection of representative, high-quality catchment attributes and static features is necessary to achieve the aforementioned improvements. We also found that using a single QRF post-processor for all catchments in the study (QRF-national) provided the best probabilistic predictions, which might indicate that the larger the model, the better the uncertainty estimates with QRF. However, QRF-national can yield erroneous uncertainty estimates for catchments with significant scale variations in the errors. 
We argue that this is mainly due to the use of a logarithmic transformation of relative errors, which strongly influences hydrological error dynamics at such stations. Using other transformations and experimenting with other catchment groupings <xref ref-type="bibr" rid="bib1.bibx26" id="paren.76"/> could solve this issue. In addition, larger models are associated with higher computational costs, increased complexity, and a larger number of parameters. This is particularly relevant for QRF, as RF-based algorithms are known for their intensive memory use. However, solutions exist, such as GPU-accelerated QRFs <xref ref-type="bibr" rid="bib1.bibx54" id="paren.77"/>.</p>
      <p id="d2e3495">We acknowledge certain limitations related to model-dependent artifacts. In this study, we were able to test QRF variants only with the GR6J hydrological model, as it was the only model for which simulations were available. However, the proposed framework is flexible and can be extended to other hydrological models and states. A comparison with other error models, including standard autoregressive models and LSTM-based error modelling, is also an attractive option. These findings highlight the performance enhancements of regional hydrologically informed QRF post-processing, and we aim to explore further in future studies the merits of the proposed QRF framework in both forecasting applications and prediction at ungauged basins.</p>
</sec>

      
      </body>
    <back><app-group>

<app id="App1.Ch1.S1">
  <label>Appendix A</label><title>Overview</title>
      <p id="d2e3509">The Appendix is structured as follows. Appendix <xref ref-type="sec" rid="App1.Ch1.S2"/> compares the overall performance of a <inline-formula><mml:math id="M115" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-nearest neighbors model against QRF-local. Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/> investigates the power transformation used to calibrate the hydrological model GR6J. Appendix <xref ref-type="sec" rid="App1.Ch1.S4"/> provides the distribution of the selected hyperparameters for the QRF-local and QRF-region variants. Appendix <xref ref-type="sec" rid="App1.Ch1.S5"/> details the assessment criteria used to compare the uncertainty quantification approaches of the study. Finally, Appendix <xref ref-type="sec" rid="App1.Ch1.S6"/> provides additional results that complement the main paper.</p>
</app>

<app id="App1.Ch1.S2">
  <label>Appendix B</label><title><inline-formula><mml:math id="M116" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-Nearest Neighbors model as benchmark</title>
<sec id="App1.Ch1.S2.SS1">
  <label>B1</label><title><inline-formula><mml:math id="M117" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-Nearest Neighbours algorithm</title>
      <p id="d2e3559">We used a naive <inline-formula><mml:math id="M118" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-nearest neighbors approach as a benchmark. The <inline-formula><mml:math id="M119" display="inline"><mml:mi>k</mml:mi></mml:math></inline-formula>-nearest neighbors (K-NN) algorithm is a non-parametric method that makes predictions based on the closest historical examples in the feature space. In hydrology, it is often used to estimate streamflow by averaging the outputs of the k most analogous conditions. Here, analogs were used to estimate uncertainty on a local basis. Table <xref ref-type="table" rid="TB1"/> presents the hyperparameters used for the K-NN algorithm, while Table <xref ref-type="table" rid="TB2"/> compares K-NN and QRF-local, reporting average and median (in parentheses) scores across the catchments of the study for distributional, interval and deterministic metrics.</p>
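      <p>The idea of estimating uncertainty from analogs can be sketched as follows: the errors of the k nearest historical situations are collected and their empirical quantiles are returned. This is a minimal illustration, not the implementation used in the study; the function name <monospace>knn_error_quantiles</monospace>, the Euclidean distance, the uniform weighting of neighbors and the linear interpolation between order statistics are all assumptions.</p>
      <preformat><![CDATA[
```python
import math


def knn_error_quantiles(history, query, k=3, probs=(0.05, 0.95)):
    """Empirical error quantiles from the k nearest historical analogs.

    history: list of (features, error) pairs, features being a tuple of floats.
    query:   feature tuple of the time step to predict.
    Euclidean distance and uniform weighting are assumptions of this sketch.
    """
    nearest = sorted(history, key=lambda fe: math.dist(fe[0], query))[:k]
    errors = sorted(err for _, err in nearest)
    out = []
    for p in probs:
        # Linear interpolation between neighboring order statistics.
        pos = p * (len(errors) - 1)
        lo = math.floor(pos)
        hi = min(lo + 1, len(errors) - 1)
        out.append(errors[lo] + (pos - lo) * (errors[hi] - errors[lo]))
    return out
```
]]></preformat>
      <p>In practice the features would need to be standardized before computing distances, since the nearest-neighbor search is sensitive to the scale of each input.</p>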

<table-wrap id="TB1"><label>Table B1</label><caption><p id="d2e3583">Hyperparameter set optimized for the K-NN approach.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Hyperparameter</oasis:entry>
         <oasis:entry colname="col2">Values</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Number of neighbors</oasis:entry>
         <oasis:entry colname="col2">5, 10, 25, 50, 75, 100, 150, 200, 400, 800</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Distance</oasis:entry>
         <oasis:entry colname="col2">Uniform, distance</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>


</sec>
<sec id="App1.Ch1.S2.SS2">
  <label>B2</label><title><inline-formula><mml:math id="M120" display="inline"><mml:mi>K</mml:mi></mml:math></inline-formula>-Nearest Neighbours and QRF-local comparison</title>
      <p id="d2e3645">Table <xref ref-type="table" rid="TB2"/> compares the average and median performance metrics of QRF-local and K-NN across the 564 catchments. Overall, QRF-local consistently outperforms K-NN across all evaluated metrics. It achieves higher alpha scores, improved dispersion scores, and better CRPSS values, indicating both more reliable and more skillful probabilistic predictions. These results suggest that QRF-local leverages dynamic input features more effectively than the simpler K-NN approach.</p>

<table-wrap id="TB2"><label>Table B2</label><caption><p id="d2e3654">Average performance scores across the 564 catchments for QRF-local and K-NN. Median scores are shown in parentheses. Bold values indicate better performance for each metric.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:colspec colnum="6" colname="col6" align="left"/>
     <oasis:colspec colnum="7" colname="col7" align="left"/>
     <oasis:colspec colnum="8" colname="col8" align="left"/>
     <oasis:colspec colnum="9" colname="col9" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Model</oasis:entry>
         <oasis:entry colname="col2">Alpha score</oasis:entry>
         <oasis:entry colname="col3">Dispersion score</oasis:entry>
         <oasis:entry colname="col4">CRPSS</oasis:entry>
         <oasis:entry colname="col5">CR90.0</oasis:entry>
         <oasis:entry colname="col6">AWSS</oasis:entry>
         <oasis:entry colname="col7">WSS</oasis:entry>
         <oasis:entry colname="col8">NSE</oasis:entry>
         <oasis:entry colname="col9">KGE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">K-NN</oasis:entry>
         <oasis:entry colname="col2">0.771 (0.798)</oasis:entry>
         <oasis:entry colname="col3">0.587 (0.63)</oasis:entry>
         <oasis:entry colname="col4">0.655 (0.674)</oasis:entry>
         <oasis:entry colname="col5">0.835 (0.848)</oasis:entry>
         <oasis:entry colname="col6">0.678 (0.702)</oasis:entry>
         <oasis:entry colname="col7">0.698 (0.72)</oasis:entry>
         <oasis:entry colname="col8">0.825 (0.85)</oasis:entry>
         <oasis:entry colname="col9">0.812 (0.835)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">QRF-local</oasis:entry>
         <oasis:entry colname="col2"><bold>0.799 (0.824)</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.593 (0.636)</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.674 (0.686)</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.860 (0.871)</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.680 (0.709)</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.715 (0.731)</bold></oasis:entry>
         <oasis:entry colname="col8"><bold>0.832 (0.856)</bold></oasis:entry>
         <oasis:entry colname="col9"><bold>0.822 (0.842)</bold></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</app>

<app id="App1.Ch1.S3">
  <label>Appendix C</label><title>Prior transformations on streamflow</title>
      <p id="d2e3798">The power transformations are applied to the target variable – both observed and simulated streamflows – before calculating the associated KGE criterion (Eq. <xref ref-type="disp-formula" rid="App1.Ch1.S3.E3"/>). For example, the square-root transformation aims at increasing the weight of errors for specific hydrological regimes. The use of streamflow transformations for model calibration is further investigated in the recent study of <xref ref-type="bibr" rid="bib1.bibx63" id="text.78"/>.

              <disp-formula specific-use="gather" content-type="numbered"><mml:math id="M121" display="block"><mml:mtable displaystyle="true"><mml:mlabeledtr id="App1.Ch1.S3.E3"><mml:mtd><mml:mtext>C1</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mtext mathvariant="normal">KGE</mml:mtext><mml:mo>(</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mtext>obs</mml:mtext><mml:mi mathvariant="normal">p</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mtext>prd</mml:mtext><mml:mi mathvariant="normal">p</mml:mi></mml:msubsup><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msqrt><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S3.E4"><mml:mtd><mml:mtext>C2</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mtext>Cov</mml:mtext><mml:mo>(</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mtext>obs</mml:mtext><mml:mi mathvariant="normal">p</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>Q</mml:mi><mml:mtext>prd</mml:mtext><mml:mi mathvariant="normal">p</mml:mi></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi 
mathvariant="italic">σ</mml:mi><mml:mtext>obs</mml:mtext></mml:msub><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mtext>prd</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S3.E5"><mml:mtd><mml:mtext>C3</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle displaystyle="true" class="stylechange"/><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mtext>prd</mml:mtext></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr><mml:mlabeledtr id="App1.Ch1.S3.E6"><mml:mtd><mml:mtext>C4</mml:mtext></mml:mtd><mml:mtd><mml:mrow><mml:mstyle class="stylechange" displaystyle="true"/><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mtext>prd</mml:mtext></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mtext>obs</mml:mtext></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mtd></mml:mlabeledtr></mml:mtable></mml:math></disp-formula></p>
      <p id="d2e3992">Here, <inline-formula><mml:math id="M122" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> is the Pearson correlation coefficient, <inline-formula><mml:math id="M123" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> is the variability ratio, and <inline-formula><mml:math id="M124" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> is the bias ratio; <inline-formula><mml:math id="M125" display="inline"><mml:mi mathvariant="italic">μ</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M126" display="inline"><mml:mi mathvariant="italic">σ</mml:mi></mml:math></inline-formula> denote the mean and standard deviation of the transformed streamflow time series.</p>
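      <p>As an illustration, the KGE and its three components defined in Eqs. (C1) to (C4) can be computed as in the following minimal Python sketch. It assumes that both series have already been transformed and that population statistics are used; the function name <monospace>kge</monospace> is ours and does not correspond to the code used in the study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
from statistics import fmean, pstdev


def kge(q_obs, q_prd):
    """Kling-Gupta efficiency of two equally long, already transformed series."""
    if len(q_obs) != len(q_prd) or len(q_obs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mu_obs, mu_prd = fmean(q_obs), fmean(q_prd)
    sd_obs, sd_prd = pstdev(q_obs), pstdev(q_prd)
    # Population covariance, consistent with the population standard deviations.
    cov = fmean([(o - mu_obs) * (p - mu_prd) for o, p in zip(q_obs, q_prd)])
    r = cov / (sd_obs * sd_prd)   # Pearson correlation, Eq. (C2)
    alpha = sd_prd / sd_obs       # variability ratio, Eq. (C3)
    beta = mu_prd / mu_obs        # bias ratio, Eq. (C4)
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```
]]></preformat>
      <p>A prediction identical to the observations yields a KGE of 1, whereas a prediction that doubles every observed value keeps <monospace>r</monospace> at 1 but sets both ratios to 2.</p>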
</app>

<app id="App1.Ch1.S4">
  <label>Appendix D</label><title>Distribution of QRF hyperparameters for QRF-local and -region</title>
      <p id="d2e4039">Figures <xref ref-type="fig" rid="FD1"/> and <xref ref-type="fig" rid="FD2"/> show the distribution of hyperparameters for QRF-local and QRF-region, respectively. Unlike the national approach, which uses a single set of hyperparameters, QRF-local assigns one set per catchment, while QRF-region uses one set per region. Overall, hyperparameters vary across catchments and regions. Substantial variability is observed for the minimum number of samples at child nodes, which affects tree depth. The values of this hyperparameter were mostly skewed toward lower ranges for both QRF-local and QRF-region. The response of QRF-local to the other hyperparameters was less discriminative. For QRF-local, the selected values for the number of trees and features per split remained largely consistent across the values tested. For QRF-region, 200 trees were most often selected, while 8 features and the default square root (6 features per split) were the most frequent optimal values for the number of features per split.</p>

      <fig id="FD1"><label>Figure D1</label><caption><p id="d2e4048">Distribution of the selected hyperparameters for QRF-local across the catchments of the study. The hyperparameters are: (i) The number of trees <inline-formula><mml:math id="M127" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, (ii) The minimum number of samples at child nodes <inline-formula><mml:math id="M128" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>, and (iii) The number of features per split, <inline-formula><mml:math id="M129" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f11.png"/>

      </fig>

      <fig id="FD2" specific-use="star"><label>Figure D2</label><caption><p id="d2e4082">Distribution of the selected hyperparameters for QRF-region across the hydroclimatological catchment groups used in the study. The hyperparameters are: (i) The number of trees <inline-formula><mml:math id="M130" display="inline"><mml:mi>n</mml:mi></mml:math></inline-formula>, (ii) The minimum number of samples at child nodes <inline-formula><mml:math id="M131" display="inline"><mml:mi>m</mml:mi></mml:math></inline-formula>, and (iii) The number of features per split, <inline-formula><mml:math id="M132" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula>.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f12.png"/>

      </fig>


</app>

<app id="App1.Ch1.S5">
  <label>Appendix E</label><title>Assessment criteria</title>
<sec id="App1.Ch1.S5.SS1">
  <label>E1</label><title>Continuous ranked probability score</title>
      <p id="d2e4129">Given a univariate predictive distribution <inline-formula><mml:math id="M133" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula> and a corresponding realization <inline-formula><mml:math id="M134" display="inline"><mml:mi>y</mml:mi></mml:math></inline-formula>, the continuous ranked probability score (CRPS) is defined as:

            <disp-formula id="App1.Ch1.S5.E7" content-type="numbered"><label>E1</label><mml:math id="M135" display="block"><mml:mrow><mml:mtext>CRPS</mml:mtext><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">∫</mml:mo><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow><mml:mrow><mml:mo>+</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:munderover><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>)</mml:mo><mml:mo>-</mml:mo><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>-</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mi>u</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M136" display="inline"><mml:mi>H</mml:mi></mml:math></inline-formula> is the Heaviside function such that <inline-formula><mml:math id="M137" display="inline"><mml:mrow><mml:mi>H</mml:mi><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>-</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> if <inline-formula><mml:math id="M138" display="inline"><mml:mrow><mml:mi>u</mml:mi><mml:mo>≥</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:math></inline-formula> and 0 otherwise. In this study, the probabilistic predictions take the form of distributions of draws; hence, Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E7"/>) has to be discretized for computation. We apply the method implemented in the function “CRPS_FROM_ECDF” of the Python package EvalHyd <xref ref-type="bibr" rid="bib1.bibx25" id="paren.79"/>. The CRPS is negatively oriented, meaning that smaller values are better.</p>
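      <p>To make the discretization concrete, the following minimal Python sketch integrates Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E7"/>) exactly for the empirical distribution of a set of draws, using the fact that both the empirical distribution function and the Heaviside function are piecewise constant. It is an illustration under that assumption, not a reproduction of the EvalHyd implementation; the function name <monospace>crps_from_draws</monospace> is ours.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_right


def crps_from_draws(draws, y):
    """CRPS of the empirical distribution of `draws` against the realization y."""
    xs = sorted(draws)
    m = len(xs)
    if m == 0:
        raise ValueError("at least one draw is required")
    # Breakpoints where either the empirical CDF or the Heaviside step can change.
    points = sorted(xs + [y])
    total = 0.0
    for left, right in zip(points[:-1], points[1:]):
        f = bisect_right(xs, left) / m   # empirical CDF on [left, right)
        h = 1.0 if left >= y else 0.0    # Heaviside H(u - y) on [left, right)
        total += (f - h) ** 2 * (right - left)
    # Outside [points[0], points[-1]] both functions coincide, so the
    # integrand vanishes there and nothing more needs to be added.
    return total
```
]]></preformat>
      <p>For a single draw, the score reduces to the absolute error between the draw and the realization.</p>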
</sec>
<sec id="App1.Ch1.S5.SS2">
  <label>E2</label><title>Skill score</title>
      <p id="d2e4271">The performance of predictions can be more easily compared with that of a reference prediction using skill scores. Skill scores (SS) are used to assess the relative quality of two predictions. They are generally defined as:

            <disp-formula id="App1.Ch1.S5.E8" content-type="numbered"><label>E2</label><mml:math id="M139" display="block"><mml:mrow><mml:mtext>SS</mml:mtext><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mover accent="true"><mml:mi>S</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mtext>ref</mml:mtext></mml:msub></mml:mrow><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M140" display="inline"><mml:mover accent="true"><mml:mi>S</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math id="M141" display="inline"><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mtext>ref</mml:mtext></mml:msub></mml:mrow><mml:mo mathvariant="normal">‾</mml:mo></mml:mover></mml:math></inline-formula> are the mean scores of the predictions from the model under evaluation and from the reference model, respectively. Climatology is commonly used as a reference. In this study, we consider the climatological distribution of observed streamflows, estimated as the empirical distribution of discharges over the training period (<inline-formula><mml:math id="M142" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>).</p>
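      <p>A minimal Python sketch of Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E8"/>) is given below. It assumes a negatively oriented score with a perfect value of zero, such as the CRPS, averaged over the same time steps for the model and for the reference; the function name <monospace>skill_score</monospace> is illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import fmean


def skill_score(scores, ref_scores):
    """Skill score 1 - mean(S) / mean(S_ref) for a negatively oriented score."""
    mean_ref = fmean(ref_scores)
    if mean_ref == 0:
        raise ValueError("the reference score must be non-zero")
    return 1.0 - fmean(scores) / mean_ref
```
]]></preformat>
      <p>A value of 1 indicates a perfect prediction, 0 indicates no improvement over the reference, and negative values indicate that the prediction is worse than the reference.</p>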
</sec>
<sec id="App1.Ch1.S5.SS3">
  <label>E3</label><title>Alpha score</title>
      <p id="d2e4346">Given a univariate forecast distribution <inline-formula><mml:math id="M143" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and a corresponding realization <inline-formula><mml:math id="M144" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, the <inline-formula><mml:math id="M145" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> value is <inline-formula><mml:math id="M146" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mi>p</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>Y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>≤</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and the alpha score is defined as:

            <disp-formula id="App1.Ch1.S5.E9" content-type="numbered"><label>E3</label><mml:math id="M147" display="block"><mml:mrow><mml:msubsup><mml:mi mathvariant="italic">α</mml:mi><mml:mi>y</mml:mi><mml:mo>′</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:munderover><mml:mfenced open="|" close="|"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msub><mml:mo>-</mml:mo><mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:mfenced></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M148" display="inline"><mml:mrow><mml:msubsup><mml:mi>p</mml:mi><mml:mi>y</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M149" display="inline"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:msup><mml:mo>)</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtext>th</mml:mtext><mml:mo>)</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> are the <inline-formula><mml:math id="M150" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>th observed and theoretical <inline-formula><mml:math id="M151" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> values of <inline-formula><mml:math id="M152" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> values. <inline-formula><mml:math id="M153" display="inline"><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the number of <inline-formula><mml:math id="M154" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> values. The alpha score takes values between 0 and 1. It is positively oriented, with scores close to 1 reflecting perfect reliability.</p>
      <p id="d2e4588">The performance of streamflow forecasts can vary depending on the flow range considered (e.g., flood forecasting vs. drought forecasting). <xref ref-type="bibr" rid="bib1.bibx3" id="text.80"/> suggest a forecast-based sample stratification for continuous scalar variables in order to consider the merits of streamflow forecasts on different ranges of flows. To ensure robust reliability estimates and prevent potential compensation effects, the alpha score was calculated separately for three distinct flow ranges: low, high, and average forecasted flows.</p>
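      <p>The computation of the alpha score can be sketched in Python as follows. Two assumptions are made that the text above does not specify: the theoretical <monospace>p</monospace> values are taken as the plotting positions (i - 0.5)/N of a uniform distribution, and the positively oriented score is obtained from the mean absolute deviation of Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E9"/>) as one minus twice that deviation, a common convention. All function names are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def pit_value(draws, y):
    """Observed p value: fraction of draws less than or equal to the realization."""
    return sum(1 for x in draws if x <= y) / len(draws)


def alpha_prime(p_values):
    """Mean absolute deviation between sorted observed and theoretical p values."""
    n = len(p_values)
    ordered = sorted(p_values)
    # Assumption: theoretical p values are the plotting positions (i - 0.5) / n.
    theoretical = [(i - 0.5) / n for i in range(1, n + 1)]
    return sum(abs(p - t) for p, t in zip(ordered, theoretical)) / n


def alpha_score(p_values):
    """Assumed positively oriented form: 1 for perfect reliability, 0 at worst."""
    return 1.0 - 2.0 * alpha_prime(p_values)
```
]]></preformat>
      <p>For the stratified evaluation, the same functions would simply be applied to the subset of time steps falling in each forecasted flow range.</p>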
</sec>
<sec id="App1.Ch1.S5.SS4">
  <label>E4</label><title>Dispersion score</title>
      <p id="d2e4602">Sharpness is quantified with the skill score of the CRPS of median forecasts relative to the climatological streamflow distribution, where CRPS<sub>median</sub> is defined as follows:

            <disp-formula id="App1.Ch1.S5.E10" content-type="numbered"><label>E4</label><mml:math id="M155" display="block"><mml:mrow><mml:msub><mml:mtext>CRPS</mml:mtext><mml:mtext>median</mml:mtext></mml:msub><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mo>)</mml:mo><mml:mo>=</mml:mo><mml:mtext>CRPS</mml:mtext><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mtext>median</mml:mtext></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M156" display="inline"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mtext>median</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> is the median value of the distribution <inline-formula><mml:math id="M157" display="inline"><mml:mi>F</mml:mi></mml:math></inline-formula>. This is related to an interesting decomposition of the CRPS proposed in the PhD thesis of <xref ref-type="bibr" rid="bib1.bibx6" id="text.81"/>.</p>
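      <p>For a distribution given as a set of draws, Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E10"/>) can be sketched in Python as below. The sketch assumes the energy form of the CRPS for an empirical distribution (mean absolute distance to the evaluation point minus half the mean absolute distance between pairs of draws); the function name <monospace>crps_median</monospace> is illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from statistics import fmean, median


def crps_median(draws):
    """CRPS of the empirical distribution of `draws` evaluated at its own median."""
    m = len(draws)
    med = median(draws)
    # Half the mean absolute difference between all pairs of draws.
    spread = sum(abs(a - b) for a in draws for b in draws) / (2 * m * m)
    return fmean([abs(x - med) for x in draws]) - spread
```
]]></preformat>
      <p>Because the observation does not enter the computation, the result depends only on the spread of the draws: it is zero when all draws coincide and grows as the distribution widens.</p>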
</sec>
<sec id="App1.Ch1.S5.SS5">
  <label>E5</label><title>Coverage ratio</title>
      <p id="d2e4668">Given a predictive interval <inline-formula><mml:math id="M158" display="inline"><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> at a confidence level <inline-formula><mml:math id="M159" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:mi mathvariant="italic">α</mml:mi></mml:mrow></mml:math></inline-formula> and a corresponding observation <inline-formula><mml:math id="M160" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, the coverage ratio (CR) is defined as:

            <disp-formula id="App1.Ch1.S5.E11" content-type="numbered"><label>E5</label><mml:math id="M161" display="block"><mml:mrow><mml:mtext>CR</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>∈</mml:mo><mml:mo>[</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>]</mml:mo><mml:mo mathvariant="italic">}</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          where <inline-formula><mml:math id="M162" display="inline"><mml:mrow><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="italic">{</mml:mo><mml:mo>⋅</mml:mo><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:math></inline-formula> is the indicator function and <inline-formula><mml:math id="M163" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the number of observations.</p>
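      <p>Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E11"/>) amounts to counting the observations that fall inside their prediction interval, as in the following minimal Python sketch (the function name <monospace>coverage_ratio</monospace> is illustrative, and interval bounds are treated as inclusive, as in the equation).</p>
      <preformat preformat-type="code"><![CDATA[
```python
def coverage_ratio(lower, upper, obs):
    """Fraction of observations y_t falling inside [L_t, U_t]."""
    if not (len(lower) == len(upper) == len(obs)) or len(obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for lo, up, y in zip(lower, upper, obs) if lo <= y <= up)
    return hits / len(obs)
```
]]></preformat>
      <p>For a reliable 95 % interval, the coverage ratio is expected to be close to 0.95.</p>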
</sec>
<sec id="App1.Ch1.S5.SS6">
  <label>E6</label><title>Average width</title>
      <p id="d2e4806">The average width (AW) of the prediction intervals is given by:

            <disp-formula id="App1.Ch1.S5.E12" content-type="numbered"><label>E6</label><mml:math id="M164" display="block"><mml:mrow><mml:mtext>AW</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>

          quantifying the sharpness of the probabilistic forecasts.</p>
</sec>
<sec id="App1.Ch1.S5.SS7">
  <label>E7</label><title>Winkler score</title>
      <p id="d2e4863">The Winkler score (WS) penalizes both miscoverage and interval width and is defined as:

            <disp-formula id="App1.Ch1.S5.E13" content-type="numbered"><label>E7</label><mml:math id="M165" display="block"><mml:mtable class="split" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msub><mml:mtext>WS</mml:mtext><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>=</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mo>(</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">α</mml:mi></mml:mfrac></mml:mstyle><mml:mo>(</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>&lt;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd/><mml:mtd><mml:mrow><mml:mo>+</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">2</mml:mn><mml:mi mathvariant="italic">α</mml:mi></mml:mfrac></mml:mstyle><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>)</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo mathvariant="italic">{</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi mathvariant="normal">t</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi 
mathvariant="normal">t</mml:mi></mml:msub><mml:mo mathvariant="italic">}</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>

          and the overall score is averaged over all observations: <inline-formula><mml:math id="M166" display="inline"><mml:mrow><mml:mtext>WS</mml:mtext><mml:mo>=</mml:mo><mml:mstyle displaystyle="false"><mml:mfrac style="text"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:msubsup><mml:msub><mml:mtext>WS</mml:mtext><mml:mi mathvariant="normal">t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>.</p>
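      <p>The Winkler score of Eq. (<xref ref-type="disp-formula" rid="App1.Ch1.S5.E13"/>) and its average can be sketched in Python as follows. The function name <monospace>winkler_score</monospace> is illustrative, and <monospace>alpha</monospace> denotes the nominal miscoverage rate (for example 0.05 for a 95 % interval).</p>
      <preformat preformat-type="code"><![CDATA[
```python
def winkler_score(lower, upper, obs, alpha):
    """Mean Winkler score of prediction intervals at confidence level 1 - alpha."""
    if not (len(lower) == len(upper) == len(obs)) or len(obs) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lo, up, y in zip(lower, upper, obs):
        score = up - lo                          # interval width
        if y < lo:
            score += (2.0 / alpha) * (lo - y)    # penalty: observation below the interval
        elif y > up:
            score += (2.0 / alpha) * (y - up)    # penalty: observation above the interval
        total += score
    return total / len(obs)
```
]]></preformat>
      <p>When every observation lies inside its interval, the penalties vanish and the Winkler score equals the average width.</p>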
</sec>
</app>

<app id="App1.Ch1.S6">
  <label>Appendix F</label><title>Results supplement</title>

      <fig id="FF1"><label>Figure F1</label><caption><p id="d2e5039">Box plots of 95 % interval-based metrics across the 564 catchments of the study. Blue for the performance of QRF-local, orange for QRF-region, and green for QRF-national.</p></caption>
        
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f13.png"/>

      </fig>

      <fig id="FF2"><label>Figure F2</label><caption><p id="d2e5052">CRPSS metric across the 564 catchments. The blue line represents the performance of QRF-basic, orange represents QRF-local.</p></caption>
        <graphic xlink:href="https://hess.copernicus.org/articles/30/3549/2026/hess-30-3549-2026-f14.png"/>

      </fig>


</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d2e5067">The quantile-forest package is available at <uri>https://pypi.org/project/quantile-forest</uri> (last access: 10 September 2025). The airGR package can be downloaded from CRAN repositories using the following identifier: <ext-link xlink:href="https://doi.org/10.32614/CRAN.package.airGR" ext-link-type="DOI">10.32614/CRAN.package.airGR</ext-link> <xref ref-type="bibr" rid="bib1.bibx12" id="paren.82"/>. The evalhyd-python package can be downloaded from the HAL open archive using the following identifier: hal-04088473. The CAMELS-FR dataset can be downloaded from the Recherche Data Gouv repository using the following identifier: <ext-link xlink:href="https://doi.org/10.57745/WH7FJR" ext-link-type="DOI">10.57745/WH7FJR</ext-link> <xref ref-type="bibr" rid="bib1.bibx13" id="paren.83"/>.</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d2e5088">TE carried out the experiments and wrote the manuscript with support from FB; CP and VA helped review the manuscript and supervise the project.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d2e5094">The contact author has declared that none of the authors has any competing interests.</p>
  </notes><notes notes-type="disclaimer"><title>Disclaimer</title>

      <p id="d2e5100">Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. The authors bear the ultimate responsibility for providing appropriate place names. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.</p>
  </notes><ack><title>Acknowledgements</title><p id="d2e5106">We gratefully acknowledge Météo-France for providing the weather data and SCHAPI for the streamflow data. We would also like to thank the PREMHYCE and CIPRHES projects, OFB, INRAE and SCHAPI for their financial support to the first author, which made this research possible. This work was performed using HPC resources from GENCI-IDRIS (grant no. 2024-AD011013991R2).</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d2e5111">This research has been supported by the Agence Nationale de la Recherche (ANR) (grant no. ANR-20-CE04-0009).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d2e5118">This paper was edited by Albrecht Weerts and reviewed by Derek Karssenberg and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><label>Auer et al.(2024)Auer, Gauch, Kratzert, Nearing, Hochreiter, and Klotz</label><mixed-citation>Auer, A., Gauch, M., Kratzert, F., Nearing, G., Hochreiter, S., and Klotz, D.: A data-centric perspective on the information needed for hydrological uncertainty predictions, Hydrol. Earth Syst. Sci., 28, 4099–4126, <ext-link xlink:href="https://doi.org/10.5194/hess-28-4099-2024" ext-link-type="DOI">10.5194/hess-28-4099-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx2"><label>Bates and Campbell(2001)</label><mixed-citation> Bates, B. C. and Campbell, E. P.: A Markov chain Monte Carlo scheme for parameter estimation and inference in conceptual rainfall-runoff modeling, Water Resour. Res., 37, 937–947, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx3"><label>Bellier et al.(2017)Bellier, Zin, and Bontron</label><mixed-citation>Bellier, J., Zin, I., and Bontron, G.: Sample stratification in verification of ensemble forecasts of continuous scalar variables: Potential benefits and pitfalls, Mon. Weather Rev., 145, 3529–3544, <ext-link xlink:href="https://doi.org/10.1175/MWR-D-16-0487.1" ext-link-type="DOI">10.1175/MWR-D-16-0487.1</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx4"><label>Bennett et al.(2021)Bennett, Robertson, Wang, Li, and Perraud</label><mixed-citation>Bennett, J. C., Robertson, D. E., Wang, Q. J., Li, M., and Perraud, J.-M.: Propagating reliable estimates of hydrological forecast uncertainty to many lead times, J. Hydrol., 603, 126798, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2021.126798" ext-link-type="DOI">10.1016/j.jhydrol.2021.126798</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx5"><label>Bertola et al.(2023)Bertola, Blöschl, Bohac, Borga, Castellarin, Chirico, Claps, Dallan, Danilovich, Ganora et al.</label><mixed-citation> Bertola, M., Blöschl, G., Bohac, M., Borga, M., Castellarin, A., Chirico, G. B., Claps, P., Dallan, E., Danilovich, I., Ganora, D., Gorbachova, L., Ledvinka, O., Mavrova-Guirguinova, M., Montanari, A., Ovcharuk, V., Viglione, A., Volpi, E., Arheimer, B., Aronica, G. T., Bonacci, O., Čanjevac, I., Csik, A., Frolova, N., Gnandt, B., Gribovszki, Z., Gül, A., Günther, K., Guse, B., Hannaford, J., Harrigan, S., Kireeva, M., Kohnová, S., Komma, J., Kriauciuniene, J., Kronvang, B., Lawrence, D., Lüdtke, S., Mediero, L., Merz, B., Molnar, P., Murphy, C., Oskoruš, D., Osuch, M., Parajka, J., Pfister, L., Radevski, I., Sauquet, E., Schröter, K., Šraj, M., Szolgay, J., Turner, S., Valent, P., Veijalainen, N., Ward, P. J., Willems, P., and Zivkovic, N.: Megafloods in Europe can be anticipated from observations in hydrologically similar catchments, Nat. Geosci., 16, 982–988, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx6"><label>Bontron(2004)</label><mixed-citation> Bontron, G.: Prévision quantitative des précipitations: Adaptation probabiliste par recherche d'analogues. Utilisation des Réanalyses NCEP/NCAR et application aux précipitations du Sud-Est de la France, PhD thesis, Institut National Polytechnique Grenoble (INPG), 2004.</mixed-citation></ref>
      <ref id="bib1.bibx7"><label>Bourgin et al.(2015)Bourgin, Andréassian, Perrin, and Oudin</label><mixed-citation>Bourgin, F., Andréassian, V., Perrin, C., and Oudin, L.: Transferring global uncertainty estimates from gauged to ungauged catchments, Hydrol. Earth Syst. Sci., 19, 2535–2546, <ext-link xlink:href="https://doi.org/10.5194/hess-19-2535-2015" ext-link-type="DOI">10.5194/hess-19-2535-2015</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx8"><label>Breiman(2001)</label><mixed-citation> Breiman, L.: Random forests, Mach. Learn., 45, 5–32, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx9"><label>Breiman et al.(2017)Breiman, Friedman, Olshen, and Stone</label><mixed-citation>Breiman, L., Friedman, J., Olshen, R. A., and Stone, C. J.: Classification and regression trees, Routledge, <ext-link xlink:href="https://doi.org/10.1201/9781315139470" ext-link-type="DOI">10.1201/9781315139470</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx10"><label>Coron et al.(2017)Coron, Thirel, Delaigue, Perrin, and Andréassian</label><mixed-citation>Coron, L., Thirel, G., Delaigue, O., Perrin, C., and Andréassian, V.: The Suite of Lumped GR Hydrological Models in an R package, Environ. Modell. Softw., 94, 166–171, <ext-link xlink:href="https://doi.org/10.1016/j.envsoft.2017.05.002" ext-link-type="DOI">10.1016/j.envsoft.2017.05.002</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx11"><label>Coron et al.(2023)Coron, Delaigue, Thirel, Dorchies, Perrin, and Michel</label><mixed-citation>Coron, L., Delaigue, O., Thirel, G., Dorchies, D., Perrin, C., and Michel, C.: airGR: Suite of GR Hydrological Models for Precipitation-Runoff Modelling, R package version 1.7.4, <ext-link xlink:href="https://doi.org/10.15454/EX11NA" ext-link-type="DOI">10.15454/EX11NA</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx12"><label>Coron et al.(2025)</label><mixed-citation>Coron, L., Delaigue, O., Thirel, G., Dorchies, D., Perrin, C., Michel, C., Andréassian, V., Bourgin, F., Brigode, P., Le Moine, N., Mathevet, T., Mouelhi, S., Oudin, L., Pushpalatha, R., and Valéry, A.: airGR: Suite of GR Hydrological Models for Precipitation-Runoff Modelling Version 1.7.8, CRAN [code], <ext-link xlink:href="https://doi.org/10.32614/CRAN.package.airGR" ext-link-type="DOI">10.32614/CRAN.package.airGR</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx13"><label>Delaigue et al.(2024)</label><mixed-citation>Delaigue, O., Guimarães, G. M., Brigode, P., Génot, B., Perrin, C., and Andréassian, V.: CAMELS-FR dataset, Recherche Data Gouv, V3 [data set], <ext-link xlink:href="https://doi.org/10.57745/WH7FJR" ext-link-type="DOI">10.57745/WH7FJR</ext-link>,  2024.</mixed-citation></ref>
      <ref id="bib1.bibx14"><label>Delaigue et al.(2025)Delaigue, Guimarães, Brigode, Génot, Perrin, Soubeyroux, Janet, Addor, and Andréassian</label><mixed-citation>Delaigue, O., Guimarães, G. M., Brigode, P., Génot, B., Perrin, C., Soubeyroux, J.-M., Janet, B., Addor, N., and Andréassian, V.: CAMELS-FR dataset: a large-sample hydroclimatic dataset for France to explore hydrological diversity and support model benchmarking, Earth Syst. Sci. Data, 17, 1461–1479, <ext-link xlink:href="https://doi.org/10.5194/essd-17-1461-2025" ext-link-type="DOI">10.5194/essd-17-1461-2025</ext-link>, 2025.</mixed-citation></ref>
      <ref id="bib1.bibx15"><label>Delle Monache et al.(2013)Delle Monache, Eckel, Rife, Nagarajan, and Searight</label><mixed-citation> Delle Monache, L., Eckel, F. A., Rife, D. L., Nagarajan, B., and Searight, K.: Probabilistic weather prediction with an analog ensemble, Mon. Weather Rev., 141, 3498–3516, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx16"><label>Dufeu et al.(2022)Dufeu, Mougin, Foray, Baillon, Lamblin, Hebrard, Chaleon, Romon, Cobos, Gouin et al.</label><mixed-citation>Dufeu, E., Mougin, F., Foray, A., Baillon, M., Lamblin, R., Hebrard, F., Chaleon, C., Romon, S., Cobos, L., Gouin, P., Audouy, J.-N., Martin, R., and Poligot-Pitsch, S.: Finalisation de l’opération HYDRO 3 de modernisation du système d’information national des données hydrométriques, LHB, 108, 2099317, <ext-link xlink:href="https://doi.org/10.1080/27678490.2022.2099317" ext-link-type="DOI">10.1080/27678490.2022.2099317</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx17"><label>Evin et al.(2014)Evin, Thyer, Kavetski, McInerney, and Kuczera</label><mixed-citation> Evin, G., Thyer, M., Kavetski, D., McInerney, D., and Kuczera, G.: Comparison of joint versus postprocessor approaches for hydrological uncertainty estimation accounting for error autocorrelation and heteroscedasticity, Water Resour. Res., 50, 2350–2375, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx18"><label>Fang et al.(2024)Fang, Johnson, Yeghiazarian, and Sankarasubramanian</label><mixed-citation>Fang, S., Johnson, J. M., Yeghiazarian, L., and Sankarasubramanian, A.: Improved national-scale above-normal flow prediction for gauged and ungauged basins using a spatio-temporal hierarchical model, Water Resour. Res., 60, e2023WR034557, <ext-link xlink:href="https://doi.org/10.1029/2023WR034557" ext-link-type="DOI">10.1029/2023WR034557</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx19"><label>Georgakakos et al.(2004)Georgakakos, Seo, Gupta, Schaake, and Butts</label><mixed-citation> Georgakakos, K. P., Seo, D.-J., Gupta, H., Schaake, J., and Butts, M. B.: Towards the characterization of streamflow simulation uncertainty through multimodel ensembles, J. Hydrol., 298, 222–241, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx20"><label>Gneiting et al.(2005)Gneiting, Raftery, Westveld, and Goldman</label><mixed-citation>Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T.: Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133, 1098–1118, <ext-link xlink:href="https://doi.org/10.1175/MWR2904.1" ext-link-type="DOI">10.1175/MWR2904.1</ext-link>, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx21"><label>Gneiting et al.(2007)Gneiting, Balabdaoui, and Raftery</label><mixed-citation> Gneiting, T., Balabdaoui, F., and Raftery, A. E.: Probabilistic forecasts, calibration and sharpness, J. Roy. Stat. Soc. B, 69, 243–268, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx22"><label>Golian et al.(2021)Golian, Murphy, and Meresa</label><mixed-citation>Golian, S., Murphy, C., and Meresa, H.: Regionalization of hydrological models for flow estimation in ungauged catchments in Ireland, Journal of Hydrology: Regional Studies, 36, 100859, <ext-link xlink:href="https://doi.org/10.1016/j.ejrh.2021.100859" ext-link-type="DOI">10.1016/j.ejrh.2021.100859</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx23"><label>Gupta et al.(2024)Gupta, Hantush, Govindaraju, and Beven</label><mixed-citation>Gupta, A., Hantush, M. M., Govindaraju, R. S., and Beven, K.: Evaluation of hydrological models at gauged and ungauged basins using machine learning-based limits-of-acceptability and hydrological signatures, J. Hydrol., 641, 131774, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2024.131774" ext-link-type="DOI">10.1016/j.jhydrol.2024.131774</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx24"><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and Martinez</label><mixed-citation> Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, J. Hydrol., 377, 80–91, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx25"><label>Hallouin et al.(2024)Hallouin, Bourgin, Perrin, Ramos, and Andréassian</label><mixed-citation>Hallouin, T., Bourgin, F., Perrin, C., Ramos, M.-H., and Andréassian, V.: EvalHyd v0.1.2: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions, Geosci. Model Dev., 17, 4561–4578, <ext-link xlink:href="https://doi.org/10.5194/gmd-17-4561-2024" ext-link-type="DOI">10.5194/gmd-17-4561-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx26"><label>Hashemi et al.(2022)Hashemi, Brigode, Garambois, and Javelle</label><mixed-citation>Hashemi, R., Brigode, P., Garambois, P.-A., and Javelle, P.: How can we benefit from regime information to make more effective use of long short-term memory (LSTM) runoff models?, Hydrol. Earth Syst. Sci., 26, 5793–5816, <ext-link xlink:href="https://doi.org/10.5194/hess-26-5793-2022" ext-link-type="DOI">10.5194/hess-26-5793-2022</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx27"><label>Hastie et al.(2001)Hastie, Tibshirani, and Friedman</label><mixed-citation>Hastie, T., Tibshirani, R., and Friedman, J.: The Elements of Statistical Learning, Springer Series in Statistics, Springer New York Inc., New York, NY, USA, <ext-link xlink:href="https://doi.org/10.1007/978-0-387-84858-7" ext-link-type="DOI">10.1007/978-0-387-84858-7</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx28"><label>Hu et al.(2023)Hu, Cervone, Young, and Delle Monache</label><mixed-citation> Hu, W., Cervone, G., Young, G., and Delle Monache, L.: Machine learning weather analogs for near-surface variables, Bound.-Lay. Meteorol., 186, 711–735, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx29"><label>Hwang et al.(2019)Hwang, Orenstein, Cohen, Pfeiffer, and Mackey</label><mixed-citation>Hwang, J., Orenstein, P., Cohen, J., Pfeiffer, K., and Mackey, L.: Improving subseasonal forecasting in the western US with machine learning, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2325–2335, <ext-link xlink:href="https://doi.org/10.1145/3292500.3330674" ext-link-type="DOI">10.1145/3292500.3330674</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx30"><label>Jehn et al.(2020)Jehn, Bestian, Breuer, Kraft, and Houska</label><mixed-citation>Jehn, F. U., Bestian, K., Breuer, L., Kraft, P., and Houska, T.: Using hydrological and climatic catchment clusters to explore drivers of catchment behavior, Hydrol. Earth Syst. Sci., 24, 1081–1100, <ext-link xlink:href="https://doi.org/10.5194/hess-24-1081-2020" ext-link-type="DOI">10.5194/hess-24-1081-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx31"><label>Johnson et al.(2023)Johnson, Fang, Sankarasubramanian, Rad, Kindl da Cunha, Jennings, Clarke, Mazrooei, and Yeghiazarian</label><mixed-citation>Johnson, J. M., Fang, S., Sankarasubramanian, A., Rad, A. M., Kindl da Cunha, L., Jennings, K. S., Clarke, K. C., Mazrooei, A., and Yeghiazarian, L.: Comprehensive analysis of the NOAA National Water Model: A call for heterogeneous formulations and diagnostic model selection, J. Geophys. Res.-Atmos., 128, e2023JD038534, <ext-link xlink:href="https://doi.org/10.1029/2023JD038534" ext-link-type="DOI">10.1029/2023JD038534</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx32"><label>Johnson(2024)</label><mixed-citation>Johnson, R. A.: quantile-forest: A Python Package for Quantile Regression Forests, Journal of Open Source Software, 9, 5976, <ext-link xlink:href="https://doi.org/10.21105/joss.05976" ext-link-type="DOI">10.21105/joss.05976</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx33"><label>Kling et al.(2012)Kling, Fuchs, and Paulin</label><mixed-citation> Kling, H., Fuchs, M., and Paulin, M.: Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios, J. Hydrol., 424, 264–277, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx34"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and Herrnegger</label><mixed-citation>Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <ext-link xlink:href="https://doi.org/10.5194/hess-22-6005-2018" ext-link-type="DOI">10.5194/hess-22-6005-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx35"><label>Kratzert et al.(2024)Kratzert, Gauch, Klotz, and Nearing</label><mixed-citation>Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.: HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrol. Earth Syst. Sci., 28, 4187–4201, <ext-link xlink:href="https://doi.org/10.5194/hess-28-4187-2024" ext-link-type="DOI">10.5194/hess-28-4187-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx36"><label>Krzysztofowicz(1999)</label><mixed-citation> Krzysztofowicz, R.: Bayesian theory of probabilistic forecasting via deterministic hydrologic model, Water Resour. Res., 35, 2739–2750, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx37"><label>Kuczera and Parent(1998)</label><mixed-citation> Kuczera, G. and Parent, E.: Monte Carlo assessment of parameter uncertainty in conceptual catchment models: the Metropolis algorithm, J. Hydrol., 211, 69–85, 1998.</mixed-citation></ref>
      <ref id="bib1.bibx38"><label>Leleu et al.(2014)Leleu, Tonnelier, Puechberty, Gouin, Viquendi, Cobos, Foray, Baillon, and Ndima</label><mixed-citation>Leleu, I., Tonnelier, I., Puechberty, R., Gouin, P., Viquendi, I., Cobos, L., Foray, A., Baillon, M., and Ndima, P.-O.: La refonte du système d'information national pour la gestion et la mise à disposition des données hydrométriques, La Houille Blanche, 100, 25–32, <ext-link xlink:href="https://doi.org/10.1051/lhb/2014004" ext-link-type="DOI">10.1051/lhb/2014004</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx39"><label>Li et al.(2016)Li, Wang, Bennett, and Robertson</label><mixed-citation>Li, M., Wang, Q. J., Bennett, J. C., and Robertson, D. E.: Error reduction and representation in stages (ERRIS) in hydrological modelling for ensemble streamflow forecasting, Hydrol. Earth Syst. Sci., 20, 3561–3579, <ext-link xlink:href="https://doi.org/10.5194/hess-20-3561-2016" ext-link-type="DOI">10.5194/hess-20-3561-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx40"><label>Louppe(2014)</label><mixed-citation> Louppe, G.: Understanding random forests: From theory to practice, Université de Liège (Belgium), 2014.</mixed-citation></ref>
      <ref id="bib1.bibx41"><label>Magni et al.(2023)Magni, Sutanudjaja, Shen, and Karssenberg</label><mixed-citation> Magni, M., Sutanudjaja, E. H., Shen, Y., and Karssenberg, D.: Global streamflow modelling using process-informed machine learning, J. Hydroinform., 25, 1648–1666, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx42"><label>McInerney et al.(2019)McInerney, Kavetski, Thyer, Lerat, and Kuczera</label><mixed-citation> McInerney, D., Kavetski, D., Thyer, M., Lerat, J., and Kuczera, G.: Benefits of explicit treatment of zero flows in probabilistic hydrological modeling of ephemeral catchments, Water Resour. Res., 55, 11035–11060, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx43"><label>Meinshausen and Ridgeway(2006)</label><mixed-citation>Meinshausen, N. and Ridgeway, G.: Quantile regression forests, J. Mach. Learn. Res., 7, <uri>https://jmlr.org/papers/v7/meinshausen06a.html</uri> (last access: 10 June 2026), 2006.</mixed-citation></ref>
      <ref id="bib1.bibx44"><label>Montero-Manso and Hyndman(2021)</label><mixed-citation> Montero-Manso, P. and Hyndman, R. J.: Principles and algorithms for forecasting groups of time series: Locality and globality, Int. J. Forecasting, 37, 1632–1653, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx45"><label>Nash and Sutcliffe(1970)</label><mixed-citation> Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual models part I—A discussion of principles, J. Hydrol., 10, 282–290, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx46"><label>Oshiro et al.(2012)Oshiro, Perez, and Baranauskas</label><mixed-citation>Oshiro, T. M., Perez, P. S., and Baranauskas, J. A.: How many trees in a random forest?, in: International workshop on machine learning and data mining in pattern recognition, Springer, 154–168, <ext-link xlink:href="https://doi.org/10.1007/978-3-642-31537-4_13" ext-link-type="DOI">10.1007/978-3-642-31537-4_13</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx47"><label>Oudin et al.(2005)Oudin, Hervieu, Michel, Perrin, Andréassian, Anctil, and Loumagne</label><mixed-citation> Oudin, L., Hervieu, F., Michel, C., Perrin, C., Andréassian, V., Anctil, F., and Loumagne, C.: Which potential evapotranspiration input for a lumped rainfall–runoff model?: Part 2—Towards a simple and efficient potential evapotranspiration model for rainfall–runoff modelling, J. Hydrol., 303, 290–306, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx48"><label>Oudin et al.(2008)Oudin, Andréassian, Perrin, Michel, and Le Moine</label><mixed-citation>Oudin, L., Andréassian, V., Perrin, C., Michel, C., and Le Moine, N.: Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments, Water Resour. Res., 44, <ext-link xlink:href="https://doi.org/10.1029/2007WR006240" ext-link-type="DOI">10.1029/2007WR006240</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx49"><label>Papacharalampous and Langousis(2022)</label><mixed-citation>Papacharalampous, G. and Langousis, A.: Probabilistic water demand forecasting using quantile regression algorithms, Water Resour. Res., 58, e2021WR030216, <ext-link xlink:href="https://doi.org/10.1029/2021WR030216" ext-link-type="DOI">10.1029/2021WR030216</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx50"><label>Pham et al.(2020)Pham, Luo, and Finley</label><mixed-citation>Pham, L. T., Luo, L., and Finley, A.: Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds, Hydrol. Earth Syst. Sci., 25, 2997–3015, <ext-link xlink:href="https://doi.org/10.5194/hess-25-2997-2021" ext-link-type="DOI">10.5194/hess-25-2997-2021</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx51"><label>Poncelet et al.(2017)Poncelet, Merz, Merz, Parajka, Oudin, Andréassian, and Perrin</label><mixed-citation> Poncelet, C., Merz, R., Merz, B., Parajka, J., Oudin, L., Andréassian, V., and Perrin, C.: Process-based interpretation of conceptual hydrological model performance using a multinational catchment set, Water Resour. Res., 53, 7247–7268, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx52"><label>Pushpalatha et al.(2011)Pushpalatha, Perrin, Le Moine, Mathevet, and Andréassian</label><mixed-citation>Pushpalatha, R., Perrin, C., Le Moine, N., Mathevet, T., and Andréassian, V.: A downward structural sensitivity analysis of hydrological models to improve low-flow simulation, J. Hydrol., 411, 66–76, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2011.09.034" ext-link-type="DOI">10.1016/j.jhydrol.2011.09.034</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx53"><label>Pushpalatha et al.(2012)Pushpalatha, Perrin, Moine, and Andréassian</label><mixed-citation>Pushpalatha, R., Perrin, C., Moine, N. L., and Andréassian, V.: A review of efficiency criteria suitable for evaluating low-flow simulations, J. Hydrol., 420–421, 171–182, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2011.11.055" ext-link-type="DOI">10.1016/j.jhydrol.2011.11.055</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx54"><label>Raschka et al.(2020)Raschka, Patterson, and Nolet</label><mixed-citation>Raschka, S., Patterson, J., and Nolet, C.: Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence, arXiv [preprint], <ext-link xlink:href="https://arxiv.org/abs/2002.04803">arXiv:2002.04803</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx55"><label>Razavi and Coulibaly(2013)</label><mixed-citation> Razavi, T. and Coulibaly, P.: Streamflow prediction in ungauged basins: review of regionalization methods, J. Hydrol. Eng., 18, 958–975, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx56"><label>Renard et al.(2010)Renard, Kavetski, Kuczera, Thyer, and Franks</label><mixed-citation>Renard, B., Kavetski, D., Kuczera, G., Thyer, M., and Franks, S. W.: Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors, Water Resour. Res., 46, <ext-link xlink:href="https://doi.org/10.1029/2009WR008328" ext-link-type="DOI">10.1029/2009WR008328</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx57"><label>Shen et al.(2022)Shen, Ruijsch, Lu, Sutanudjaja, and Karssenberg</label><mixed-citation>Shen, Y., Ruijsch, J., Lu, M., Sutanudjaja, E. H., and Karssenberg, D.: Random forests-based error-correction of streamflow from a large-scale hydrological model: Using model state variables to estimate error terms, Comput. Geosci., 159, 105019, <ext-link xlink:href="https://doi.org/10.1016/j.cageo.2021.105019" ext-link-type="DOI">10.1016/j.cageo.2021.105019</ext-link>, 2022.</mixed-citation></ref>
      <ref id="bib1.bibx58"><label>Solomatine and Shrestha(2009)</label><mixed-citation>Solomatine, D. P. and Shrestha, D. L.: A novel method to estimate model uncertainty using machine learning techniques, Water Resour. Res., 45, <ext-link xlink:href="https://doi.org/10.1029/2008WR006839" ext-link-type="DOI">10.1029/2008WR006839</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx59"><label>Taillardat and Mestre(2020)</label><mixed-citation>Taillardat, M. and Mestre, O.: From research to applications – examples of operational ensemble post-processing in France using machine learning, Nonlin. Processes Geophys., 27, 329–347, <ext-link xlink:href="https://doi.org/10.5194/npg-27-329-2020" ext-link-type="DOI">10.5194/npg-27-329-2020</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx60"><label>Taillardat et al.(2016)Taillardat, Mestre, Zamo, and Naveau</label><mixed-citation> Taillardat, M., Mestre, O., Zamo, M., and Naveau, P.: Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics, Mon. Weather Rev., 144, 2375–2393, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx61"><label>Tanguy et al.(2023)Tanguy, Chevuturi, Marchant, Mackay, Parry, and Hannaford</label><mixed-citation>Tanguy, M., Chevuturi, A., Marchant, B. P., Mackay, J. D., Parry, S., and Hannaford, J.: How will climate change affect the spatial coherence of streamflow and groundwater droughts in Great Britain?, Environ. Res. Lett., 18, 064048, <ext-link xlink:href="https://doi.org/10.1088/1748-9326/acd655" ext-link-type="DOI">10.1088/1748-9326/acd655</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx62"><label>Teja et al.(2023)Teja, Manikanta, Das, and Umamahesh</label><mixed-citation>Teja, K. N., Manikanta, V., Das, J., and Umamahesh, N.: Enhancing the predictability of flood forecasts by combining Numerical Weather Prediction ensembles with multiple hydrological models, J. Hydrol., 625, 130176, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2023.130176" ext-link-type="DOI">10.1016/j.jhydrol.2023.130176</ext-link>, 2023.</mixed-citation></ref>
      <ref id="bib1.bibx63"><label>Thirel et al.(2024)Thirel, Santos, Delaigue, and Perrin</label><mixed-citation>Thirel, G., Santos, L., Delaigue, O., and Perrin, C.: On the use of streamflow transformations for hydrological model calibration, Hydrol. Earth Syst. Sci., 28, 4837–4860, <ext-link xlink:href="https://doi.org/10.5194/hess-28-4837-2024" ext-link-type="DOI">10.5194/hess-28-4837-2024</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx64"><label>Tiberi-Wadier et al.(2021)Tiberi-Wadier, Goutal, Ricci, Sergent, Taillardat, Bouttier, and Monteil</label><mixed-citation>Tiberi-Wadier, A.-L., Goutal, N., Ricci, S., Sergent, P., Taillardat, M., Bouttier, F., and Monteil, C.: Strategies for hydrologic ensemble generation and calibration: On the merits of using model-based predictors, J. Hydrol., 599, 126233, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2021.126233" ext-link-type="DOI">10.1016/j.jhydrol.2021.126233</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx65"><label>Todini(2008)</label><mixed-citation>Todini, E.: A model conditional processor to assess predictive uncertainty in flood forecasting, International Journal of River Basin Management, 6, 123–137, <ext-link xlink:href="https://doi.org/10.1080/15715124.2008.9635342" ext-link-type="DOI">10.1080/15715124.2008.9635342</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx66"><label>Troin et al.(2021)Troin, Arsenault, Wood, Brissette, and Martel</label><mixed-citation>Troin, M., Arsenault, R., Wood, A. W., Brissette, F., and Martel, J.-L.: Generating ensemble streamflow forecasts: A review of methods and approaches over the past 40 years, Water Resour. Res., 57, <ext-link xlink:href="https://doi.org/10.1029/2020WR028392" ext-link-type="DOI">10.1029/2020WR028392</ext-link>, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx67"><label>Tyralis and Papacharalampous(2024)</label><mixed-citation>Tyralis, H. and Papacharalampous, G.: A review of predictive uncertainty estimation with machine learning, Artif. Intell. Rev., 57, 94, <ext-link xlink:href="https://doi.org/10.1007/s10462-023-10698-8" ext-link-type="DOI">10.1007/s10462-023-10698-8</ext-link>, 2024.</mixed-citation></ref>
      <ref id="bib1.bibx68"><label>Tyralis et al.(2019)Tyralis, Papacharalampous, Burnetas, and Langousis</label><mixed-citation>Tyralis, H., Papacharalampous, G., Burnetas, A., and Langousis, A.: Hydrological post-processing using stacked generalization of quantile regression algorithms: Large-scale application over CONUS, J. Hydrol., 577, 123957, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2019.123957" ext-link-type="DOI">10.1016/j.jhydrol.2019.123957</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx69"><label>Valéry et al.(2014)Valéry, Andréassian, and Perrin</label><mixed-citation>Valéry, A., Andréassian, V., and Perrin, C.: ‘As simple as possible but not simpler’: What is useful in a temperature-based snow-accounting routine? Part 2–Sensitivity analysis of the Cemaneige snow accounting routine on 380 catchments, J. Hydrol., 517, 1176–1187, <ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2014.04.058" ext-link-type="DOI">10.1016/j.jhydrol.2014.04.058</ext-link>, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx70"><label>Vidal et al.(2010)Vidal, Martin, Franchisteguy, Baillon, and Soubeyroux</label><mixed-citation>Vidal, J.-P., Martin, E., Franchisteguy, L., Baillon, M., and Soubeyroux, J.-M.: A 50-year high-resolution atmospheric reanalysis over France with the Safran system, Int. J. Climatol., 30, <ext-link xlink:href="https://doi.org/10.1002/joc.2003" ext-link-type="DOI">10.1002/joc.2003</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx71"><label>Wani et al.(2017)Wani, Beckers, Weerts, and Solomatine</label><mixed-citation>Wani, O., Beckers, J. V. L., Weerts, A. H., and Solomatine, D. P.: Residual uncertainty estimation using instance-based learning with applications to hydrologic forecasting, Hydrol. Earth Syst. Sci., 21, 4021–4036, <ext-link xlink:href="https://doi.org/10.5194/hess-21-4021-2017" ext-link-type="DOI">10.5194/hess-21-4021-2017</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx72"><label>White et al.(2017)White, Carlsen, Robertson, Klein, Lazo, Kumar, Vitart, Coughlan de Perez, Ray, Murray et al.</label><mixed-citation>White, C. J., Carlsen, H., Robertson, A. W., Klein, R. J. T., Lazo, J., Kumar, A., Vitart, F., Coughlan de Perez, E., Ray, A. J., Murray, V., Bharwani, S., Macleod, D., James, R., Fleming, L. E., Morse, A. P., Eggen, B., Graham, R., Kjellström, E., Becker, E., Pegion, K. V., Holbrook, N. J., McEvoy, D., Depledge, M., Perkins-Kirkpatrick, S. E., Brown, T. J., Street, R., Jones, L., Remenyi, T., Hodgson-Johnston, I., Buontempo, C., Lamb, R., Meinke, H., Arheimer, B., and Zebiak, S.: Potential applications of subseasonal-to-seasonal (S2S) predictions, Meteorol. Appl., 24, 315–325, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx73"><label>Zhang et al.(2023)Zhang, Ye, Analui, Nguyen, Sorooshian, Hsu, and Wang</label><mixed-citation>Zhang, Y., Ye, A., Analui, B., Nguyen, P., Sorooshian, S., Hsu, K., and Wang, Y.: Comparing quantile regression forest and mixture density long short-term memory models for probabilistic post-processing of satellite precipitation-driven streamflow simulations, Hydrol. Earth Syst. Sci., 27, 4529–4550, <ext-link xlink:href="https://doi.org/10.5194/hess-27-4529-2023" ext-link-type="DOI">10.5194/hess-27-4529-2023</ext-link>, 2023.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Multi-site learning for hydrological uncertainty prediction: the case of quantile random forests</article-title-html>
<abstract-html/>
<ref-html id="bib1.bib1"><label>Auer et al.(2024)Auer, Gauch, Kratzert, Nearing, Hochreiter, and Klotz</label><mixed-citation>
      
Auer, A., Gauch, M., Kratzert, F., Nearing, G., Hochreiter, S., and Klotz, D.:
A data-centric perspective on the information needed for hydrological uncertainty predictions, Hydrol. Earth Syst. Sci., 28, 4099–4126, <a href="https://doi.org/10.5194/hess-28-4099-2024" target="_blank">https://doi.org/10.5194/hess-28-4099-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Bates and Campbell(2001)</label><mixed-citation>
      
Bates, B. C. and Campbell, E. P.:
A Markov chain Monte Carlo scheme for parameter estimation and inference in conceptual rainfall-runoff modeling, Water Resour. Res., 37, 937–947, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Bellier et al.(2017)Bellier, Zin, and Bontron</label><mixed-citation>
      
Bellier, J., Zin, I., and Bontron, G.:
Sample stratification in verification of ensemble forecasts of continuous scalar variables: Potential benefits and pitfalls, Mon. Weather Rev., 145, 3529–3544, <a href="https://doi.org/10.1175/MWR-D-16-0487.1" target="_blank">https://doi.org/10.1175/MWR-D-16-0487.1</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Bennett et al.(2021)Bennett, Robertson, Wang, Li, and Perraud</label><mixed-citation>
      
Bennett, J. C., Robertson, D. E., Wang, Q. J., Li, M., and Perraud, J.-M.:
Propagating reliable estimates of hydrological forecast uncertainty to many lead times, J. Hydrol., 603, 126798, <a href="https://doi.org/10.1016/j.jhydrol.2021.126798" target="_blank">https://doi.org/10.1016/j.jhydrol.2021.126798</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bertola et al.(2023)Bertola, Blöschl, Bohac, Borga, Castellarin, Chirico, Claps, Dallan, Danilovich, Ganora et al.</label><mixed-citation>
      
Bertola, M., Blöschl, G., Bohac, M., Borga, M., Castellarin, A., Chirico, G. B., Claps, P., Dallan, E., Danilovich, I., Ganora, D., Gorbachova, L., Ledvinka, O., Mavrova-Guirguinova, M., Montanari, A., Ovcharuk, V., Viglione, A., Volpi, E., Arheimer, B., Aronica, G. T., Bonacci, O., Čanjevac, I., Csik, A., Frolova, N., Gnandt, B., Gribovszki, Z., Gül, A., Günther, K., Guse, B., Hannaford, J., Harrigan, S., Kireeva, M., Kohnová, S., Komma, J., Kriauciuniene, J., Kronvang, B., Lawrence, D., Lüdtke, S., Mediero, L., Merz, B., Molnar, P., Murphy, C., Oskoruš, D., Osuch, M., Parajka, J., Pfister, L., Radevski, I., Sauquet, E., Schröter, K., Šraj, M., Szolgay, J., Turner, S., Valent, P., Veijalainen, N., Ward, P. J., Willems, P., and Zivkovic, N.:
Megafloods in Europe can be anticipated from observations in hydrologically similar catchments, Nat. Geosci., 16, 982–988, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Bontron(2004)</label><mixed-citation>
      
Bontron, G.:
Prévision quantitative des précipitations: Adaptation probabiliste par recherche d'analogues. Utilisation des Réanalyses NCEP/NCAR et application aux précipitations du Sud-Est de la France, PhD thesis, Institut National Polytechnique Grenoble (INPG), 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Bourgin et al.(2015)Bourgin, Andréassian, Perrin, and Oudin</label><mixed-citation>
      
Bourgin, F., Andréassian, V., Perrin, C., and Oudin, L.:
Transferring global uncertainty estimates from gauged to ungauged catchments, Hydrol. Earth Syst. Sci., 19, 2535–2546, <a href="https://doi.org/10.5194/hess-19-2535-2015" target="_blank">https://doi.org/10.5194/hess-19-2535-2015</a>, 2015.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Breiman(2001)</label><mixed-citation>
      
Breiman, L.:
Random forests, Mach. Learn., 45, 5–32, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Breiman et al.(2017)Breiman, Friedman, Olshen, and Stone</label><mixed-citation>
      
Breiman, L., Friedman, J., Olshen, R. A., and Stone, C. J.:
Classification and regression trees, Routledge, <a href="https://doi.org/10.1201/9781315139470" target="_blank">https://doi.org/10.1201/9781315139470</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Coron et al.(2017)Coron, Thirel, Delaigue, Perrin, and Andréassian</label><mixed-citation>
      
Coron, L., Thirel, G., Delaigue, O., Perrin, C., and Andréassian, V.:
The Suite of Lumped GR Hydrological Models in an R package, Environ. Modell. Softw., 94, 166–171, <a href="https://doi.org/10.1016/j.envsoft.2017.05.002" target="_blank">https://doi.org/10.1016/j.envsoft.2017.05.002</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Coron et al.(2023)Coron, Delaigue, Thirel, Dorchies, Perrin, and Michel</label><mixed-citation>
      
Coron, L., Delaigue, O., Thirel, G., Dorchies, D., Perrin, C., and Michel, C.:
airGR: Suite of GR Hydrological Models for Precipitation-Runoff Modelling, R package version 1.7.4, <a href="https://doi.org/10.15454/EX11NA" target="_blank">https://doi.org/10.15454/EX11NA</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Coron et al.(2025)</label><mixed-citation>
      
Coron, L., Delaigue, O., Thirel, G., Dorchies, D., Perrin, C., Michel, C., Andréassian, V., Bourgin, F., Brigode, P., Le Moine, N., Mathevet, T., Mouelhi, S., Oudin, L., Pushpalatha, R., and Valéry, A.: airGR: Suite of GR Hydrological Models for Precipitation-Runoff Modelling Version 1.7.8, CRAN [code], <a href="https://doi.org/10.32614/CRAN.package.airGR" target="_blank">https://doi.org/10.32614/CRAN.package.airGR</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Delaigue et al.(2024)</label><mixed-citation>
      
Delaigue, O., Guimarães, G. M., Brigode, P., Génot, B., Perrin, C., and Andréassian, V.: CAMELS-FR dataset, Recherche Data Gouv, V3 [data set], <a href="https://doi.org/10.57745/WH7FJR" target="_blank">https://doi.org/10.57745/WH7FJR</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Delaigue et al.(2025)Delaigue, Guimarães, Brigode, Génot, Perrin, Soubeyroux, Janet, Addor, and Andréassian</label><mixed-citation>
      
Delaigue, O., Guimarães, G. M., Brigode, P., Génot, B., Perrin, C., Soubeyroux, J.-M., Janet, B., Addor, N., and Andréassian, V.:
CAMELS-FR dataset: a large-sample hydroclimatic dataset for France to explore hydrological diversity and support model benchmarking, Earth Syst. Sci. Data, 17, 1461–1479, <a href="https://doi.org/10.5194/essd-17-1461-2025" target="_blank">https://doi.org/10.5194/essd-17-1461-2025</a>, 2025.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Delle Monache et al.(2013)Delle Monache, Eckel, Rife, Nagarajan, and Searight</label><mixed-citation>
      
Delle Monache, L., Eckel, F. A., Rife, D. L., Nagarajan, B., and Searight, K.:
Probabilistic weather prediction with an analog ensemble, Mon. Weather Rev., 141, 3498–3516, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Dufeu et al.(2022)Dufeu, Mougin, Foray, Baillon, Lamblin, Hebrard, Chaleon, Romon, Cobos, Gouin et al.</label><mixed-citation>
      
Dufeu, E., Mougin, F., Foray, A., Baillon, M., Lamblin, R., Hebrard, F., Chaleon, C., Romon, S., Cobos, L., Gouin, P., Audouy, J.-N., Martin, R., and Poligot-Pitsch, S.:
Finalisation de l’opération HYDRO 3 de modernisation du système d’information national des données hydrométriques, LHB, 108, 2099317, <a href="https://doi.org/10.1080/27678490.2022.2099317" target="_blank">https://doi.org/10.1080/27678490.2022.2099317</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Evin et al.(2014)Evin, Thyer, Kavetski, McInerney, and Kuczera</label><mixed-citation>
      
Evin, G., Thyer, M., Kavetski, D., McInerney, D., and Kuczera, G.:
Comparison of joint versus postprocessor approaches for hydrological uncertainty estimation accounting for error autocorrelation and heteroscedasticity, Water Resour. Res., 50, 2350–2375, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Fang et al.(2024)Fang, Johnson, Yeghiazarian, and Sankarasubramanian</label><mixed-citation>
      
Fang, S., Johnson, J. M., Yeghiazarian, L., and Sankarasubramanian, A.:
Improved national-scale above-normal flow prediction for gauged and ungauged basins using a spatio-temporal hierarchical model, Water Resour. Res., 60, e2023WR034557, <a href="https://doi.org/10.1029/2023WR034557" target="_blank">https://doi.org/10.1029/2023WR034557</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Georgakakos et al.(2004)Georgakakos, Seo, Gupta, Schaake, and Butts</label><mixed-citation>
      
Georgakakos, K. P., Seo, D.-J., Gupta, H., Schaake, J., and Butts, M. B.:
Towards the characterization of streamflow simulation uncertainty through multimodel ensembles, J. Hydrol., 298, 222–241, 2004.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Gneiting et al.(2005)Gneiting, Raftery, Westveld, and Goldman</label><mixed-citation>
      
Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T.:
Calibrated Probabilistic Forecasting Using Ensemble Model Output Statistics and Minimum CRPS Estimation, Mon. Weather Rev., 133, 1098–1118, <a href="https://doi.org/10.1175/MWR2904.1" target="_blank">https://doi.org/10.1175/MWR2904.1</a>, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Gneiting et al.(2007)Gneiting, Balabdaoui, and Raftery</label><mixed-citation>
      
Gneiting, T., Balabdaoui, F., and Raftery, A. E.:
Probabilistic forecasts, calibration and sharpness, J. Roy. Stat. Soc. B, 69, 243–268, 2007.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Golian et al.(2021)Golian, Murphy, and Meresa</label><mixed-citation>
      
Golian, S., Murphy, C., and Meresa, H.:
Regionalization of hydrological models for flow estimation in ungauged catchments in Ireland, Journal of Hydrology: Regional Studies, 36, 100859, <a href="https://doi.org/10.1016/j.ejrh.2021.100859" target="_blank">https://doi.org/10.1016/j.ejrh.2021.100859</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Gupta et al.(2024)Gupta, Hantush, Govindaraju, and Beven</label><mixed-citation>
      
Gupta, A., Hantush, M. M., Govindaraju, R. S., and Beven, K.:
Evaluation of hydrological models at gauged and ungauged basins using machine learning-based limits-of-acceptability and hydrological signatures, J. Hydrol., 641, 131774, <a href="https://doi.org/10.1016/j.jhydrol.2024.131774" target="_blank">https://doi.org/10.1016/j.jhydrol.2024.131774</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and Martinez</label><mixed-citation>
      
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.:
Decomposition of the mean squared error and NSE performance criteria: Implications for improving hydrological modelling, J. Hydrol., 377, 80–91, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Hallouin et al.(2024)Hallouin, Bourgin, Perrin, Ramos, and Andréassian</label><mixed-citation>
      
Hallouin, T., Bourgin, F., Perrin, C., Ramos, M.-H., and Andréassian, V.:
EvalHyd v0.1.2: a polyglot tool for the evaluation of deterministic and probabilistic streamflow predictions, Geosci. Model Dev., 17, 4561–4578, <a href="https://doi.org/10.5194/gmd-17-4561-2024" target="_blank">https://doi.org/10.5194/gmd-17-4561-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Hashemi et al.(2022)Hashemi, Brigode, Garambois, and Javelle</label><mixed-citation>
      
Hashemi, R., Brigode, P., Garambois, P.-A., and Javelle, P.:
How can we benefit from regime information to make more effective use of long short-term memory (LSTM) runoff models?, Hydrol. Earth Syst. Sci., 26, 5793–5816, <a href="https://doi.org/10.5194/hess-26-5793-2022" target="_blank">https://doi.org/10.5194/hess-26-5793-2022</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Hastie et al.(2001)Hastie, Tibshirani, and Friedman</label><mixed-citation>
      
Hastie, T., Tibshirani, R., and Friedman, J.:
The Elements of Statistical Learning, Springer Series in Statistics, Springer New York Inc., New York, NY, USA, <a href="https://doi.org/10.1007/978-0-387-84858-7" target="_blank">https://doi.org/10.1007/978-0-387-84858-7</a>, 2001.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Hu et al.(2023)Hu, Cervone, Young, and Delle Monache</label><mixed-citation>
      
Hu, W., Cervone, G., Young, G., and Delle Monache, L.:
Machine learning weather analogs for near-surface variables, Bound.-Lay. Meteorol., 186, 711–735, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Hwang et al.(2019)Hwang, Orenstein, Cohen, Pfeiffer, and Mackey</label><mixed-citation>
      
Hwang, J., Orenstein, P., Cohen, J., Pfeiffer, K., and Mackey, L.:
Improving subseasonal forecasting in the western US with machine learning, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining, 2325–2335, <a href="https://doi.org/10.1145/3292500.3330674" target="_blank">https://doi.org/10.1145/3292500.3330674</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Jehn et al.(2020)Jehn, Bestian, Breuer, Kraft, and Houska</label><mixed-citation>
      
Jehn, F. U., Bestian, K., Breuer, L., Kraft, P., and Houska, T.:
Using hydrological and climatic catchment clusters to explore drivers of catchment behavior, Hydrol. Earth Syst. Sci., 24, 1081–1100, <a href="https://doi.org/10.5194/hess-24-1081-2020" target="_blank">https://doi.org/10.5194/hess-24-1081-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Johnson et al.(2023)Johnson, Fang, Sankarasubramanian, Rad, Kindl da Cunha, Jennings, Clarke, Mazrooei, and Yeghiazarian</label><mixed-citation>
      
Johnson, J. M., Fang, S., Sankarasubramanian, A., Rad, A. M., Kindl da Cunha, L., Jennings, K. S., Clarke, K. C., Mazrooei, A., and Yeghiazarian, L.:
Comprehensive analysis of the NOAA National Water Model: A call for heterogeneous formulations and diagnostic model selection, J. Geophys. Res.-Atmos., 128, e2023JD038534, <a href="https://doi.org/10.1029/2023JD038534" target="_blank">https://doi.org/10.1029/2023JD038534</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Johnson(2024)</label><mixed-citation>
      
Johnson, R. A.:
quantile-forest: A Python Package for Quantile Regression Forests, Journal of Open Source Software, 9, 5976, <a href="https://doi.org/10.21105/joss.05976" target="_blank">https://doi.org/10.21105/joss.05976</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Kling et al.(2012)Kling, Fuchs, and Paulin</label><mixed-citation>
      
Kling, H., Fuchs, M., and Paulin, M.:
Runoff conditions in the upper Danube basin under an ensemble of climate change scenarios, J. Hydrol., 424, 264–277, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and Herrnegger</label><mixed-citation>
      
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.:
Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <a href="https://doi.org/10.5194/hess-22-6005-2018" target="_blank">https://doi.org/10.5194/hess-22-6005-2018</a>, 2018.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Kratzert et al.(2024)Kratzert, Gauch, Klotz, and Nearing</label><mixed-citation>
      
Kratzert, F., Gauch, M., Klotz, D., and Nearing, G.:
HESS Opinions: Never train a Long Short-Term Memory (LSTM) network on a single basin, Hydrol. Earth Syst. Sci., 28, 4187–4201, <a href="https://doi.org/10.5194/hess-28-4187-2024" target="_blank">https://doi.org/10.5194/hess-28-4187-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Krzysztofowicz(1999)</label><mixed-citation>
      
Krzysztofowicz, R.:
Bayesian theory of probabilistic forecasting via deterministic hydrologic model, Water Resour. Res., 35, 2739–2750, 1999.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Kuczera and Parent(1998)</label><mixed-citation>
      
Kuczera, G. and Parent, E.:
Monte Carlo assessment of parameter uncertainty in conceptual catchment models: the Metropolis algorithm, J. Hydrol., 211, 69–85, 1998.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Leleu et al.(2014)Leleu, Tonnelier, Puechberty, Gouin, Viquendi, Cobos, Foray, Baillon, and Ndima</label><mixed-citation>
      
Leleu, I., Tonnelier, I., Puechberty, R., Gouin, P., Viquendi, I., Cobos, L., Foray, A., Baillon, M., and Ndima, P.-O.:
La refonte du système d'information national pour la gestion et la mise à disposition des données hydrométriques, La Houille Blanche, 100, 25–32, <a href="https://doi.org/10.1051/lhb/2014004" target="_blank">https://doi.org/10.1051/lhb/2014004</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Li et al.(2016)Li, Wang, Bennett, and Robertson</label><mixed-citation>
      
Li, M., Wang, Q. J., Bennett, J. C., and Robertson, D. E.:
Error reduction and representation in stages (ERRIS) in hydrological modelling for ensemble streamflow forecasting, Hydrol. Earth Syst. Sci., 20, 3561–3579, <a href="https://doi.org/10.5194/hess-20-3561-2016" target="_blank">https://doi.org/10.5194/hess-20-3561-2016</a>, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Louppe(2014)</label><mixed-citation>
      
Louppe, G.:
Understanding random forests: From theory to practice, Université de Liège (Belgium), 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Magni et al.(2023)Magni, Sutanudjaja, Shen, and Karssenberg</label><mixed-citation>
      
Magni, M., Sutanudjaja, E. H., Shen, Y., and Karssenberg, D.:
Global streamflow modelling using process-informed machine learning, J. Hydroinform., 25, 1648–1666, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>McInerney et al.(2019)McInerney, Kavetski, Thyer, Lerat, and Kuczera</label><mixed-citation>
      
McInerney, D., Kavetski, D., Thyer, M., Lerat, J., and Kuczera, G.:
Benefits of explicit treatment of zero flows in probabilistic hydrological modeling of ephemeral catchments, Water Resour. Res., 55, 11035–11060, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Meinshausen and Ridgeway(2006)</label><mixed-citation>
      
Meinshausen, N. and Ridgeway, G.:
Quantile regression forests, J. Mach. Learn. Res., 7, <a href="https://jmlr.org/papers/v7/meinshausen06a.html" target="_blank">https://jmlr.org/papers/v7/meinshausen06a.html</a> (last access: 10 June 2026), 2006.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Montero-Manso and Hyndman(2021)</label><mixed-citation>
      
Montero-Manso, P. and Hyndman, R. J.:
Principles and algorithms for forecasting groups of time series: Locality and globality, Int. J. Forecasting, 37, 1632–1653, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Nash and Sutcliffe(1970)</label><mixed-citation>
      
Nash, J. E. and Sutcliffe, J. V.:
River flow forecasting through conceptual models part I—A discussion of principles, J. Hydrol., 10, 282–290, 1970.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Oshiro et al.(2012)Oshiro, Perez, and Baranauskas</label><mixed-citation>
      
Oshiro, T. M., Perez, P. S., and Baranauskas, J. A.:
How many trees in a random forest?, in: International workshop on machine learning and data mining in pattern recognition, Springer, 154–168, <a href="https://doi.org/10.1007/978-3-642-31537-4_13" target="_blank">https://doi.org/10.1007/978-3-642-31537-4_13</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Oudin et al.(2005)Oudin, Hervieu, Michel, Perrin, Andréassian, Anctil, and Loumagne</label><mixed-citation>
      
Oudin, L., Hervieu, F., Michel, C., Perrin, C., Andréassian, V., Anctil, F., and Loumagne, C.:
Which potential evapotranspiration input for a lumped rainfall–runoff model?: Part 2—Towards a simple and efficient potential evapotranspiration model for rainfall–runoff modelling, J. Hydrol., 303, 290–306, 2005.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Oudin et al.(2008)Oudin, Andréassian, Perrin, Michel, and Le Moine</label><mixed-citation>
      
Oudin, L., Andréassian, V., Perrin, C., Michel, C., and Le Moine, N.:
Spatial proximity, physical similarity, regression and ungaged catchments: A comparison of regionalization approaches based on 913 French catchments, Water Resour. Res., 44, <a href="https://doi.org/10.1029/2007WR006240" target="_blank">https://doi.org/10.1029/2007WR006240</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Papacharalampous and Langousis(2022)</label><mixed-citation>
      
Papacharalampous, G. and Langousis, A.:
Probabilistic water demand forecasting using quantile regression algorithms, Water Resour. Res., 58, e2021WR030216, <a href="https://doi.org/10.1029/2021WR030216" target="_blank">https://doi.org/10.1029/2021WR030216</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Pham et al.(2021)Pham, Luo, and Finley</label><mixed-citation>
      
Pham, L. T., Luo, L., and Finley, A.:
Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds, Hydrol. Earth Syst. Sci., 25, 2997–3015, <a href="https://doi.org/10.5194/hess-25-2997-2021" target="_blank">https://doi.org/10.5194/hess-25-2997-2021</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Poncelet et al.(2017)Poncelet, Merz, Merz, Parajka, Oudin, Andréassian, and Perrin</label><mixed-citation>
      
Poncelet, C., Merz, R., Merz, B., Parajka, J., Oudin, L., Andréassian, V., and Perrin, C.:
Process-based interpretation of conceptual hydrological model performance using a multinational catchment set, Water Resour. Res., 53, 7247–7268, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Pushpalatha et al.(2011)Pushpalatha, Perrin, Le Moine, Mathevet, and Andréassian</label><mixed-citation>
      
Pushpalatha, R., Perrin, C., Le Moine, N., Mathevet, T., and Andréassian, V.:
A downward structural sensitivity analysis of hydrological models to improve low-flow simulation, J. Hydrol., 411, 66–76, <a href="https://doi.org/10.1016/j.jhydrol.2011.09.034" target="_blank">https://doi.org/10.1016/j.jhydrol.2011.09.034</a>, 2011.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Pushpalatha et al.(2012)Pushpalatha, Perrin, Moine, and Andréassian</label><mixed-citation>
      
Pushpalatha, R., Perrin, C., Le Moine, N., and Andréassian, V.:
A review of efficiency criteria suitable for evaluating low-flow simulations, J. Hydrol., 420–421, 171–182, <a href="https://doi.org/10.1016/j.jhydrol.2011.11.055" target="_blank">https://doi.org/10.1016/j.jhydrol.2011.11.055</a>, 2012.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Raschka et al.(2020)Raschka, Patterson, and Nolet</label><mixed-citation>
      
Raschka, S., Patterson, J., and Nolet, C.:
Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence, arXiv [preprint], <a href="https://arxiv.org/abs/2002.04803" target="_blank">arXiv:2002.04803</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Razavi and Coulibaly(2013)</label><mixed-citation>
      
Razavi, T. and Coulibaly, P.:
Streamflow prediction in ungauged basins: review of regionalization methods, J. Hydrol. Eng., 18, 958–975, 2013.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Renard et al.(2010)Renard, Kavetski, Kuczera, Thyer, and Franks</label><mixed-citation>
      
Renard, B., Kavetski, D., Kuczera, G., Thyer, M., and Franks, S. W.:
Understanding predictive uncertainty in hydrologic modeling: The challenge of identifying input and structural errors, Water Resour. Res., 46, <a href="https://doi.org/10.1029/2009WR008328" target="_blank">https://doi.org/10.1029/2009WR008328</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Shen et al.(2022)Shen, Ruijsch, Lu, Sutanudjaja, and Karssenberg</label><mixed-citation>
      
Shen, Y., Ruijsch, J., Lu, M., Sutanudjaja, E. H., and Karssenberg, D.:
Random forests-based error-correction of streamflow from a large-scale hydrological model: Using model state variables to estimate error terms, Comput. Geosci., 159, 105019, <a href="https://doi.org/10.1016/j.cageo.2021.105019" target="_blank">https://doi.org/10.1016/j.cageo.2021.105019</a>, 2022.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Solomatine and Shrestha(2009)</label><mixed-citation>
      
Solomatine, D. P. and Shrestha, D. L.:
A novel method to estimate model uncertainty using machine learning techniques, Water Resour. Res., 45, <a href="https://doi.org/10.1029/2008WR006839" target="_blank">https://doi.org/10.1029/2008WR006839</a>, 2009.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Taillardat and Mestre(2020)</label><mixed-citation>
      
Taillardat, M. and Mestre, O.:
From research to applications – examples of operational ensemble post-processing in France using machine learning, Nonlin. Processes Geophys., 27, 329–347, <a href="https://doi.org/10.5194/npg-27-329-2020" target="_blank">https://doi.org/10.5194/npg-27-329-2020</a>, 2020.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Taillardat et al.(2016)Taillardat, Mestre, Zamo, and Naveau</label><mixed-citation>
      
Taillardat, M., Mestre, O., Zamo, M., and Naveau, P.:
Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics, Mon. Weather Rev., 144, 2375–2393, 2016.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Tanguy et al.(2023)Tanguy, Chevuturi, Marchant, Mackay, Parry, and Hannaford</label><mixed-citation>
      
Tanguy, M., Chevuturi, A., Marchant, B. P., Mackay, J. D., Parry, S., and Hannaford, J.:
How will climate change affect the spatial coherence of streamflow and groundwater droughts in Great Britain?, Environ. Res. Lett., 18, 064048, <a href="https://doi.org/10.1088/1748-9326/acd655" target="_blank">https://doi.org/10.1088/1748-9326/acd655</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Teja et al.(2023)Teja, Manikanta, Das, and Umamahesh</label><mixed-citation>
      
Teja, K. N., Manikanta, V., Das, J., and Umamahesh, N.:
Enhancing the predictability of flood forecasts by combining Numerical Weather Prediction ensembles with multiple hydrological models, J. Hydrol., 625, 130176, <a href="https://doi.org/10.1016/j.jhydrol.2023.130176" target="_blank">https://doi.org/10.1016/j.jhydrol.2023.130176</a>, 2023.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Thirel et al.(2024)Thirel, Santos, Delaigue, and Perrin</label><mixed-citation>
      
Thirel, G., Santos, L., Delaigue, O., and Perrin, C.:
On the use of streamflow transformations for hydrological model calibration, Hydrol. Earth Syst. Sci., 28, 4837–4860, <a href="https://doi.org/10.5194/hess-28-4837-2024" target="_blank">https://doi.org/10.5194/hess-28-4837-2024</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Tiberi-Wadier et al.(2021)Tiberi-Wadier, Goutal, Ricci, Sergent, Taillardat, Bouttier, and Monteil</label><mixed-citation>
      
Tiberi-Wadier, A.-L., Goutal, N., Ricci, S., Sergent, P., Taillardat, M., Bouttier, F., and Monteil, C.:
Strategies for hydrologic ensemble generation and calibration: On the merits of using model-based predictors, J. Hydrol., 599, 126233, <a href="https://doi.org/10.1016/j.jhydrol.2021.126233" target="_blank">https://doi.org/10.1016/j.jhydrol.2021.126233</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Todini(2008)</label><mixed-citation>
      
Todini, E.:
A model conditional processor to assess predictive uncertainty in flood forecasting, International Journal of River Basin Management, 6, 123–137, <a href="https://doi.org/10.1080/15715124.2008.9635342" target="_blank">https://doi.org/10.1080/15715124.2008.9635342</a>, 2008.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Troin et al.(2021)Troin, Arsenault, Wood, Brissette, and Martel</label><mixed-citation>
      
Troin, M., Arsenault, R., Wood, A. W., Brissette, F., and Martel, J.-L.:
Generating ensemble streamflow forecasts: A review of methods and approaches over the past 40 years, Water Resour. Res., 57, <a href="https://doi.org/10.1029/2020WR028392" target="_blank">https://doi.org/10.1029/2020WR028392</a>, 2021.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Tyralis and Papacharalampous(2024)</label><mixed-citation>
      
Tyralis, H. and Papacharalampous, G.:
A review of predictive uncertainty estimation with machine learning, Artif. Intell. Rev., 57, 94, <a href="https://doi.org/10.1007/s10462-023-10698-8" target="_blank">https://doi.org/10.1007/s10462-023-10698-8</a>, 2024.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Tyralis et al.(2019)Tyralis, Papacharalampous, Burnetas, and Langousis</label><mixed-citation>
      
Tyralis, H., Papacharalampous, G., Burnetas, A., and Langousis, A.:
Hydrological post-processing using stacked generalization of quantile regression algorithms: Large-scale application over CONUS, J. Hydrol., 577, 123957, <a href="https://doi.org/10.1016/j.jhydrol.2019.123957" target="_blank">https://doi.org/10.1016/j.jhydrol.2019.123957</a>, 2019.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Valéry et al.(2014)Valéry, Andréassian, and Perrin</label><mixed-citation>
      
Valéry, A., Andréassian, V., and Perrin, C.:
‘As simple as possible but not simpler’: What is useful in a temperature-based snow-accounting routine? Part 2–Sensitivity analysis of the Cemaneige snow accounting routine on 380 catchments, J. Hydrol., 517, 1176–1187, <a href="https://doi.org/10.1016/j.jhydrol.2014.04.058" target="_blank">https://doi.org/10.1016/j.jhydrol.2014.04.058</a>, 2014.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Vidal et al.(2010)Vidal, Martin, Franchisteguy, Baillon, and Soubeyroux</label><mixed-citation>
      
Vidal, J.-P., Martin, E., Franchisteguy, L., Baillon, M., and Soubeyroux, J.-M.:
A 50-year high-resolution atmospheric reanalysis over France with the Safran system, Int. J. Climatol., 30, <a href="https://doi.org/10.1002/joc.2003" target="_blank">https://doi.org/10.1002/joc.2003</a>, 2010.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Wani et al.(2017)Wani, Beckers, Weerts, and Solomatine</label><mixed-citation>
      
Wani, O., Beckers, J. V. L., Weerts, A. H., and Solomatine, D. P.:
Residual uncertainty estimation using instance-based learning with applications to hydrologic forecasting, Hydrol. Earth Syst. Sci., 21, 4021–4036, <a href="https://doi.org/10.5194/hess-21-4021-2017" target="_blank">https://doi.org/10.5194/hess-21-4021-2017</a>, 2017.

    </mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>White et al.(2017)White, Carlsen, Robertson, Klein, Lazo, Kumar, Vitart, Coughlan de Perez, Ray, Murray et al.</label><mixed-citation>
      
White, C. J., Carlsen, H., Robertson, A. W., Klein, R. J. T., Lazo, J., Kumar, A., Vitart, F., Coughlan de Perez, E., Ray, A. J., Murray, V., Bharwani, S., Macleod, D., James, R., Fleming, L. E., Morse, A. P., Eggen, B., Graham, R., Kjellström, E., Becker, E., Pegion, K. V., Holbrook, N. J., McEvoy, D., Depledge, M., Perkins-Kirkpatrick, S. E., Brown, T. J., Street, R., Jones, L., Remenyi, T., Hodgson-Johnston, I., Buontempo, C., Lamb, R., Meinke, H., Arheimer, B., and Zebiak, S.:
Potential applications of subseasonal-to-seasonal (S2S) predictions, Meteorol. Appl., 24, 315–325, 2017.


    </mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Zhang et al.(2023)Zhang, Ye, Analui, Nguyen, Sorooshian, Hsu, and Wang</label><mixed-citation>
      
Zhang, Y., Ye, A., Analui, B., Nguyen, P., Sorooshian, S., Hsu, K., and Wang, Y.:
Comparing quantile regression forest and mixture density long short-term memory models for probabilistic post-processing of satellite precipitation-driven streamflow simulations, Hydrol. Earth Syst. Sci., 27, 4529–4550, <a href="https://doi.org/10.5194/hess-27-4529-2023" target="_blank">https://doi.org/10.5194/hess-27-4529-2023</a>, 2023.

    </mixed-citation></ref-html>--></article>
