<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0"><?xmltex \makeatother\@nolinetrue\makeatletter?>
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-25-2045-2021</article-id><title-group><article-title>Rainfall–runoff prediction at multiple timescales with a single Long Short-Term Memory network</article-title><alt-title>Rainfall–runoff prediction at multiple timescales with a single LSTM</alt-title>
      </title-group><?xmltex \runningtitle{Rainfall--runoff prediction at multiple timescales with a single LSTM}?><?xmltex \runningauthor{M. Gauch et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1 aff2">
          <name><surname>Gauch</surname><given-names>Martin</given-names></name>
          <email>gauch@ml.jku.at</email>
        <ext-link>https://orcid.org/0000-0002-4587-898X</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Kratzert</surname><given-names>Frederik</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-8897-7689</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Klotz</surname><given-names>Daniel</given-names></name>
          
        <ext-link>https://orcid.org/0000-0002-9843-6798</ext-link></contrib>
        <contrib contrib-type="author" corresp="no" rid="aff3 aff4">
          <name><surname>Nearing</surname><given-names>Grey</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Lin</surname><given-names>Jimmy</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1">
          <name><surname>Hochreiter</surname><given-names>Sepp</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Institute for Machine Learning, Johannes Kepler University Linz, Linz, Austria</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada</institution>
        </aff>
        <aff id="aff3"><label>3</label><institution>Google Research, Mountain View, CA, USA</institution>
        </aff>
        <aff id="aff4"><label>4</label><institution>Department of Land, Air and Water Resources, University of California Davis, Davis, CA, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Martin Gauch (gauch@ml.jku.at)</corresp></author-notes><pub-date><day>19</day><month>April</month><year>2021</year></pub-date>
      
      <volume>25</volume>
      <issue>4</issue>
      <fpage>2045</fpage><lpage>2062</lpage>
      <history>
        <date date-type="received"><day>15</day><month>October</month><year>2020</year></date>
           <date date-type="rev-request"><day>17</day><month>November</month><year>2020</year></date>
           <date date-type="rev-recd"><day>15</day><month>February</month><year>2021</year></date>
           <date date-type="accepted"><day>17</day><month>March</month><year>2021</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 Martin Gauch et al.</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021.html">This article is available from https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021.pdf</self-uri>
      <abstract><title>Abstract</title>
    <p id="d1e147">Long Short-Term Memory (LSTM) networks have been applied to daily discharge prediction with remarkable success.
Many practical applications, however, require predictions at more granular timescales.
For instance, accurate prediction of short but extreme flood peaks can make a lifesaving difference, yet such peaks may escape the coarse temporal resolution of daily predictions.
Naively training an LSTM on hourly data, however, entails very long input sequences that make learning difficult and computationally expensive.
In this study, we propose two multi-timescale LSTM (MTS-LSTM) architectures that jointly predict multiple timescales within one model, as they process long-past inputs at a different temporal resolution than more recent inputs.
In a benchmark on 516 basins across the continental United States, these models achieved significantly higher Nash–Sutcliffe efficiency (NSE) values than the US National Water Model.
Compared to naive prediction with distinct LSTMs per timescale, the multi-timescale architectures are computationally more efficient with no loss in accuracy.
Beyond prediction quality, the multi-timescale LSTM can process different input variables at different timescales, which is especially relevant to operational applications where the lead time of meteorological forcings depends on their temporal resolution.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

      <?xmltex \hack{\newpage}?>
<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e161">Rainfall–runoff modeling approaches based on deep learning – particularly Long Short-Term Memory (LSTM) networks – have proven highly successful in a number of studies.
LSTMs can predict hydrologic processes in multiple catchments using a single model and yield more accurate predictions than state-of-the-art process-based models in a variety of benchmarks <xref ref-type="bibr" rid="bib1.bibx29" id="paren.1"/>.</p>
      <p id="d1e167">Different applications require hydrologic information at different timescales.
For example, hydropower operators might care about daily or weekly (or longer) inputs into their reservoirs, while flood forecasting requires sub-daily predictions.
Yet, much of the work in applying deep learning to streamflow prediction has been at the daily timescale.
Daily predictions make sense for medium- to long-range forecasts; however, daily input resolution mutes diurnal variations, for example in evapotranspiration and snowmelt, that may cause important variations in discharge signatures.
Thus, daily predictions are often too coarse to provide actionable information for short-range forecasts.
For example, in the event of flooding, the difference between moderate discharge spread across the day and the same amount of water compressed into a few hours of flash flooding can be life-threatening.</p>
      <p id="d1e170">Because of this, hydrologic models often operate at multiple timescales using several independent setups of a traditional, process-based rainfall–runoff model.
For instance, the US National Oceanic and Atmospheric Administration’s (NOAA) National Water Model (NWM) produces hourly<?pagebreak page2046?> short-range forecasts every hour, as well as three- and six-hourly medium- to long-range forecasts every 6 h <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx41" id="paren.2"/>.
A similar approach could be used for machine learning models, but this has several disadvantages.
Namely, this would require a distinct deep learning model per timescale, which means that computational resource demands multiply with the number of predicted timescales.
Further, since the models cannot exchange information, predictions have the potential to be inconsistent across timescales.</p>
      <p id="d1e176">Another challenge in operational scenarios stems from the fact that in many cases multiple sets of forcing data are available.
Ideally, a model should be able to process all forcing products together, such that it can combine and correlate information across products and obtain better estimates of the true meteorologic conditions <xref ref-type="bibr" rid="bib1.bibx30" id="paren.3"><named-content content-type="pre">e.g.,</named-content></xref>.
Since different meteorological data products are often available at different timescales, models should not only be able to produce <italic>predictions</italic> at multiple timescales, but also to process <italic>inputs</italic> at different temporal resolutions.</p>
      <p id="d1e191">The issue of multiple input and output timescales is well-known in the field of machine learning (we refer readers with a machine learning or data science background to <xref ref-type="bibr" rid="bib1.bibx12" id="text.4"/> for a general introduction to rainfall–runoff modeling).
The architectural flexibility of recurrent neural models allows for approaches that jointly process the different timescales in a hierarchical fashion.
Techniques to “divide and conquer” long sequences through hierarchical processing date back decades <xref ref-type="bibr" rid="bib1.bibx44 bib1.bibx33" id="paren.5"><named-content content-type="pre">e.g.,</named-content></xref>.
More recently, <xref ref-type="bibr" rid="bib1.bibx25" id="text.6"/> proposed an architecture that efficiently learns long- and short-term relationships in time series.
Those authors partitioned the internals of their neural network into different groups, where each group was updated at its own predefined interval.
However, the whole sequence was still processed at the highest frequency, which makes training slow.
<xref ref-type="bibr" rid="bib1.bibx6" id="text.7"/> extended the idea of hierarchical processing of multiple timescales: in their setup, the model adjusts its updating rates to the current input, e.g., to align with words or handwriting strokes.
A major limitation of this approach is that it requires a binary decision about whether to make an update or not, which complicates the training procedure.
<xref ref-type="bibr" rid="bib1.bibx35" id="text.8"/> proposed an LSTM architecture with a gating mechanism that allows state and output updates only during time slots of learned frequencies.
For time steps where the gate is closed, the old state and output are reused.
This helps discriminate superimposed input signals, but is likely unsuited for rainfall–runoff prediction because no aggregation takes place while the time gate is closed.
In a different research thread, <xref ref-type="bibr" rid="bib1.bibx17" id="text.9"/> proposed a multidimensional variant of LSTMs for problems with more than one temporal or spatial dimension.
Framed in the context of multi-timescale prediction, one could define each timescale as one temporal dimension and process the inputs at all timescales simultaneously.
Like the hierarchical approaches, however, this paradigm would process the full time series at all timescales and thus lead to slow training.</p>
      <p id="d1e215">Most of the models discussed above were designed for tasks like natural language processing and other non-physical applications.
Unlike these tasks, time series in rainfall–runoff modeling have regular frequencies with fixed translation factors (e.g., one day always has 24 h), whereas words in natural language or strokes in handwriting vary in their length.
The application discussed by <xref ref-type="bibr" rid="bib1.bibx4" id="text.10"/> was closer in this respect – they predicted hourly wind speeds given input data at multiple timescales.
However, like the aforementioned language and handwriting applications, the objective in this case was to predict a single time series – be it sentences, handwriting strokes, or wind speed.
Our objective encompasses <italic>multiple</italic> outputs, one for each target timescale.
In this regard, multi-timescale rainfall–runoff prediction has similarities to multi-objective optimization.
The different objectives (predictions at different frequencies) are closely related, since aggregation of discharge across time steps should be conservative:
for instance, every group of 24 hourly prediction steps should average (or sum, depending on the units) to one daily prediction step.
Rather than viewing the problem from a multi-objective perspective, <xref ref-type="bibr" rid="bib1.bibx32" id="text.11"/> modeled time-<italic>continuous</italic> functions with ODE-LSTMs, a method that combines LSTMs with a mixture of ordinary differential equations and recurrent neural networks (ODE-RNN).
The resulting models generate continuous predictions at arbitrary granularity.
Initially, this seems like a highly promising approach; however, it has several drawbacks in our application.
First, since one model generates predictions for all timescales, it is not straightforward to use different forcing products for different timescales.
Second, ODE-LSTMs were originally intended for scenarios where the input data arrives at <italic>irregular</italic> intervals.
In our context, the opposite is true: meteorological forcing data generally have fixed, regular frequencies.
Also, for practical purposes we do not actually need predictions at arbitrary granularity – a fixed set of target timescales is generally sufficient for hydrologic applications.
Lastly, in our exploratory experiments (see Appendix <xref ref-type="sec" rid="App1.Ch1.S3"/>), using ODE-LSTMs to predict at timescales that were not part of training was actually worse (and much slower) than (dis-)aggregating the fixed-timescale predictions of our proposed models.</p>
      <p id="d1e238">In this paper, we show how LSTM-based architectures can jointly predict discharge at multiple timescales in one model while leveraging meteorological inputs at different timescales.
We make the following contributions:
<list list-type="bullet"><list-item>
      <p id="d1e243">First, we outline two LSTM architectures that predict discharge at multiple timescales (Sect. <xref ref-type="sec" rid="Ch1.S2.SS3"/>).
We capitalize on the fact that watersheds are damped systems: while the history of total mass and energy inputs is important, the impact of high-frequency variation becomes less important the farther we look back.
Our approach<?pagebreak page2047?> to providing multiple output timescales processes long-past input data at coarser temporal resolution than recent time steps.
This shortens the input sequences, since high-resolution inputs are only necessary for the last few time steps.
We benchmark daily and hourly predictions from these models against (i) a naive deep learning solution that trains individual LSTMs for each timescale and (ii) a traditional hydrologic model, the US National Water Model (Sect. <xref ref-type="sec" rid="Ch1.S3.SS1"/>).</p></list-item><list-item>
      <p id="d1e251">Second, we propose a regularization scheme that reduces inconsistencies across timescales as they arise from naive, per-timescale prediction (Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS2"/>).</p></list-item><list-item>
      <p id="d1e257">Third, we demonstrate that our multi-timescale LSTMs can ingest individual and multiple sets of forcing data for each target timescale, where each forcing product has regular but possibly individual time intervals.
This closely resembles operational forecasting use cases where forcings with high temporal resolution often have shorter lead times than forcings with low resolution (Sect. <xref ref-type="sec" rid="Ch1.S3.SS4"/>).</p></list-item></list></p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Data and methods</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Data</title>
      <p id="d1e277">To maintain continuity and comparability with previous benchmarking studies <xref ref-type="bibr" rid="bib1.bibx38 bib1.bibx29" id="paren.12"/>, we conducted our experiments on the CAMELS dataset <xref ref-type="bibr" rid="bib1.bibx36 bib1.bibx1" id="paren.13"/>.
The CAMELS dataset, however, provides discharge data only at daily timescales.
Out of the 531 CAMELS benchmarking basins used by previous studies, 516 basins have hourly stream gauge data available from the USGS Water Information System through the Instantaneous Values REST API <xref ref-type="bibr" rid="bib1.bibx45" id="paren.14"/>.
This USGS service provides historical measurements at varying sub-daily resolutions (usually every 15 to 60 min), which we averaged to hourly and daily time steps for each basin.
Since our forcing data and benchmark model data use UTC timestamps, we converted the USGS streamflow timestamps to UTC.
All discharge information used in this study was downloaded from the USGS API. We did not use the discharge from the CAMELS dataset; however, we chose only basins that are included in it.</p>
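      <p>As a minimal sketch of the aggregation step described above, the following Python snippet converts timezone-aware readings to UTC and averages them per hour and per day. The function name <monospace>average_by_period</monospace>, the sample values, and the fixed local offset are illustrative assumptions; this is not the processing code used in this study.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def average_by_period(readings, period):
    """Average (timestamp, value) pairs per UTC hour or UTC day.

    `period` is "hour" or "day". Timestamps must be timezone-aware.
    """
    if period not in ("hour", "day"):
        raise ValueError("period must be 'hour' or 'day'")
    buckets = defaultdict(list)
    for timestamp, value in readings:
        # Convert to UTC first so that bucket boundaries follow UTC.
        utc = timestamp.astimezone(timezone.utc)
        if period == "hour":
            key = utc.replace(minute=0, second=0, microsecond=0)
        else:
            key = utc.replace(hour=0, minute=0, second=0, microsecond=0)
        buckets[key].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


# Hypothetical 15 min readings reported with a fixed local offset (UTC-06:00).
local = timezone(timedelta(hours=-6))
start = datetime(2010, 5, 1, 17, 0, tzinfo=local)  # 23:00 UTC on the same day
readings = [(start + timedelta(minutes=15 * i), float(i)) for i in range(8)]

hourly = average_by_period(readings, "hour")
daily = average_by_period(readings, "day")
```
]]></preformat>
      <p>In this example the eight readings straddle midnight UTC, so they fall into two hourly and two daily buckets even though they belong to a single local day.</p>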
      <p id="d1e289">While CAMELS provides only daily meteorological forcing data, we needed hourly forcings.
To maintain some congruence with previous CAMELS experiments, we used the hourly NLDAS-2 product, which contains hourly meteorological data since 1979 <xref ref-type="bibr" rid="bib1.bibx47" id="paren.15"/>.
We spatially averaged the forcing variables listed in Table <xref ref-type="table" rid="Ch1.T1"/> for each basin.
Additionally, we averaged these basin-specific hourly meteorological variables to daily values.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e300">NLDAS forcing variables used in this study <xref ref-type="bibr" rid="bib1.bibx47" id="paren.16"/>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="2">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Variable</oasis:entry>
         <oasis:entry colname="col2">Units</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Total precipitation</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M1" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">kg</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Air temperature</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M2" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">K</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Surface pressure</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M3" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">Pa</mml:mi></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Surface downward longwave radiation</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M4" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">W</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Surface downward shortwave radiation</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M5" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">W</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">m</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Specific humidity</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M6" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">kg</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">kg</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Potential energy</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M7" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">J</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">kg</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Potential evaporation</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M8" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">J</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">kg</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Convective fraction</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M9" display="inline"><mml:mi>u</mml:mi></mml:math></inline-formula> wind component</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M10" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">s</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M11" display="inline"><mml:mi>v</mml:mi></mml:math></inline-formula> wind component</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M12" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">m</mml:mi><mml:mspace width="0.125em" linebreak="nobreak"/><mml:msup><mml:mi mathvariant="normal">s</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e575">Following common practice in machine learning, we split the dataset into training, validation, and test periods.
We used the training period (1 October 1990 to 30 September 2003) to fit the models given a set of hyperparameters.
Then, we applied the models to the validation period (1 October 2003 to 30 September 2008) to evaluate their accuracy on previously unseen data.
To find the ideal hyperparameters, we repeated this process several times (with different hyperparameters; see Appendix <xref ref-type="sec" rid="App1.Ch1.S4"/>) and selected the model that achieved the best validation accuracy.
Only after hyperparameters were selected for a final model on the validation dataset did we apply any trained model(s) to the test period (1 October 2008 to 30 September 2018).
This way, the model had never seen the test data during any part of the development or training process, which helps avoid overfitting.</p>
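      <p>The three periods can be written down as a small lookup, sketched here in Python. The boundary dates are those given in the text; the names <monospace>PERIODS</monospace> and <monospace>assign_period</monospace> are hypothetical and only serve the illustration.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
from datetime import date

# Period boundaries as stated in the text (inclusive on both ends).
PERIODS = {
    "train": (date(1990, 10, 1), date(2003, 9, 30)),
    "validation": (date(2003, 10, 1), date(2008, 9, 30)),
    "test": (date(2008, 10, 1), date(2018, 9, 30)),
}


def assign_period(day):
    """Return the name of the period that contains `day`, or None."""
    for name, (first, last) in PERIODS.items():
        if first <= day <= last:
            return name
    return None


# The periods are contiguous and non-overlapping, so every day between
# 1 October 1990 and 30 September 2018 belongs to exactly one of them.
example = assign_period(date(2005, 6, 15))
```
]]></preformat>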
      <p id="d1e580">All LSTM models used in this study take as inputs the 11 forcing variables listed in Table <xref ref-type="table" rid="Ch1.T1"/>, concatenated at each time step with the same 27 static catchment attributes from the CAMELS dataset that were used by <xref ref-type="bibr" rid="bib1.bibx29" id="text.17"/>.</p>
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Benchmark models</title>
      <p id="d1e596">We used two groups of models as baselines for comparison with the proposed architectures: the LSTM proposed by <xref ref-type="bibr" rid="bib1.bibx29" id="text.18"/>, naively adapted to hourly streamflow modeling, and the NWM, a traditional hydrologic model used operationally by the National Oceanic and Atmospheric Administration <xref ref-type="bibr" rid="bib1.bibx8" id="paren.19"/>.</p>
<sec id="Ch1.S2.SS2.SSS1">
  <label>2.2.1</label><title>Traditional hydrologic model</title>
      <p id="d1e612">The US National Oceanic and Atmospheric Administration (NOAA) generates hourly streamflow predictions with the NWM <xref ref-type="bibr" rid="bib1.bibx41" id="paren.20"/>, which is a process-based model based on WRF-Hydro <xref ref-type="bibr" rid="bib1.bibx16" id="paren.21"/>.
Specifically, we benchmarked against the NWM v2 reanalysis product, which includes hourly streamflow predictions for the years 1993 through 2018.
For these predictions, the NWM was<?pagebreak page2048?> calibrated on meteorological NLDAS forcings for around 1500 basins and then regionalized to around <inline-formula><mml:math id="M13" display="inline"><mml:mn mathvariant="normal">2.7</mml:mn></mml:math></inline-formula> million outflow points (source: personal communication with scientists from NOAA, 2021).
To obtain predictions at lower (e.g., daily) temporal resolutions, we averaged the hourly predictions.</p>
</sec>
<sec id="Ch1.S2.SS2.SSS2">
  <label>2.2.2</label><title>Naive LSTM</title>
      <p id="d1e636">Long Short-Term Memory (LSTM) networks <xref ref-type="bibr" rid="bib1.bibx21" id="paren.22"/> are a flavor of recurrent neural networks designed to model long-term dependencies between input and output data.
LSTMs maintain an internal memory state that is updated at each time step by a set of activated functions called <italic>gates</italic>.
These gates control the input–state relationship (through the <italic>input gate</italic>), the state–output relationship (through the <italic>output gate</italic>), and the memory timescales (through the <italic>forget gate</italic>).
Figure <xref ref-type="fig" rid="Ch1.F1"/> illustrates this architecture. For a more detailed description of LSTMs, especially in the context of rainfall–runoff modeling, we refer to <xref ref-type="bibr" rid="bib1.bibx26" id="text.23"/>.</p>
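      <p>One step of a standard LSTM cell can be sketched in a few lines of Python. The snippet uses the textbook formulation with a scalar input and scalar state, and the weights are hypothetical values chosen only for illustration; it is not the implementation used in this study.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a scalar LSTM cell.

    `params` maps each of "f", "i", "g", "o" to a tuple (w_x, w_h, b):
    input weight, recurrent weight, and bias of that component.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x_t + w_h * h_prev + b

    f_t = sigmoid(pre("f"))      # forget gate: how much old state to keep
    i_t = sigmoid(pre("i"))      # input gate: how much new information to add
    g_t = math.tanh(pre("g"))    # candidate update of the cell state
    o_t = sigmoid(pre("o"))      # output gate: how much state to expose
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# Hypothetical weights, chosen only for illustration.
params = {"f": (0.5, 0.1, 3.0), "i": (0.8, 0.2, 0.0),
          "g": (1.0, 0.3, 0.0), "o": (0.6, 0.1, 0.0)}
h, c = 0.0, 0.0
for x in [0.2, 1.5, 0.0, 0.1]:
    # The state (h, c) is carried from one time step to the next.
    h, c = lstm_step(x, h, c, params)
```
]]></preformat>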

      <?xmltex \floatpos{t}?><fig id="Ch1.F1"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e662">Schematic architecture of an LSTM cell with input <inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, cell state <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, output <inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, and activations for forget gate (<inline-formula><mml:math id="M17" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">f</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>), input gate (<inline-formula><mml:math id="M18" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mtext>i</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M19" display="inline"><mml:mrow><mml:msub><mml:mi>tanh⁡</mml:mi><mml:mtext>i</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>), and output gate (<inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M21" display="inline"><mml:mrow><mml:msub><mml:mi>tanh⁡</mml:mi><mml:mi mathvariant="normal">o</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>). <inline-formula><mml:math id="M22" display="inline"><mml:mo>×</mml:mo></mml:math></inline-formula> and <inline-formula><mml:math id="M23" display="inline"><mml:mo>+</mml:mo></mml:math></inline-formula> denote element-wise multiplication and addition. 
Illustration derived from <xref ref-type="bibr" rid="bib1.bibx39" id="altparen.24"/>.</p></caption>
            <?xmltex \igopts{}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021-f01.png"/>

          </fig>

      <p id="d1e777">LSTMs can cope with longer time series than classic recurrent neural networks because they are less susceptible to vanishing gradients during the training procedure <xref ref-type="bibr" rid="bib1.bibx5 bib1.bibx21" id="paren.25"/>.
Since LSTMs process input sequences sequentially, longer time series result in longer training and inference times.
For daily predictions, this is not a problem, since look-back windows of 365 d appear to be sufficient for most basins, at least in the CAMELS dataset.
Therefore, our baseline for daily predictions is the LSTM model from <xref ref-type="bibr" rid="bib1.bibx30" id="text.26"/>, which was trained on daily data with a look-back of 365 d.</p>
      <p id="d1e787">For hourly data, even half a year corresponds to more than 4300 time steps, which results in very long training and inference runtime, as we will detail in Sect. <xref ref-type="sec" rid="Ch1.S3.SS3"/>.
In addition to the computational overhead, the LSTM forget gate makes it hard to learn long-term dependencies, because it effectively reintroduces vanishing gradients into the LSTM <xref ref-type="bibr" rid="bib1.bibx23" id="paren.27"/>.
Yet, we cannot simply remove the forget gate – both empirical LSTM analyses <xref ref-type="bibr" rid="bib1.bibx23 bib1.bibx18" id="paren.28"/> and our exploratory experiments (unpublished) showed that this deteriorates results.
To address this, <xref ref-type="bibr" rid="bib1.bibx15" id="text.29"/> proposed to initialize the bias of the forget gate to a small positive value (we used 3).
This starts training with an open gate and enables gradient flow across more time steps.</p>
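      <p>The effect of this initialization can be illustrated numerically. The sketch below assumes, purely for illustration, that the weighted input and recurrent contributions to the forget gate are zero, so that the gate activation depends on the bias alone. A bias of 3 then yields an activation of about 0.95, compared with 0.5 for a zero bias. Because a value stored in the cell state is scaled by the gate activation at every step, the more open gate retains far more of it over many steps.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def forget_gate_activation(bias):
    """Sigmoid of the forget-gate bias, assuming zero weighted inputs."""
    return 1.0 / (1.0 + math.exp(-bias))


def retained_fraction(bias, n_steps):
    """Fraction of a cell-state value left after n_steps of constant gating."""
    return forget_gate_activation(bias) ** n_steps


for bias in (0.0, 3.0):
    gate = forget_gate_activation(bias)
    kept = retained_fraction(bias, 24)
    print(f"bias={bias}: gate={gate:.3f}, kept after 24 steps={kept:.2e}")
```
]]></preformat>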
      <p id="d1e801">We used this bias initialization trick for all our LSTM models, and it allowed us to include the LSTM with hourly inputs as the naive hourly baseline for our proposed models.
The architecture for this naive benchmark is identical to the daily LSTM, except that we ingested input sequences of 4320 h (180 d).
Further, we tuned the learning rate and batch size for the naive hourly LSTM, since it receives 24 times as many samples as the daily LSTM.
The extremely slow training impedes a more extensive hyperparameter search.
Appendix <xref ref-type="sec" rid="App1.Ch1.S4"/> lists the grid of hyperparameters we evaluated to find a suitable configuration and provides further details on the final hyperparameters.</p>
</sec>
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Using LSTMs to predict multiple timescales</title>
      <p id="d1e815">We evaluated two different LSTM architectures that are capable of simultaneous predictions at multiple timescales.
For the sake of simplicity, the following explanations use the example of a two-timescale model that generates daily and hourly predictions.
Nevertheless, the architectures we describe here generalize to other timescales and to more than two timescales, as we will show in an experiment in Sect. <xref ref-type="sec" rid="Ch1.S3.SS5"/>.</p>
      <p id="d1e820">The first model, shared multi-timescale LSTM (sMTS-LSTM), is a simple extension of the naive approach.
Intuitively, it seems reasonable that the relationships that govern daily predictions hold, to some extent, for hourly predictions as well.
Hence, it may be possible to learn these dynamics in a single LSTM that processes the time series twice: once at a daily resolution and again at an hourly resolution.
Since we model a system where the resolution of long-past time steps is less important, we can simplify the second (hourly) pass by reusing the first part of the daily time series.
This way, we only need to use hourly inputs for the more recent time steps, which yields shorter time series that are easier to process.
From a more technical point of view, we first generate a daily prediction as usual – the LSTM ingests an input sequence of <inline-formula><mml:math id="M24" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mtext>D</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> time steps at daily resolution and outputs a prediction at the last time step (i.e., sequence-to-one prediction).
Next, we reset the hidden and cell states to their values from time step <inline-formula><mml:math id="M25" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mn mathvariant="normal">24</mml:mn></mml:mrow></mml:math></inline-formula> and ingest the hourly input sequence of length <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> to generate 24 predictions on an hourly timescale that correspond to the last daily prediction.
In other words, we reuse the initial daily time steps and use hourly inputs only for the remaining time steps.
To summarize this approach, we perform two forward passes through the same LSTM at each prediction step: one that generates a daily prediction and one<?pagebreak page2049?> that generates 24 corresponding hourly predictions.
Since the same LSTM processes input data at multiple timescales, the model needs a way to identify the current input timescale and distinguish daily from hourly inputs.
For this, we add a one-hot timescale encoding to the input sequence.
The key insight with this model is that the hourly forward pass starts with LSTM states from the daily forward pass, which act as a summary of long-term information up to that point.
In effect, the LSTM has access to a large look-back window but, unlike the naive hourly LSTM, it does not suffer from the performance impact of extremely long input sequences.</p>
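      <p>The bookkeeping behind the two forward passes can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used in this study: the function names are hypothetical, the ordering of the one-hot timescale encoding is an assumption, and the sketch assumes that the hourly sequence length covers whole days.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def split_step(t_daily: int, t_hourly: int, hours_per_day: int = 24) -> int:
    """Daily step whose LSTM states seed the hourly pass (T_D - T_H / 24)."""
    if t_hourly % hours_per_day != 0:
        raise ValueError("hourly sequence length must cover whole days")
    return t_daily - t_hourly // hours_per_day


def add_timescale_flag(sequence, is_hourly: bool):
    """Append a one-hot timescale encoding to every input step.

    The ordering [daily, hourly] is an assumption made for this sketch.
    """
    flag = [0.0, 1.0] if is_hourly else [1.0, 0.0]
    return [list(step) + flag for step in sequence]


t_daily, t_hourly = 365, 336           # example sequence lengths
seed_step = split_step(t_daily, t_hourly)
steps_processed = t_daily + t_hourly   # daily pass plus hourly pass
print(seed_step, steps_processed)
```
]]></preformat>
      <p>With a daily sequence length of 365 and an hourly sequence length of 336, the hourly pass starts from the states after daily step 351, and the two passes together process 701 time steps, compared with 4320 for the naive hourly LSTM.</p>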
      <p id="d1e867">The second architecture, illustrated in Fig. <xref ref-type="fig" rid="Ch1.F2"/>, is a more general variant of the sMTS-LSTM that is specifically built for multi-timescale predictions.
We call this a multi-timescale LSTM (MTS-LSTM).
The MTS-LSTM architecture stems from the idea that the daily and hourly predictions may behave differently enough that it may be challenging for one LSTM to learn dynamics at both timescales, as is required for the sMTS-LSTM.
Instead, we hypothesize that it may be easier to process inputs at different timescales using an individual LSTM per timescale.
The MTS-LSTM does not perform two forward passes (as sMTS-LSTM does). Instead, it uses the cell states and hidden states of a daily LSTM to initialize an hourly LSTM after some number of initial daily time steps (see Fig. <xref ref-type="fig" rid="Ch1.F2"/>).
More technically, we first generate a prediction with an LSTM acting at the coarsest timescale (here, daily) using a full input sequence of length <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> (e.g., 365 d).
Next, we reuse the daily hidden and cell states from time step <inline-formula><mml:math id="M28" display="inline"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mn mathvariant="normal">24</mml:mn></mml:mrow></mml:math></inline-formula> as the initial states for an LSTM at a finer timescale (here, hourly), which generates the corresponding 24 hourly predictions.
Since the two LSTM branches may have different hidden sizes, we feed the states through a linear state transfer layer (<inline-formula><mml:math id="M29" display="inline"><mml:mrow><mml:msub><mml:mtext>FC</mml:mtext><mml:mtext>h</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M30" display="inline"><mml:mrow><mml:msub><mml:mtext>FC</mml:mtext><mml:mtext>c</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula>) before reusing them as initial hourly states.
In this setup, each LSTM branch only receives inputs of its respective timescale (i.e., separate weights and biases are learned for each input timescale); hence, we do not need a one-hot encoding to represent the timescale.</p>
      <p id="d1e930">Effectively, the sMTS-LSTM is an ablation of the MTS-LSTM.
The sMTS-LSTM is an MTS-LSTM where the different LSTM branches all share the same set of weights and states are transferred without any additional computation (i.e., the transfer layers <inline-formula><mml:math id="M31" display="inline"><mml:mrow><mml:msub><mml:mtext>FC</mml:mtext><mml:mtext>h</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M32" display="inline"><mml:mrow><mml:msub><mml:mtext>FC</mml:mtext><mml:mtext>c</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are identity functions).
Conversely, the MTS-LSTM is a generalization of sMTS-LSTM.
Consider an MTS-LSTM that uses the same hidden size in all branches.
In theory, this model could learn to use identity matrices as transfer layers and to use equal weights in all LSTM branches.
Except for the one-hot encoding, this would make it an sMTS-LSTM.</p>
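      <p>The relationship between the two state transfers can be made concrete with a small Python sketch. It is a toy illustration under stated assumptions, not the model code: the helper names are hypothetical, the state values and weights are made up, and the transfer layer is assumed to include a bias term (set to zero here).</p>
      <preformat preformat-type="code"><![CDATA[
```python
def linear(weights, bias, state):
    """Apply a linear layer: one output per weight row."""
    return [
        sum(w * s for w, s in zip(row, state)) + b
        for row, b in zip(weights, bias)
    ]


def identity(size):
    """Identity matrix as nested lists."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]


daily_hidden = [0.1, -0.2, 0.3]   # made-up daily hidden state (size 3)

# MTS-LSTM case: project a size-3 daily state to a size-2 hourly state.
fc_h_weights = [[0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]   # arbitrary example weights
hourly_init = linear(fc_h_weights, [0.0, 0.0], daily_hidden)

# sMTS-LSTM case: equal hidden sizes and an identity transfer.
shared_init = linear(identity(3), [0.0, 0.0, 0.0], daily_hidden)
print(hourly_init, shared_init)
```
]]></preformat>
      <p>With identity weights and a zero bias, the transferred state equals the daily state, which corresponds to the sMTS-LSTM transfer; any other weight matrix lets the hourly branch use a different hidden size.</p>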

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e958">Illustration of the MTS-LSTM architecture that uses one distinct LSTM per timescale. In the depicted example, the daily and hourly input sequence lengths are <inline-formula><mml:math id="M33" display="inline"><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">365</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mtext>H</mml:mtext></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">72</mml:mn></mml:mrow></mml:math></inline-formula> (we chose this value for the sake of a tidy illustration; the benchmarked model uses <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msup><mml:mi>T</mml:mi><mml:mi mathvariant="normal">H</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mn mathvariant="normal">336</mml:mn></mml:mrow></mml:math></inline-formula>). 
In the sMTS-LSTM model (i.e., without distinct LSTM branches), <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mtext>FC</mml:mtext><mml:mtext>c</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msub><mml:mtext>FC</mml:mtext><mml:mtext>h</mml:mtext></mml:msub></mml:mrow></mml:math></inline-formula> are identity functions, and the two branches (including the fully connected output layers <inline-formula><mml:math id="M38" display="inline"><mml:mrow><mml:msup><mml:mtext>FC</mml:mtext><mml:mtext>H</mml:mtext></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M39" display="inline"><mml:mrow><mml:msup><mml:mtext>FC</mml:mtext><mml:mtext>D</mml:mtext></mml:msup></mml:mrow></mml:math></inline-formula>) share their model weights.</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021-f02.png"/>

        </fig>

<sec id="Ch1.S2.SS3.SSS1">
  <label>2.3.1</label><title>Per-timescale input variables</title>
      <p id="d1e1064">An important advantage of the MTS-LSTM over the sMTS-LSTM architecture arises from its more flexible input dimensionality.
Since sMTS-LSTM processes each timescale with the same LSTM, the inputs at all timescales must have the same number of variables.
MTS-LSTM, on the other hand, processes each timescale in an individual LSTM branch and can therefore ingest different input variables to predict the different timescales.
This can be a key differentiator in operational applications when, for instance, there exist daily weather forecasts with a much longer lead time than the available hourly forecasts, or when using remote sensing data that are available only at certain overpass frequencies.
Input products available at different time frequencies will generally have different numbers of variables, so sMTS-LSTM could only use the products that are available for the highest resolution (we can usually obtain all other resolutions by aggregation).
In contrast, MTS-LSTM can process daily forcings in its daily LSTM branch and hourly forcings in its hourly LSTM branch – regardless of the number of variables – since each branch can have a different input size.
As such, this per-timescale forcings strategy allows for using different inputs at different timescales.</p>
      <p id="d1e1067">To evaluate this capability, we used two sets of forcings as daily inputs: the Daymet and Maurer forcing sets that are included in the CAMELS dataset <xref ref-type="bibr" rid="bib1.bibx36" id="paren.30"/>.
For lack of other hourly forcing products, we conducted two experiments: in one, we continued to use only the hourly NLDAS forcings.
In the other, we ingested the same hourly NLDAS forcings as well as the corresponding day’s Daymet and Maurer forcings at each hour (we concatenated the daily values to the inputs at each hour of the day).
Since the Maurer forcings only range until 2008, we conducted this experiment on the validation period from October 2003 to September 2008.</p>
</sec>
<sec id="Ch1.S2.SS3.SSS2">
  <label>2.3.2</label><title>Cross-timescale consistency</title>
      <?pagebreak page2050?><p id="d1e1081">Since the MTS-LSTM and sMTS-LSTM architectures generate predictions at multiple timescales simultaneously, we can incentivize predictions that are consistent across timescales using loss regularization.
Unlike in other domains <xref ref-type="bibr" rid="bib1.bibx49" id="paren.31"><named-content content-type="pre">e.g., computer vision;</named-content></xref>, consistency is well-defined in our application: predictions are consistent if the mean of every day’s hourly predictions is the same as that day’s daily prediction.
Hence, we can explicitly state this constraint as a regularization term in our loss function.
Building on the basin-averaged NSE loss from <xref ref-type="bibr" rid="bib1.bibx29" id="text.32"/>, our loss function averages the losses from each individual timescale and regularizes with the mean squared difference between daily and day-averaged hourly predictions.
Note that although we describe the regularization with only two simultaneously predicted timescales, the approach generalizes to more timescales, as we can add the mean squared difference between each pair of timescale <inline-formula><mml:math id="M40" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula> and next-finer timescale <inline-formula><mml:math id="M41" display="inline"><mml:mrow><mml:msup><mml:mi mathvariant="italic">τ</mml:mi><mml:mo>′</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula>.
Taken together, we used the following loss for a two-timescale daily–hourly model:
<?xmltex \hack{\newpage}?>
              <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M42" display="block"><mml:mtable class="split" columnspacing="1em" rowspacing="0.2ex" displaystyle="true" columnalign="right left"><mml:mtr><mml:mtd><mml:mrow><mml:msubsup><mml:mtext>NSE</mml:mtext><mml:mtext>reg</mml:mtext><mml:mrow><mml:mi mathvariant="normal">D</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">H</mml:mi></mml:mrow></mml:msubsup></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:munder class="underbrace"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">2</mml:mn></mml:mfrac></mml:mstyle><mml:munder><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi mathvariant="italic">τ</mml:mi><mml:mo>∈</mml:mo><mml:mo>{</mml:mo><mml:mi mathvariant="normal">D</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">H</mml:mi><mml:mo>}</mml:mo></mml:mrow></mml:munder><mml:mfenced close=")" open="("><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>B</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>B</mml:mi></mml:munderover><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>b</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msubsup></mml:mrow></mml:munderover><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>t</mml:mi><mml:mi 
mathvariant="italic">τ</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mi>t</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msubsup></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>b</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi mathvariant="italic">ϵ</mml:mi></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:mfenced></mml:mrow><mml:mo mathvariant="normal">︸</mml:mo></mml:munder><mml:mtext>per-timescale NSE</mml:mtext></mml:munder></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mo>+</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:munder><mml:munder class="underbrace"><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>B</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>B</mml:mi></mml:munderover><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>b</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>b</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msubsup></mml:mrow></mml:munderover><mml:msup><mml:mfenced close=")" open="("><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>t</mml:mi><mml:mi 
mathvariant="normal">D</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">24</mml:mn></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mn mathvariant="normal">24</mml:mn></mml:munderover><mml:msubsup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mi mathvariant="normal">H</mml:mi></mml:msubsup></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mo mathvariant="normal">︸</mml:mo></mml:munder><mml:mtext>mean squared difference regularization</mml:mtext></mml:munder><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
            Here, <inline-formula><mml:math id="M43" display="inline"><mml:mi>B</mml:mi></mml:math></inline-formula> is the number of basins, <inline-formula><mml:math id="M44" display="inline"><mml:mrow><mml:msubsup><mml:mi>N</mml:mi><mml:mi>b</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> is the number of samples for basin <inline-formula><mml:math id="M45" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> at timescale <inline-formula><mml:math id="M46" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula>, and <inline-formula><mml:math id="M47" display="inline"><mml:mrow><mml:msubsup><mml:mi>y</mml:mi><mml:mi>t</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M48" display="inline"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>t</mml:mi><mml:mi mathvariant="italic">τ</mml:mi></mml:msubsup></mml:mrow></mml:math></inline-formula> are observed and predicted discharge values for the <inline-formula><mml:math id="M49" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula>th time step of basin <inline-formula><mml:math id="M50" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> at timescale <inline-formula><mml:math id="M51" display="inline"><mml:mi mathvariant="italic">τ</mml:mi></mml:math></inline-formula>. 
<inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>b</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the standard deviation of the observed discharge of basin <inline-formula><mml:math id="M53" display="inline"><mml:mi>b</mml:mi></mml:math></inline-formula> over the whole training period, and <inline-formula><mml:math id="M54" display="inline"><mml:mi mathvariant="italic">ϵ</mml:mi></mml:math></inline-formula> is a small value that guarantees numerical stability.
We predicted discharge in the same unit (<inline-formula><mml:math id="M55" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">h</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) at all timescales, so that the daily prediction is compared with an average across 24 h.</p>
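      <p>As a worked illustration of Eq. (1), the following Python sketch evaluates the loss for a list of basins. It is a plain re-statement of the equation under stated assumptions, not the training code used in this study: the dictionary layout and key names are hypothetical, the default value of the stabilizing constant is a placeholder, and each day of hourly predictions is assumed to contain 24 values.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def regularized_nse_loss(basins, eps=0.1):
    """Two-timescale loss following the structure of Eq. (1).

    Each basin is a dict with (hypothetical) keys:
      "sigma":  per-basin normalizer sigma_b
      "obs_d", "pred_d": lists of daily observations and predictions
      "obs_h", "pred_h": lists of days, each a list of 24 hourly values
    """
    n_basins = len(basins)
    nse_term = 0.0
    reg_term = 0.0
    for basin in basins:
        scale = (basin["sigma"] + eps) ** 2
        pred_h_flat = [v for day in basin["pred_h"] for v in day]
        obs_h_flat = [v for day in basin["obs_h"] for v in day]
        # Squared errors at the daily and hourly timescales, scaled per basin.
        for pred, obs in ((basin["pred_d"], basin["obs_d"]),
                          (pred_h_flat, obs_h_flat)):
            nse_term += sum((p - o) ** 2 for p, o in zip(pred, obs)) / scale
        # Mean squared difference between daily and day-averaged hourly output.
        day_means = [sum(day) / len(day) for day in basin["pred_h"]]
        sq_diffs = [(d - m) ** 2 for d, m in zip(basin["pred_d"], day_means)]
        reg_term += sum(sq_diffs) / len(basin["pred_d"])
    return 0.5 * nse_term / n_basins + reg_term / n_basins


# One basin, one day: errors are zero, but the two timescales disagree.
example = {
    "sigma": 1.0,
    "obs_d": [2.0], "pred_d": [2.0],
    "obs_h": [[1.0] * 24], "pred_h": [[1.0] * 24],
}
print(regularized_nse_loss([example]))
```
]]></preformat>
      <p>In this toy example both per-timescale error terms vanish, so the loss consists only of the regularization term, which penalizes the gap between the daily prediction and the mean of the hourly predictions.</p>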
</sec>
<sec id="Ch1.S2.SS3.SSS3">
  <label>2.3.3</label><title>Predicting more timescales</title>
      <p id="d1e1500">While all our experiments so far considered daily and hourly input and output data, the MTS-LSTM architecture generalizes to other timescales.
To demonstrate this, we used a setup similar to the operational NWM, albeit in a reanalysis setting: we trained an MTS-LSTM to predict the last 18 h (hourly), 10 d (three-hourly), and 30 d (six-hourly) of discharge – all within one MTS-LSTM.
Table <xref ref-type="table" rid="Ch1.T2"/> details the sizes of the input and output sequences for each timescale.
To achieve a sufficiently long look-back window without using exceedingly long input sequences, we additionally predicted one day of daily streamflow but did not evaluate these daily predictions.</p>
      <p id="d1e1505">A practical note on the selection of target timescales: the state transfer from coarser to finer timescales requires careful coordination of timescales and input sequence lengths.
To illustrate, consider a toy setup that predicts two- and three-hourly discharge.
Each timescale uses an input sequence length of 20 steps (i.e., 40 and 60 h, respectively).
The initial two-hourly LSTM state should then be the <inline-formula><mml:math id="M56" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">60</mml:mn><mml:mo>-</mml:mo><mml:mn mathvariant="normal">40</mml:mn><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>=</mml:mo><mml:mn mathvariant="normal">6.67</mml:mn><mml:mtext>th</mml:mtext></mml:mrow></mml:math></inline-formula> three-hourly state – in other words, the step widths are asynchronous at the point in time where the LSTM branches split.
Of course, one could simply select the 6th or 7th step, but then either 2 h remain unprocessed or 1 h is processed twice (once in the three- and once in the two-hourly LSTM branch).
Instead, we suggest selecting sequence lengths such that the LSTM splits at points where the timescales’ steps align.
In the above example, sequence lengths of 20 (three-hourly) and 21 (two-hourly) would align correctly: <inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="normal">60</mml:mn><mml:mo>-</mml:mo><mml:mn mathvariant="normal">42</mml:mn><mml:mo>)</mml:mo><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn><mml:mo>=</mml:mo><mml:mn mathvariant="normal">6</mml:mn></mml:mrow></mml:math></inline-formula>.</p>
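      <p>The alignment condition from the toy example can be checked mechanically. The following Python sketch uses hypothetical function names and exact rational arithmetic to compute the coarse step at which the finer branch would start; it only restates the arithmetic above and is not part of the model code.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from fractions import Fraction


def split_index(coarse_steps, coarse_width, fine_steps, fine_width):
    """Coarse step after which the finer branch starts, as an exact fraction.

    Widths are step lengths in hours; both sequences end at the same time.
    """
    coarse_hours = coarse_steps * coarse_width
    fine_hours = fine_steps * fine_width
    return Fraction(coarse_hours - fine_hours, coarse_width)


def is_aligned(coarse_steps, coarse_width, fine_steps, fine_width):
    """True if the split falls exactly on a coarse step boundary."""
    index = split_index(coarse_steps, coarse_width, fine_steps, fine_width)
    return index.denominator == 1


# Toy setup from the text: three-hourly and two-hourly branches.
print(split_index(20, 3, 20, 2), is_aligned(20, 3, 20, 2))
print(split_index(20, 3, 21, 2), is_aligned(20, 3, 21, 2))
```
]]></preformat>
      <p>For 20 steps at each timescale the split index is the fraction 20/3 and the check fails, whereas 20 three-hourly and 21 two-hourly steps give an integer split index of 6.</p>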

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e1561">Input sequence length and prediction window for each predicted timescale. The daily input merely acts as a means to extend the look-back window to a full year without generating overly long input sequences; we did not evaluate the daily predictions.
The specific input sequence lengths were chosen somewhat arbitrarily, as our intent was to demonstrate the capability rather than to address a particular application or operational challenge.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Timescale</oasis:entry>
         <oasis:entry colname="col2">Input sequence length</oasis:entry>
         <oasis:entry colname="col3">Prediction window</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Hourly</oasis:entry>
         <oasis:entry colname="col2">168 h (<inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mn mathvariant="normal">7</mml:mn></mml:mrow></mml:math></inline-formula> d)</oasis:entry>
         <oasis:entry colname="col3">18 h</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Three-hourly</oasis:entry>
         <oasis:entry colname="col2">168 steps (<inline-formula><mml:math id="M59" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mn mathvariant="normal">21</mml:mn></mml:mrow></mml:math></inline-formula> d)</oasis:entry>
         <oasis:entry colname="col3">80 steps (<inline-formula><mml:math id="M60" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mn mathvariant="normal">10</mml:mn></mml:mrow></mml:math></inline-formula> d)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Six-hourly</oasis:entry>
         <oasis:entry colname="col2">360 steps (<inline-formula><mml:math id="M61" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mn mathvariant="normal">90</mml:mn></mml:mrow></mml:math></inline-formula> d)</oasis:entry>
         <oasis:entry colname="col3">120 steps (<inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:mo>=</mml:mo><mml:mn mathvariant="normal">30</mml:mn></mml:mrow></mml:math></inline-formula> d)</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Daily</oasis:entry>
         <oasis:entry colname="col2">365 d</oasis:entry>
         <oasis:entry colname="col3">1 d</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Evaluation criteria</title>
      <p id="d1e1700">Following previous studies that used the CAMELS datasets <xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx3" id="paren.33"/>, we benchmarked all models with respect to the metrics and signatures listed in Table <xref ref-type="table" rid="Ch1.T3"/>.
Hydrologic signatures are statistics of hydrographs, and thus not natively a comparison between predicted and<?pagebreak page2051?> observed values.
For these values, we calculated the Pearson correlation coefficient between the signatures of observed and predicted discharge across the 516 CAMELS basins used in this study.</p>
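      <p>The signature comparison amounts to one correlation per signature across basins. The Python sketch below shows the calculation for a single signature; the function name and the four pairs of baseflow index values are made up for illustration and are not results from this study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# One value per basin: signature of observed vs. predicted discharge.
observed_bfi = [0.35, 0.52, 0.61, 0.48]    # made-up values
predicted_bfi = [0.33, 0.55, 0.58, 0.50]   # made-up values
print(pearson(observed_bfi, predicted_bfi))
```
]]></preformat>
      <p>A coefficient close to 1 indicates that basins with a high observed signature value also tend to have a high predicted value, regardless of how well individual time steps match.</p>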

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3" specific-use="star"><?xmltex \currentcnt{3}?><label>Table 3</label><caption><p id="d1e1711">Evaluation metrics (upper table section) and hydrologic signatures (lower table section) used in this study. For each signature, we calculated the Pearson correlation between the signatures of observed and predicted discharge across all basins. Descriptions of the signatures are taken from <xref ref-type="bibr" rid="bib1.bibx3" id="text.34"/>.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="2cm"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="8cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="5cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Metric/signature</oasis:entry>
         <oasis:entry colname="col2">Description</oasis:entry>
         <oasis:entry colname="col3">Reference</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">NSE</oasis:entry>
         <oasis:entry colname="col2">Nash–Sutcliffe efficiency</oasis:entry>
         <oasis:entry colname="col3">Eq. (3) in <xref ref-type="bibr" rid="bib1.bibx34" id="text.35"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">Kling–Gupta efficiency</oasis:entry>
         <oasis:entry colname="col3">Eq. (9) in <xref ref-type="bibr" rid="bib1.bibx19" id="text.36"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Pearson <inline-formula><mml:math id="M63" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">Pearson correlation between observed and simulated flow</oasis:entry>
         <oasis:entry colname="col3"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M64" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-NSE</oasis:entry>
         <oasis:entry colname="col2">Ratio of standard deviations of observed and simulated flow</oasis:entry>
         <oasis:entry colname="col3">From Eq. (4) in <xref ref-type="bibr" rid="bib1.bibx19" id="text.37"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M65" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>-NSE</oasis:entry>
         <oasis:entry colname="col2">Ratio of the means of observed and simulated flow</oasis:entry>
         <oasis:entry colname="col3">From Eq. (10) in <xref ref-type="bibr" rid="bib1.bibx19" id="text.38"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FHV</oasis:entry>
         <oasis:entry colname="col2">Top 2 % peak flow bias</oasis:entry>
         <oasis:entry colname="col3">Eq. (A3) in <xref ref-type="bibr" rid="bib1.bibx48" id="text.39"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FLV</oasis:entry>
         <oasis:entry colname="col2">Bottom 30 % low flow bias</oasis:entry>
         <oasis:entry colname="col3">Eq. (A4) in <xref ref-type="bibr" rid="bib1.bibx48" id="text.40"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FMS</oasis:entry>
         <oasis:entry colname="col2">Bias of the slope of the flow duration curve between the 20 % and 80 % percentile</oasis:entry>
         <oasis:entry colname="col3">Eq. (A2) in <xref ref-type="bibr" rid="bib1.bibx48" id="text.41"/></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Peak timing</oasis:entry>
         <oasis:entry colname="col2">Mean time lag between observed and simulated peaks</oasis:entry>
         <oasis:entry colname="col3">See Appendix <xref ref-type="sec" rid="App1.Ch1.S1"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Baseflow index</oasis:entry>
         <oasis:entry colname="col2">Ratio of mean daily/hourly baseflow to mean daily/hourly discharge</oasis:entry>
         <oasis:entry colname="col3">
                      <xref ref-type="bibr" rid="bib1.bibx31" id="text.42"/>
                    </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">HFD mean</oasis:entry>
         <oasis:entry colname="col2">Mean half-flow date (date on which the cumulative discharge since 1 October reaches half of the annual discharge)</oasis:entry>
         <oasis:entry colname="col3">
                      <xref ref-type="bibr" rid="bib1.bibx9" id="text.43"/>
                    </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">High-flow dur.</oasis:entry>
         <oasis:entry colname="col2">Average duration of high-flow events (number of consecutive steps <inline-formula><mml:math id="M66" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">9</mml:mn></mml:mrow></mml:math></inline-formula> times the median daily/hourly flow)</oasis:entry>
         <oasis:entry colname="col3"><xref ref-type="bibr" rid="bib1.bibx7" id="text.44"/>, Table 2 in <xref ref-type="bibr" rid="bib1.bibx46" id="text.45"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">High-flow freq.</oasis:entry>
         <oasis:entry colname="col2">Frequency of high-flow days/hours (<inline-formula><mml:math id="M67" display="inline"><mml:mrow><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">9</mml:mn></mml:mrow></mml:math></inline-formula> times the median daily/hourly flow)</oasis:entry>
         <oasis:entry colname="col3"><xref ref-type="bibr" rid="bib1.bibx7" id="text.46"/>, Table 2 in <xref ref-type="bibr" rid="bib1.bibx46" id="text.47"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Low-flow dur.</oasis:entry>
         <oasis:entry colname="col2">Average duration of low-flow events (number of consecutive days/hours <inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula> times the mean daily/hourly flow)</oasis:entry>
         <oasis:entry colname="col3"><xref ref-type="bibr" rid="bib1.bibx40" id="text.48"/>, Table 2 in <xref ref-type="bibr" rid="bib1.bibx46" id="text.49"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Low-flow freq.</oasis:entry>
         <oasis:entry colname="col2">Frequency of low-flow days/hours (<inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.2</mml:mn></mml:mrow></mml:math></inline-formula> times the mean daily/hourly flow)</oasis:entry>
         <oasis:entry colname="col3"><xref ref-type="bibr" rid="bib1.bibx40" id="text.50"/>, Table 2 in <xref ref-type="bibr" rid="bib1.bibx46" id="text.51"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">5 % flow quantile (low flow)</oasis:entry>
         <oasis:entry colname="col3"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M71" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">95</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">95 % flow quantile (high flow)</oasis:entry>
         <oasis:entry colname="col3"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M72" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> mean</oasis:entry>
         <oasis:entry colname="col2">Mean daily/hourly discharge</oasis:entry>
         <oasis:entry colname="col3"/>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Runoff ratio</oasis:entry>
         <oasis:entry colname="col2">Ratio of mean daily/hourly discharge to mean daily/hourly precipitation (using NLDAS precipitation)</oasis:entry>
         <oasis:entry colname="col3">Eq. (2) in <xref ref-type="bibr" rid="bib1.bibx43" id="text.52"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Slope FDC</oasis:entry>
         <oasis:entry colname="col2">Slope of the flow duration curve (between the log-transformed 33rd and 66th streamflow percentiles)</oasis:entry>
         <oasis:entry colname="col3">Eq. (3) in <xref ref-type="bibr" rid="bib1.bibx43" id="text.53"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Stream <?xmltex \hack{\hfill\break}?>elasticity</oasis:entry>
         <oasis:entry colname="col2">Streamflow precipitation elasticity (sensitivity of streamflow to changes in precipitation at the annual timescale, using NLDAS precipitation)</oasis:entry>
         <oasis:entry colname="col3">Eq. (7) in <xref ref-type="bibr" rid="bib1.bibx42" id="text.54"/></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Zero-flow freq.</oasis:entry>
         <oasis:entry colname="col2">Frequency of days/hours with zero discharge</oasis:entry>
         <oasis:entry colname="col3"/>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e2133">To quantify the cross-timescale consistency of the different models, we calculated the root mean squared deviation between daily predictions and hourly predictions when aggregated to daily values:
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M73" display="block"><mml:mrow><mml:msup><mml:mtext>MSD</mml:mtext><mml:mrow><mml:mi mathvariant="normal">D</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="normal">H</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msqrt><mml:mrow><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>T</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msup><mml:mfenced open="(" close=")"><mml:mrow><mml:msubsup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mi>t</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:msubsup><mml:mo>-</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mn mathvariant="normal">24</mml:mn></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mn mathvariant="normal">24</mml:mn></mml:munderover><mml:msubsup><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>h</mml:mi></mml:mrow><mml:mtext>H</mml:mtext></mml:msubsup></mml:mrow></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mspace width="0.125em" linebreak="nobreak"/><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
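      <p>As a minimal illustration of Eq. (2), the following Python sketch computes the deviation between a daily prediction series and the corresponding hourly predictions averaged to daily values. The function name and the input layout (a flat hourly series with exactly 24 values per day, aligned with the daily series and free of gaps) are assumptions made for this example and are not taken from the study's code.</p>
      <preformat preformat-type="code">
```python
import math


def cross_timescale_deviation(daily_pred, hourly_pred):
    """Root mean squared deviation between daily predictions and hourly
    predictions averaged to daily values, following Eq. (2).

    Assumption (illustrative): hourly_pred holds exactly 24 values per day,
    ordered so that the first 24 values belong to the first day.
    """
    n_days = len(daily_pred)
    if n_days == 0 or len(hourly_pred) != 24 * n_days:
        raise ValueError("expected a non-empty series with 24 hourly values per day")
    squared_diffs = []
    for t in range(n_days):
        # Mean of the 24 hourly predictions that fall on day t.
        hourly_mean = sum(hourly_pred[24 * t:24 * (t + 1)]) / 24
        squared_diffs.append((daily_pred[t] - hourly_mean) ** 2)
    return math.sqrt(sum(squared_diffs) / n_days)
```
      </preformat>
      <p>A value of zero indicates that the daily predictions coincide with the daily means of the hourly predictions; larger values indicate less consistent predictions across the two timescales.</p>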
      <p id="d1e2227">To account for randomness in LSTM weight initialization, all reported LSTM results were calculated on hydrographs obtained by averaging the predictions of 10 independently trained LSTM models.
Each independently trained model used the same hyperparameters (bold values in Table <xref ref-type="table" rid="App1.Ch1.S4.T9"/>), but a different random seed.</p>
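      <p>The ensemble procedure described above can be sketched as follows: the predictions of the individual models are first averaged at every time step, and metrics are then computed on the averaged hydrograph rather than averaged across models. The function names and the toy data (three members instead of 10, four time steps) are hypothetical and serve only as an illustration.</p>
      <preformat preformat-type="code">
```python
def ensemble_mean_hydrograph(member_hydrographs):
    """Average the predictions of several independently trained models
    at every time step to obtain a single ensemble hydrograph."""
    n_members = len(member_hydrographs)
    n_steps = len(member_hydrographs[0])
    if any(len(h) != n_steps for h in member_hydrographs):
        raise ValueError("all members must cover the same time steps")
    return [sum(h[t] for h in member_hydrographs) / n_members for t in range(n_steps)]


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of a simulated hydrograph."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum((s - o) ** 2 for o, s in zip(observed, simulated))
    denominator = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - numerator / denominator


# Hypothetical toy data: three members (the study averages 10), four time steps.
observed = [1.0, 3.0, 2.0, 4.0]
members = [
    [1.5, 2.5, 2.0, 3.5],
    [0.5, 3.5, 2.5, 4.5],
    [1.0, 3.0, 1.5, 4.0],
]
ensemble = ensemble_mean_hydrograph(members)
ensemble_nse = nse(observed, ensemble)  # metric of the averaged hydrograph
member_nses = [nse(observed, m) for m in members]  # for comparison only
```
      </preformat>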
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Results</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Benchmarking</title>
      <p id="d1e2248">Table <xref ref-type="table" rid="Ch1.T4"/> compares the median test period evaluation metrics of the MTS-LSTM and sMTS-LSTM architectures with the benchmark naive LSTM models and the process-based NWM.
Figure <xref ref-type="fig" rid="Ch1.F3"/> illustrates the cumulative distributions of per-basin NSE values of the different models.
By this metric, all LSTM models, even the naive ones, outperformed the NWM at both hourly and daily time steps.
All LSTM-based models performed slightly worse on hourly predictions than on daily predictions, an effect that was far more pronounced for the NWM.
Accordingly, Table <xref ref-type="table" rid="Ch1.T4"/> shows differences in median NSE between the LSTM models and the NWM ranging from <inline-formula><mml:math id="M74" display="inline"><mml:mn mathvariant="normal">0.11</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M75" display="inline"><mml:mn mathvariant="normal">0.13</mml:mn></mml:math></inline-formula> (daily) and around <inline-formula><mml:math id="M76" display="inline"><mml:mn mathvariant="normal">0.19</mml:mn></mml:math></inline-formula> (hourly).
Among the LSTM models, differences in all metrics were comparatively small.
The sMTS-LSTM achieved the best median NSE at both timescales.
The naive models produced NSE values that were slightly lower, but the difference was not statistically significant at <inline-formula><mml:math id="M77" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula> (Wilcoxon signed-rank test: <inline-formula><mml:math id="M78" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">2.7</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (hourly), <inline-formula><mml:math id="M79" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">8.7</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (daily)).
The NSE distribution of the MTS-LSTM was significantly different from that of the sMTS-LSTM (<inline-formula><mml:math id="M80" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3.7</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">19</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (hourly), <inline-formula><mml:math id="M81" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">3.2</mml:mn><mml:mo>×</mml:mo><mml:msup><mml:mn mathvariant="normal">10</mml:mn><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">29</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> (daily)), but the absolute difference was small.
We leave it to the reader to judge whether this difference is negligible from a hydrologic perspective.</p>
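      <p>For readers who wish to reproduce this kind of paired comparison, the sketch below computes a two-sided Wilcoxon signed-rank p value for per-basin scores of two models. It uses the large-sample normal approximation without continuity correction or tie correction of the variance, which is a simplification chosen for this example; the text does not state which implementation was used, and the function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
import math


def wilcoxon_signed_rank_p(x, y):
    """Two-sided p value of a Wilcoxon signed-rank test for paired samples.

    Simplifications (illustrative): normal approximation, zero differences
    discarded, no continuity correction, no tie correction of the variance.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank the absolute differences, assigning average ranks to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while n > i:
        j = i
        while n > j + 1 and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    # Sum of ranks of the positive differences and its null distribution.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    std_w = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean_w) / std_w
    return math.erfc(abs(z) / math.sqrt(2))
```
      </preformat>
      <p>With per-basin NSE values of two models as inputs, a p value below the chosen significance level (here <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>) would indicate a significant difference between the paired distributions.</p>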

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e2381">Cumulative NSE distributions of the different models. Panel <bold>(a)</bold> (solid lines) shows NSEs of daily predictions, panel <bold>(b)</bold> (dashed lines) shows NSEs of hourly predictions, and panel <bold>(c)</bold> combines both plots into one for comparison.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021-f03.png"/>

        </fig>

      <p id="d1e2399">All LSTM models exhibited lower peak-timing errors than the NWM.
For hourly predictions, the median peak-timing error of the sMTS-LSTM was around 3.5 h, compared to around 6 h for the NWM.
Peak-timing errors for daily predictions are numerically smaller, since the error is measured in days instead of hours.
The sMTS-LSTM yielded a median peak-timing error of <inline-formula><mml:math id="M82" display="inline"><mml:mn mathvariant="normal">0.3</mml:mn></mml:math></inline-formula> d, versus <inline-formula><mml:math id="M83" display="inline"><mml:mn mathvariant="normal">0.5</mml:mn></mml:math></inline-formula> d for the NWM.
The process-based NWM, in turn, often produced results with better flow bias metrics, especially with respect to low flows (FLV).
This agrees with prior work that indicates potential for improvement to deep learning models in arid climates <xref ref-type="bibr" rid="bib1.bibx30 bib1.bibx10" id="paren.55"/>.
Similarly, the lower section of Table <xref ref-type="table" rid="Ch1.T4"/> lists correlations between hydrologic signatures of observed and predicted discharge: the NWM had the highest correlations for high-, low-, and zero-flow frequencies at both hourly and daily timescales, as well as for flow duration curve slopes at the hourly timescale.</p>
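      <p>The signature correlations can be illustrated with a short Python sketch: a signature is computed per basin for both the observed and the predicted discharge, and the Pearson correlation is then taken across basins. The example uses the high-flow frequency, expressed here as the fraction of time steps above 9 times the median flow, which is a simplification of the cited definition; all function names are hypothetical.</p>
      <preformat preformat-type="code">
```python
import math
import statistics


def high_flow_frequency(flows):
    """Fraction of time steps whose flow exceeds 9 times the median flow."""
    threshold = 9 * statistics.median(flows)
    return sum(1 for q in flows if q > threshold) / len(flows)


def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def signature_correlation(observed_by_basin, predicted_by_basin, signature):
    """Correlate a signature of observed and predicted discharge across basins.

    Each argument holds one discharge series per basin, in the same order.
    """
    observed_values = [signature(q) for q in observed_by_basin]
    predicted_values = [signature(q) for q in predicted_by_basin]
    return pearson_r(observed_values, predicted_values)
```
      </preformat>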

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T4" specific-use="star"><?xmltex \currentcnt{4}?><label>Table 4</label><caption><p id="d1e2425">Median metrics (upper table section) and Pearson correlation between signatures of observed and predicted discharge (lower table section) across all 516 basins for the sMTS-LSTM, MTS-LSTM, naive daily and hourly LSTMs, and the process-based NWM. Bold values highlight results that are not significantly different from the best model in the respective metric or signature (<inline-formula><mml:math id="M84" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>). See Table <xref ref-type="table" rid="Ch1.T3"/> for a description of the metrics and signatures.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right" colsep="1"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col5" align="center" colsep="1">Daily </oasis:entry>
         <oasis:entry rowsep="1" namest="col6" nameend="col9" align="center">Hourly </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">sMTS-LSTM</oasis:entry>
         <oasis:entry colname="col3">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col4">Naive</oasis:entry>
         <oasis:entry colname="col5">NWM</oasis:entry>
         <oasis:entry colname="col6">sMTS-LSTM</oasis:entry>
         <oasis:entry colname="col7">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col8">Naive</oasis:entry>
         <oasis:entry colname="col9">NWM</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">NSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.762</bold></oasis:entry>
         <oasis:entry colname="col3">0.750</oasis:entry>
         <oasis:entry colname="col4"><bold>0.755</bold></oasis:entry>
         <oasis:entry colname="col5">0.636</oasis:entry>
         <oasis:entry colname="col6"><bold>0.752</bold></oasis:entry>
         <oasis:entry colname="col7">0.748</oasis:entry>
         <oasis:entry colname="col8"><bold>0.751</bold></oasis:entry>
         <oasis:entry colname="col9">0.559</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.002</bold></oasis:entry>
         <oasis:entry colname="col3">0.002</oasis:entry>
         <oasis:entry colname="col4"><bold>0.002</bold></oasis:entry>
         <oasis:entry colname="col5">0.004</oasis:entry>
         <oasis:entry colname="col6"><bold>0.003</bold></oasis:entry>
         <oasis:entry colname="col7">0.003</oasis:entry>
         <oasis:entry colname="col8"><bold>0.003</bold></oasis:entry>
         <oasis:entry colname="col9">0.005</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.048</bold></oasis:entry>
         <oasis:entry colname="col3">0.049</oasis:entry>
         <oasis:entry colname="col4"><bold>0.048</bold></oasis:entry>
         <oasis:entry colname="col5">0.059</oasis:entry>
         <oasis:entry colname="col6"><bold>0.054</bold></oasis:entry>
         <oasis:entry colname="col7">0.055</oasis:entry>
         <oasis:entry colname="col8"><bold>0.054</bold></oasis:entry>
         <oasis:entry colname="col9">0.071</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2">0.727</oasis:entry>
         <oasis:entry colname="col3">0.714</oasis:entry>
         <oasis:entry colname="col4"><bold>0.760</bold></oasis:entry>
         <oasis:entry colname="col5">0.666</oasis:entry>
         <oasis:entry colname="col6"><bold>0.731</bold></oasis:entry>
         <oasis:entry colname="col7">0.726</oasis:entry>
         <oasis:entry colname="col8"><bold>0.739</bold></oasis:entry>
         <oasis:entry colname="col9">0.638</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M85" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-NSE</oasis:entry>
         <oasis:entry colname="col2">0.819</oasis:entry>
         <oasis:entry colname="col3">0.813</oasis:entry>
         <oasis:entry colname="col4"><bold>0.873</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.847</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.828</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.825</bold></oasis:entry>
         <oasis:entry colname="col8"><bold>0.837</bold></oasis:entry>
         <oasis:entry colname="col9"><bold>0.846</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Pearson <inline-formula><mml:math id="M86" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2"><bold>0.891</bold></oasis:entry>
         <oasis:entry colname="col3">0.882</oasis:entry>
         <oasis:entry colname="col4">0.885</oasis:entry>
         <oasis:entry colname="col5">0.821</oasis:entry>
         <oasis:entry colname="col6"><bold>0.885</bold></oasis:entry>
         <oasis:entry colname="col7">0.882</oasis:entry>
         <oasis:entry colname="col8">0.882</oasis:entry>
         <oasis:entry colname="col9">0.779</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M87" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>-NSE</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M88" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.055</bold></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M89" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.043</bold></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M90" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.042</bold></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M91" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.038</bold></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M92" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.045</bold></oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M93" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.039</bold></oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M94" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.039</bold></oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M95" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.034</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FHV</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M96" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>17.656</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M97" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>17.834</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M98" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>13.336</bold></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M99" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>15.053</bold></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M100" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>16.296</bold></oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M101" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>16.115</bold></oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M102" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>14.467</bold></oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M103" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>14.174</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FMS</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M104" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>9.025</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M105" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13.421</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M106" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>10.273</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M107" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>5.099</bold></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M108" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>9.274</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M109" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>12.772</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M110" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>8.896</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M111" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>5.264</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FLV</oasis:entry>
         <oasis:entry colname="col2"><bold>9.617</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>9.730</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>12.195</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.775</bold></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M112" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>35.214</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M113" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>35.354</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M114" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>35.097</oasis:entry>
         <oasis:entry colname="col9"><bold>6.315</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Peak timing</oasis:entry>
         <oasis:entry colname="col2"><bold>0.306</bold></oasis:entry>
         <oasis:entry colname="col3">0.333</oasis:entry>
         <oasis:entry colname="col4"><bold>0.310</bold></oasis:entry>
         <oasis:entry colname="col5">0.474</oasis:entry>
         <oasis:entry colname="col6"><bold>3.540</bold></oasis:entry>
         <oasis:entry colname="col7">3.757</oasis:entry>
         <oasis:entry colname="col8">3.754</oasis:entry>
         <oasis:entry colname="col9">5.957</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">NSE (mean)</oasis:entry>
         <oasis:entry colname="col2">0.662</oasis:entry>
         <oasis:entry colname="col3">0.602</oasis:entry>
         <oasis:entry colname="col4">0.631</oasis:entry>
         <oasis:entry colname="col5">0.471</oasis:entry>
         <oasis:entry colname="col6">0.652</oasis:entry>
         <oasis:entry colname="col7">0.620</oasis:entry>
         <oasis:entry colname="col8">0.644</oasis:entry>
         <oasis:entry colname="col9">0.364</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Number of NSEs <inline-formula><mml:math id="M115" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">10</oasis:entry>
         <oasis:entry colname="col3">12</oasis:entry>
         <oasis:entry colname="col4">18</oasis:entry>
         <oasis:entry colname="col5">37</oasis:entry>
         <oasis:entry colname="col6">13</oasis:entry>
         <oasis:entry colname="col7">17</oasis:entry>
         <oasis:entry colname="col8">13</oasis:entry>
         <oasis:entry colname="col9">46</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">High-flow freq.</oasis:entry>
         <oasis:entry colname="col2">0.599</oasis:entry>
         <oasis:entry colname="col3">0.486</oasis:entry>
         <oasis:entry colname="col4">0.598</oasis:entry>
         <oasis:entry colname="col5"><bold>0.730</bold></oasis:entry>
         <oasis:entry colname="col6">0.588</oasis:entry>
         <oasis:entry colname="col7">0.537</oasis:entry>
         <oasis:entry colname="col8">0.619</oasis:entry>
         <oasis:entry colname="col9"><bold>0.719</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">High-flow dur.</oasis:entry>
         <oasis:entry colname="col2"><bold>0.512</bold></oasis:entry>
         <oasis:entry colname="col3">0.463</oasis:entry>
         <oasis:entry colname="col4"><bold>0.491</bold></oasis:entry>
         <oasis:entry colname="col5">0.457</oasis:entry>
         <oasis:entry colname="col6">0.433</oasis:entry>
         <oasis:entry colname="col7">0.416</oasis:entry>
         <oasis:entry colname="col8"><bold>0.471</bold></oasis:entry>
         <oasis:entry colname="col9"><bold>0.316</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Low-flow freq.</oasis:entry>
         <oasis:entry colname="col2">0.774</oasis:entry>
         <oasis:entry colname="col3">0.657</oasis:entry>
         <oasis:entry colname="col4">0.774</oasis:entry>
         <oasis:entry colname="col5"><bold>0.796</bold></oasis:entry>
         <oasis:entry colname="col6">0.764</oasis:entry>
         <oasis:entry colname="col7">0.697</oasis:entry>
         <oasis:entry colname="col8">0.782</oasis:entry>
         <oasis:entry colname="col9"><bold>0.789</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Low-flow dur.</oasis:entry>
         <oasis:entry colname="col2"><bold>0.316</bold></oasis:entry>
         <oasis:entry colname="col3">0.285</oasis:entry>
         <oasis:entry colname="col4">0.280</oasis:entry>
         <oasis:entry colname="col5">0.303</oasis:entry>
         <oasis:entry colname="col6"><bold>0.309</bold></oasis:entry>
         <oasis:entry colname="col7">0.274</oasis:entry>
         <oasis:entry colname="col8">0.307</oasis:entry>
         <oasis:entry colname="col9">0.160</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Zero-flow freq.</oasis:entry>
         <oasis:entry colname="col2"><bold>0.392</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.286</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.409</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.502</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.363</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.401</bold></oasis:entry>
         <oasis:entry colname="col8">0.493</oasis:entry>
         <oasis:entry colname="col9"><bold>0.505</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">95</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">0.979</oasis:entry>
         <oasis:entry colname="col3">0.978</oasis:entry>
         <oasis:entry colname="col4"><bold>0.980</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.956</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.980</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.979</bold></oasis:entry>
         <oasis:entry colname="col8">0.979</oasis:entry>
         <oasis:entry colname="col9"><bold>0.956</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2"><bold>0.970</bold></oasis:entry>
         <oasis:entry colname="col3">0.945</oasis:entry>
         <oasis:entry colname="col4"><bold>0.979</bold></oasis:entry>
         <oasis:entry colname="col5">0.928</oasis:entry>
         <oasis:entry colname="col6"><bold>0.968</bold></oasis:entry>
         <oasis:entry colname="col7">0.955</oasis:entry>
         <oasis:entry colname="col8"><bold>0.964</bold></oasis:entry>
         <oasis:entry colname="col9">0.927</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M118" display="inline"><mml:mi>Q</mml:mi></mml:math></inline-formula> mean</oasis:entry>
         <oasis:entry colname="col2">0.985</oasis:entry>
         <oasis:entry colname="col3">0.984</oasis:entry>
         <oasis:entry colname="col4"><bold>0.986</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.972</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.984</bold></oasis:entry>
         <oasis:entry colname="col7">0.983</oasis:entry>
         <oasis:entry colname="col8">0.983</oasis:entry>
         <oasis:entry colname="col9">0.970</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">HFD mean</oasis:entry>
         <oasis:entry colname="col2">0.930</oasis:entry>
         <oasis:entry colname="col3"><bold>0.943</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.945</bold></oasis:entry>
         <oasis:entry colname="col5">0.908</oasis:entry>
         <oasis:entry colname="col6"><bold>0.944</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.948</bold></oasis:entry>
         <oasis:entry colname="col8">0.941</oasis:entry>
         <oasis:entry colname="col9">0.907</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Slope FDC</oasis:entry>
         <oasis:entry colname="col2">0.556</oasis:entry>
         <oasis:entry colname="col3">0.430</oasis:entry>
         <oasis:entry colname="col4"><bold>0.679</bold></oasis:entry>
         <oasis:entry colname="col5">0.663</oasis:entry>
         <oasis:entry colname="col6">0.635</oasis:entry>
         <oasis:entry colname="col7">0.633</oasis:entry>
         <oasis:entry colname="col8">0.647</oasis:entry>
         <oasis:entry colname="col9"><bold>0.712</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Stream elasticity</oasis:entry>
         <oasis:entry colname="col2"><bold>0.601</bold></oasis:entry>
         <oasis:entry colname="col3">0.560</oasis:entry>
         <oasis:entry colname="col4"><bold>0.615</bold></oasis:entry>
         <oasis:entry colname="col5">0.537</oasis:entry>
         <oasis:entry colname="col6"><bold>0.601</bold></oasis:entry>
         <oasis:entry colname="col7"><bold>0.563</bold></oasis:entry>
         <oasis:entry colname="col8"><bold>0.626</bold></oasis:entry>
         <oasis:entry colname="col9">0.588</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Runoff ratio</oasis:entry>
         <oasis:entry colname="col2">0.960</oasis:entry>
         <oasis:entry colname="col3"><bold>0.957</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.962</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.924</bold></oasis:entry>
         <oasis:entry colname="col6"><bold>0.955</bold></oasis:entry>
         <oasis:entry colname="col7">0.954</oasis:entry>
         <oasis:entry colname="col8">0.952</oasis:entry>
         <oasis:entry colname="col9"><bold>0.918</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Baseflow index</oasis:entry>
         <oasis:entry colname="col2"><bold>0.897</bold></oasis:entry>
         <oasis:entry colname="col3">0.818</oasis:entry>
         <oasis:entry colname="col4"><bold>0.904</bold></oasis:entry>
         <oasis:entry colname="col5">0.865</oasis:entry>
         <oasis:entry colname="col6"><bold>0.935</bold></oasis:entry>
         <oasis:entry colname="col7">0.908</oasis:entry>
         <oasis:entry colname="col8"><bold>0.932</bold></oasis:entry>
         <oasis:entry colname="col9">0.869</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e3631">Figure <xref ref-type="fig" rid="Ch1.F4"/> visualizes the distributions of NSE and peak-timing error across space for the hourly predictions with the sMTS-LSTM.
As in previous studies, the NSE values are lowest in arid basins of the Great Plains and southwestern United States.
The peak-timing error shows similar spatial patterns; however, the hourly peak-timing error also shows particularly high values along the southeastern coastline.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e3638">NSE and peak-timing error by basin for daily and hourly sMTS-LSTM predictions. Brighter colors correspond to better values. Note the different color scales for daily and hourly peak-timing error.</p></caption>
          <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2045/2021/hess-25-2045-2021-f04.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Cross-timescale consistency</title>
      <p id="d1e3655">Since the (non-naive) LSTM-based models jointly predict discharge at multiple timescales, we can incentivize predictions that are consistent across timescales.
As described in Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS2"/>, this happens through a regularized NSE loss function that penalizes inconsistencies.</p>
      <p id="d1e3660">To gauge the effectiveness of this regularization, we compared inconsistencies between timescales in the best benchmark model, the sMTS-LSTM, with and without regularization.
As a baseline, we also compared against the cross-timescale inconsistencies from two independent naive LSTMs.
Table <xref ref-type="table" rid="Ch1.T5"/> lists the median and maximum root mean squared deviation between the daily predictions and the hourly predictions aggregated to daily values.
Without regularization, simultaneous prediction with the sMTS-LSTM yielded smaller inconsistencies than the naive approach (i.e., separate LSTMs at each timescale).
Cross-timescale regularization further reduced inconsistencies and resulted in a median root mean squared deviation of <inline-formula><mml:math id="M119" display="inline"><mml:mn mathvariant="normal">0.376</mml:mn></mml:math></inline-formula> <inline-formula><mml:math id="M120" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>.</p>
      <?pagebreak page2052?><p id="d1e3689">Besides reducing inconsistencies, the regularization term appeared to have a small but beneficial influence on the overall skill of the daily predictions: with regularization, the median NSE increased slightly from <inline-formula><mml:math id="M121" display="inline"><mml:mn mathvariant="normal">0.755</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M122" display="inline"><mml:mn mathvariant="normal">0.762</mml:mn></mml:math></inline-formula>.
Judging from the hyperparameter tuning results, this appears to be a systematic improvement (rather than a fluke), because for both sMTS-LSTM and MTS-LSTM, at least the three best hyperparameter configurations used regularization.</p>
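      <p>The inconsistency measure described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: it assumes that daily and hourly predictions share the same units, that hourly values are aggregated by a plain mean over each block of 24 steps, and the function names <monospace>aggregate_to_daily</monospace> and <monospace>rmsd</monospace> are hypothetical.</p>
      <preformat><![CDATA[
```python
import math


def aggregate_to_daily(hourly, steps_per_day=24):
    """Average consecutive blocks of hourly values into daily values (assumed aggregation)."""
    if len(hourly) % steps_per_day != 0:
        raise ValueError("hourly series must cover whole days")
    return [
        sum(hourly[i:i + steps_per_day]) / steps_per_day
        for i in range(0, len(hourly), steps_per_day)
    ]


def rmsd(daily, hourly, steps_per_day=24):
    """Root mean squared deviation between daily and day-aggregated hourly predictions."""
    aggregated = aggregate_to_daily(hourly, steps_per_day)
    if len(aggregated) != len(daily):
        raise ValueError("daily and hourly series must cover the same days")
    squared = [(d - a) ** 2 for d, a in zip(daily, aggregated)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example with two days of made-up predictions (not data from the study).
daily_pred = [1.0, 2.0]
hourly_pred = [1.5] * 24 + [2.0] * 24
inconsistency = rmsd(daily_pred, hourly_pred)
```
]]></preformat>
      <p>A value of zero would indicate perfectly consistent predictions across the two timescales; larger values indicate larger disagreement.</p>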

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T5" specific-use="star"><?xmltex \currentcnt{5}?><label>Table 5</label><caption><p id="d1e3710">Median and maximum root mean squared deviation (<inline-formula><mml:math id="M123" display="inline"><mml:mrow class="unit"><mml:mi mathvariant="normal">mm</mml:mi><mml:mspace linebreak="nobreak" width="0.125em"/><mml:msup><mml:mi mathvariant="normal">d</mml:mi><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>) between daily and day-aggregated hourly predictions for the sMTS-LSTM with and without regularization, compared with independent prediction through naive LSTMs. <inline-formula><mml:math id="M124" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M125" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula> denote the significance (Wilcoxon signed-rank test) and effect size (Cohen’s <inline-formula><mml:math id="M126" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula>) of the difference to the inconsistencies of the regularized sMTS-LSTM.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Median</oasis:entry>
         <oasis:entry colname="col3">Maximum</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M127" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M128" display="inline"><mml:mi>d</mml:mi></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">sMTS-LSTM</oasis:entry>
         <oasis:entry colname="col2">0.376</oasis:entry>
         <oasis:entry colname="col3">1.670</oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
         <oasis:entry colname="col5">–</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">sMTS-LSTM (no regularization)</oasis:entry>
         <oasis:entry colname="col2">0.398</oasis:entry>
         <oasis:entry colname="col3">2.176</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M129" display="inline"><mml:mrow><mml:mn mathvariant="normal">1.49</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">25</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">0.091</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Naive</oasis:entry>
         <oasis:entry colname="col2">0.490</oasis:entry>
         <oasis:entry colname="col3">2.226</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M130" display="inline"><mml:mrow><mml:mn mathvariant="normal">7.09</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">72</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">0.389</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S3.SS3">
  <label>3.3</label><title>Computational efficiency</title>
      <p id="d1e3886">In addition to differences in accuracy, the LSTM architectures have rather large differences in computational overhead and therefore runtime.
The naive hourly model must iterate through 4320 input sequence steps for each prediction, whereas the MTS-LSTM and sMTS-LSTM only require 365 daily and 336 hourly steps.
Consequently, where the naive hourly LSTM took more than one day to train on one NVIDIA V100 GPU, the MTS-LSTM and sMTS-LSTM took just over 6 (MTS-LSTM) and 8 h (sMTS-LSTM).</p>
      <p id="d1e3889">Moreover, while training is a one-time effort, the runtime advantage is even larger during inference: the naive model required around 9 h runtime to predict 10 years of hourly data for 516 basins on an NVIDIA V100 GPU.
This is about 40 times slower than the MTS-LSTM and sMTS-LSTM models, which both required around 13 min for the same task on the same hardware – and the multi-timescale models generate daily predictions in addition to hourly ones.</p>
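      <p>The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only the rounded numbers given in the text, so the resulting ratios are approximate; the variable names are illustrative.</p>
      <preformat><![CDATA[
```python
# Rounded figures as reported in the text.
naive_steps = 4320            # hourly input steps of the naive model
mts_steps = 365 + 336         # daily plus hourly steps of the MTS-LSTM and sMTS-LSTM

naive_inference_min = 9 * 60  # "around 9 h"
mts_inference_min = 13        # "around 13 min"

# Derived quantities (illustrative only).
step_ratio = naive_steps / mts_steps
inference_speedup = naive_inference_min / mts_inference_min
naive_lookback_days = naive_steps / 24   # look-back of the naive hourly model in days
hourly_window_days = 336 / 24            # hourly window of the multi-timescale models in days

print(f"sequence steps: {naive_steps} vs {mts_steps} (ratio {step_ratio:.1f})")
print(f"inference speed-up: roughly {inference_speedup:.0f}x")
```
]]></preformat>
      <p>With these rounded inputs, the naive model processes about six times as many sequence steps per prediction, and the inference runtime ratio is consistent with the factor of about 40 stated above.</p>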
</sec>
<sec id="Ch1.S3.SS4">
  <label>3.4</label><title>Per-timescale input variables</title>
      <p id="d1e3900">While the MTS-LSTM yielded slightly worse predictions than the sMTS-LSTM in our benchmark evaluation, it has the important ability to ingest different input variables at each<?pagebreak page2053?> timescale.
The following two experiments show how harnessing this feature can increase the accuracy of the MTS-LSTM beyond its shared version.
In the first experiment, we used two daily input forcing sets (Daymet and Maurer) and one hourly forcing set (NLDAS).
In the second experiment, we additionally ingested the daily forcings into the hourly LSTM branch.</p>
      <p id="d1e3903">Table <xref ref-type="table" rid="Ch1.T6"/> compares the results of these two multi-forcings experiments with the single-forcing models (MTS-LSTM and sMTS-LSTM) from the benchmarking section.
The second experiment – ingesting daily inputs into the hourly<?pagebreak page2054?> LSTM branch – yielded the best results.
The addition of daily forcings increased the median daily NSE by <inline-formula><mml:math id="M131" display="inline"><mml:mn mathvariant="normal">0.045</mml:mn></mml:math></inline-formula> – from <inline-formula><mml:math id="M132" display="inline"><mml:mn mathvariant="normal">0.766</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M133" display="inline"><mml:mn mathvariant="normal">0.811</mml:mn></mml:math></inline-formula>.
Even though the hourly LSTM branch only obtains low-resolution additional values, the hourly NSE increased by <inline-formula><mml:math id="M134" display="inline"><mml:mn mathvariant="normal">0.036</mml:mn></mml:math></inline-formula> – from <inline-formula><mml:math id="M135" display="inline"><mml:mn mathvariant="normal">0.776</mml:mn></mml:math></inline-formula> to <inline-formula><mml:math id="M136" display="inline"><mml:mn mathvariant="normal">0.812</mml:mn></mml:math></inline-formula>.
This multi-input version of the MTS-LSTM was the best model in this study, significantly better than the best single-forcing model (sMTS-LSTM).
An interesting observation is that it is <italic>daily</italic> inputs to the <italic>hourly</italic> LSTM branch that improved predictions. Using only hourly NLDAS forcings in the hourly branch, the hourly median NSE dropped to <inline-formula><mml:math id="M137" display="inline"><mml:mn mathvariant="normal">0.781</mml:mn></mml:math></inline-formula>.
Following <xref ref-type="bibr" rid="bib1.bibx30" id="text.56"/>, we expect that additional hourly forcing datasets will have a further positive impact on the hourly accuracy (beyond the improvement we see with additional daily forcings).</p>
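      <p>One way to picture the second experiment is as a concatenation of hourly and daily features at every hourly step. How daily values are aligned with hourly steps is not specified here; the sketch below assumes that each daily value is simply repeated for all 24 hourly steps of its day. The function names and the toy values are hypothetical.</p>
      <preformat><![CDATA[
```python
def repeat_daily_to_hourly(daily_rows, steps_per_day=24):
    """Repeat each daily feature row once per hourly step of that day (assumed alignment)."""
    return [list(row) for row in daily_rows for _ in range(steps_per_day)]


def concat_features(hourly_rows, daily_rows, steps_per_day=24):
    """Append the repeated daily features to every hourly feature row."""
    repeated = repeat_daily_to_hourly(daily_rows, steps_per_day)
    if len(repeated) != len(hourly_rows):
        raise ValueError("hourly and daily inputs must cover the same days")
    return [list(h) + d for h, d in zip(hourly_rows, repeated)]


# Toy example: 2 days, 1 hourly feature, 2 daily features (made-up values).
hourly_nldas = [[0.1 * t] for t in range(48)]
daily_daymet_maurer = [[5.0, 4.0], [0.0, 1.0]]
hourly_branch_input = concat_features(hourly_nldas, daily_daymet_maurer)
```
]]></preformat>
      <p>Each row of <monospace>hourly_branch_input</monospace> then holds the hourly feature followed by the two daily features, which would serve as one input step of the hourly branch.</p>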

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T6" specific-use="star"><?xmltex \currentcnt{6}?><label>Table 6</label><caption><p id="d1e3970">Median validation metrics across all 516 basins for the MTS-LSTMs trained on multiple sets of forcings (multi-forcing <inline-formula><mml:math id="M138" display="inline"><mml:mi>A</mml:mi></mml:math></inline-formula> uses daily Daymet and Maurer forcings as additional inputs into the hourly models, multi-forcing <inline-formula><mml:math id="M139" display="inline"><mml:mi>B</mml:mi></mml:math></inline-formula> uses just NLDAS as inputs into the hourly models). For comparison, the table shows the results for the single-forcing MTS-LSTM and sMTS-LSTM. Bold values highlight results that are not significantly different from the best model in the respective metric (<inline-formula><mml:math id="M140" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.001</mml:mn></mml:mrow></mml:math></inline-formula>).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="9">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right" colsep="1"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right" colsep="1"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right" colsep="1"/>
     <oasis:colspec colnum="8" colname="col8" align="right"/>
     <oasis:colspec colnum="9" colname="col9" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col5" align="center" colsep="1">Daily </oasis:entry>
         <oasis:entry rowsep="1" namest="col6" nameend="col9" align="center">Hourly </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry rowsep="1" namest="col2" nameend="col3" align="center" colsep="1">Multi-forcing </oasis:entry>
         <oasis:entry rowsep="1" namest="col4" nameend="col5" align="center" colsep="1">Single-forcing </oasis:entry>
         <oasis:entry rowsep="1" namest="col6" nameend="col7" align="center" colsep="1">Multi-forcing </oasis:entry>
         <oasis:entry rowsep="1" namest="col8" nameend="col9" align="center">Single-forcing </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M141" display="inline"><mml:mi>A</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M142" display="inline"><mml:mi>B</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">sMTS-LSTM</oasis:entry>
         <oasis:entry colname="col5">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M143" display="inline"><mml:mi>A</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M144" display="inline"><mml:mi>B</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col8">sMTS-LSTM</oasis:entry>
         <oasis:entry colname="col9">MTS-LSTM</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">NSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.811</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.805</bold></oasis:entry>
         <oasis:entry colname="col4">0.785</oasis:entry>
         <oasis:entry colname="col5">0.766</oasis:entry>
         <oasis:entry colname="col6"><bold>0.812</bold></oasis:entry>
         <oasis:entry colname="col7">0.781</oasis:entry>
         <oasis:entry colname="col8">0.783</oasis:entry>
         <oasis:entry colname="col9">0.776</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">MSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.002</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.002</bold></oasis:entry>
         <oasis:entry colname="col4">0.002</oasis:entry>
         <oasis:entry colname="col5">0.002</oasis:entry>
         <oasis:entry colname="col6"><bold>0.002</bold></oasis:entry>
         <oasis:entry colname="col7">0.002</oasis:entry>
         <oasis:entry colname="col8">0.002</oasis:entry>
         <oasis:entry colname="col9">0.002</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">RMSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.040</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.039</bold></oasis:entry>
         <oasis:entry colname="col4">0.043</oasis:entry>
         <oasis:entry colname="col5">0.042</oasis:entry>
         <oasis:entry colname="col6"><bold>0.045</bold></oasis:entry>
         <oasis:entry colname="col7">0.049</oasis:entry>
         <oasis:entry colname="col8">0.048</oasis:entry>
         <oasis:entry colname="col9">0.049</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">KGE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.782</bold></oasis:entry>
         <oasis:entry colname="col3">0.777</oasis:entry>
         <oasis:entry colname="col4"><bold>0.779</bold></oasis:entry>
         <oasis:entry colname="col5">0.760</oasis:entry>
         <oasis:entry colname="col6"><bold>0.801</bold></oasis:entry>
         <oasis:entry colname="col7">0.788</oasis:entry>
         <oasis:entry colname="col8">0.779</oasis:entry>
         <oasis:entry colname="col9">0.768</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M145" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula>-NSE</oasis:entry>
         <oasis:entry colname="col2"><bold>0.879</bold></oasis:entry>
         <oasis:entry colname="col3">0.874</oasis:entry>
         <oasis:entry colname="col4">0.865</oasis:entry>
         <oasis:entry colname="col5">0.853</oasis:entry>
         <oasis:entry colname="col6"><bold>0.905</bold></oasis:entry>
         <oasis:entry colname="col7">0.888</oasis:entry>
         <oasis:entry colname="col8">0.886</oasis:entry>
         <oasis:entry colname="col9">0.874</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Pearson <inline-formula><mml:math id="M146" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2"><bold>0.912</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.911</bold></oasis:entry>
         <oasis:entry colname="col4">0.902</oasis:entry>
         <oasis:entry colname="col5">0.895</oasis:entry>
         <oasis:entry colname="col6"><bold>0.911</bold></oasis:entry>
         <oasis:entry colname="col7">0.898</oasis:entry>
         <oasis:entry colname="col8">0.901</oasis:entry>
         <oasis:entry colname="col9">0.895</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><inline-formula><mml:math id="M147" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula>-NSE</oasis:entry>
         <oasis:entry colname="col2">0.014</oasis:entry>
         <oasis:entry colname="col3">0.018</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M148" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.014</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M149" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.007</bold></oasis:entry>
         <oasis:entry colname="col6">0.014</oasis:entry>
         <oasis:entry colname="col7">0.007</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M150" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.006</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M151" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.002</bold></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FHV</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M152" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>10.993</bold></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M153" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>11.870</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M154" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13.562</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M155" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>14.042</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M156" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>8.086</bold></oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M157" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>9.854</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M158" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>10.557</oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M159" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>11.232</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FMS</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M160" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>14.179</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M161" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>14.388</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M162" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>9.336</bold></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M163" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13.085</oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M164" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13.114</oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M165" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>13.437</oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M166" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>10.606</bold></oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M167" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>12.996</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">FLV</oasis:entry>
         <oasis:entry colname="col2"><bold>14.034</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>12.850</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>9.931</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>14.486</bold></oasis:entry>
         <oasis:entry colname="col6"><inline-formula><mml:math id="M168" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>26.969</bold></oasis:entry>
         <oasis:entry colname="col7"><inline-formula><mml:math id="M169" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>27.567</bold></oasis:entry>
         <oasis:entry colname="col8"><inline-formula><mml:math id="M170" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>34.003</bold></oasis:entry>
         <oasis:entry colname="col9"><inline-formula><mml:math id="M171" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>62.439</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Peak timing</oasis:entry>
         <oasis:entry colname="col2"><bold>0.250</bold></oasis:entry>
         <oasis:entry colname="col3"><bold>0.273</bold></oasis:entry>
         <oasis:entry colname="col4"><bold>0.286</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>0.308</bold></oasis:entry>
         <oasis:entry colname="col6">3.846</oasis:entry>
         <oasis:entry colname="col7">3.711</oasis:entry>
         <oasis:entry colname="col8"><bold>3.571</bold></oasis:entry>
         <oasis:entry colname="col9">3.800</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">NSE (mean)</oasis:entry>
         <oasis:entry colname="col2">0.661</oasis:entry>
         <oasis:entry colname="col3">0.663</oasis:entry>
         <oasis:entry colname="col4">0.669</oasis:entry>
         <oasis:entry colname="col5">0.603</oasis:entry>
         <oasis:entry colname="col6">0.679</oasis:entry>
         <oasis:entry colname="col7">0.630</oasis:entry>
         <oasis:entry colname="col8">0.657</oasis:entry>
         <oasis:entry colname="col9">0.615</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of NSEs <inline-formula><mml:math id="M172" display="inline"><mml:mrow><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">18</oasis:entry>
         <oasis:entry colname="col3">16</oasis:entry>
         <oasis:entry colname="col4">13</oasis:entry>
         <oasis:entry colname="col5">15</oasis:entry>
         <oasis:entry colname="col6">17</oasis:entry>
         <oasis:entry colname="col7">21</oasis:entry>
         <oasis:entry colname="col8">20</oasis:entry>
         <oasis:entry colname="col9">20</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
<sec id="Ch1.S3.SS5">
  <label>3.5</label><title>Predicting more timescales</title>
      <p id="d1e4728">In the above experiments, we evaluated models on daily and hourly predictions only.
The MTS-LSTM architectures, however, generalize to other timescales.</p>
      <p id="d1e4731">Table <xref ref-type="table" rid="Ch1.T7"/> lists the NSE values on the test period for each timescale.
To calculate metrics, we considered only the<?pagebreak page2055?> first eight three-hourly and four six-hourly predictions that were made on each day.
The hourly median NSE of <inline-formula><mml:math id="M173" display="inline"><mml:mn mathvariant="normal">0.747</mml:mn></mml:math></inline-formula> was barely different from the median NSE of the daily–hourly MTS-LSTM (<inline-formula><mml:math id="M174" display="inline"><mml:mn mathvariant="normal">0.748</mml:mn></mml:math></inline-formula>).
While the three-hourly predictions were roughly as accurate as the hourly predictions, the six-hourly predictions were slightly worse (median NSE <inline-formula><mml:math id="M175" display="inline"><mml:mn mathvariant="normal">0.734</mml:mn></mml:math></inline-formula>).</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T7"><?xmltex \currentcnt{7}?><label>Table 7</label><caption><p id="d1e4760">Median test period NSE and number of basins with NSE below zero across all 516 basins for NWM and the MTS-LSTM model trained to predict one-, three-, and six-hourly discharge. The three- and six-hourly NWM results are for aggregated hourly predictions.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Timescale</oasis:entry>
         <oasis:entry colname="col2">Model</oasis:entry>
         <oasis:entry colname="col3">Median NSE</oasis:entry>
         <oasis:entry colname="col4">Number of</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M176" display="inline"><mml:mrow><mml:mtext>NSEs</mml:mtext><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Hourly</oasis:entry>
         <oasis:entry colname="col2">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col3">0.747</oasis:entry>
         <oasis:entry colname="col4">17</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">NWM</oasis:entry>
         <oasis:entry colname="col3">0.562</oasis:entry>
         <oasis:entry colname="col4">47</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Three-hourly</oasis:entry>
         <oasis:entry colname="col2">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col3">0.746</oasis:entry>
         <oasis:entry colname="col4">15</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">NWM</oasis:entry>
         <oasis:entry colname="col3">0.570</oasis:entry>
         <oasis:entry colname="col4">44</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Six-hourly</oasis:entry>
         <oasis:entry colname="col2">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col3">0.734</oasis:entry>
         <oasis:entry colname="col4">12</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">NWM</oasis:entry>
         <oasis:entry colname="col3">0.586</oasis:entry>
         <oasis:entry colname="col4">42</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

</sec>
</sec>
<sec id="Ch1.S4" sec-type="conclusions">
  <label>4</label><title>Discussion and Conclusion</title>
      <p id="d1e4918">The purpose of this work was to generalize LSTM-based rainfall–runoff modeling to multiple timescales.
This task is not as trivial as simply running different deep learning models at different timescales due to long look-back periods, associated memory leaks, physical constraints between predictions at different time steps, and computational expense.
With MTS-LSTM and sMTS-LSTM, we propose two LSTM-based rainfall–runoff models that make use of the specific (physical) nature of the simulation problem.</p>
      <p id="d1e4921">The results show that the advantage LSTMs have over process-based models on daily predictions <xref ref-type="bibr" rid="bib1.bibx29" id="paren.57"/> extends to sub-daily predictions.
An architecturally simple approach, which we call the sMTS-LSTM, can process long-term dependencies at a much smaller computational overhead than a naive hourly LSTM. Nevertheless, the high accuracy of the naive hourly model shows that LSTMs, even with a forget gate, can cope with very long input sequences.
Additionally, LSTMs produce hourly predictions that are almost as good as their daily predictions, while the NWM's accuracy drops significantly from daily to hourly predictions.
A more extensive hyperparameter tuning might even increase the accuracy of the naive model; however, this is difficult to test because of the large computational expense of training LSTMs with high-resolution input sequences that are long enough to capture all hydrologically relevant history.</p>
      <p id="d1e4927">The high quality of the sMTS-LSTM model indicates that the “summary” state between daily and hourly LSTM components contains as much information as the naive model extracts from the full hourly input sequence.
This is an intuitive assumption, since watersheds are damped systems in which the information content of high-resolution forcings almost certainly diminishes the further they lie in the past.</p>
      <p id="d1e4930">The MTS-LSTM adds the ability to use distinct sets of input variables for each timescale.
This is an important feature for many operational use cases, as forcings with high temporal resolution often have shorter lead times than low-resolution products.
In addition, per-timescale input variables allow for input data with lower temporal resolution, such as remote sensing products, without interpolation.
Besides these conceptual considerations, this feature boosts model accuracy beyond the best single-forcings model, as we can ingest multiple forcing products at each timescale.
Results from ingesting mixed input resolutions into the hourly LSTM branch (hourly NLDAS and daily Daymet and Maurer) highlight the flexibility of machine learning models and show that daily forcing histories can contain enough information to support hourly predictions.
Yet, there remain a number of steps before these models can be truly operational:
<list list-type="bullet"><list-item>
      <p id="d1e4935">First, like most academic rainfall–runoff models, our models operate in a reanalysis setting rather than performing actual lead-time forecasts.</p></list-item><list-item>
      <p id="d1e4939">Second, operational models predict other hydrologic variables in addition to streamflow.
Hence, multi-objective optimization of LSTM water models is an open branch of future research.</p></list-item><list-item>
      <p id="d1e4943">Third, the implementation we use in this paper predicts more granular timescales only whenever it predicts a low-resolution step. At each low-resolution step, it predicts multiple high-resolution steps to offset the lower frequency.
For instance, at daily and hourly target timescales, the LSTM predicts 24 hourly steps <italic>once a day</italic>.
In a reanalysis setting, this does not matter, but in a forecasting setting, one needs to generate hourly predictions more often than daily predictions.
Note, however, that this is merely a restriction of our implementation, not an architectural one: by allowing variable-length input sequences, we could produce one hourly prediction each hour (rather than 24 each day).</p></list-item></list></p>
      <p id="d1e4950"><?xmltex \hack{\newpage}?>In future work, we see potential benefits in combining MTS-LSTM with approaches that incorporate further physical constraints into machine learning models <xref ref-type="bibr" rid="bib1.bibx22" id="paren.58"><named-content content-type="pre">e.g., water balance;</named-content></xref>.
Further, it may be useful to extend our models such that they estimate uncertainty, as recently explored by <xref ref-type="bibr" rid="bib1.bibx24" id="text.59"/>, or to investigate architectures that pass information not just from coarse to fine timescales, but also vice versa.
Our experimental architecture from Appendix <xref ref-type="sec" rid="App1.Ch1.S2.SS2"/> may serve as a starting point for such models.</p>
      <p id="d1e4964">This work represents one step toward developing operational hydrologic models based on deep learning.
Overall, we believe that the MTS-LSTM is the most promising model for future use.
It can integrate forcings of different temporal resolutions and generate accurate and consistent predictions at multiple timescales, and its computational overhead during both training and inference is far smaller than that of individual models per timescale.</p><?xmltex \hack{\clearpage}?>
</sec>

      
      </body>
    <back><app-group>

<?pagebreak page2057?><app id="App1.Ch1.S1">
  <?xmltex \currentcnt{A}?><label>Appendix A</label><title>A peak-timing error metric</title>
      <p id="d1e4979">Especially for predictions at high temporal resolutions, it is important that a model not only captures the correct magnitude of a flood event, but also its timing.
We measure this with a “peak-timing” metric that quantifies the lag between observed and predicted peaks.
First, we heuristically extract the most important peaks from the observed time series: starting with all observed peaks, we discard all peaks with topographic prominence smaller than the observed standard deviation and subsequently remove the smallest remaining peak until all peaks have a distance of at least 100 steps.
Then, we search for the largest prediction within a window of one day (for hourly data) or three days (for daily data) around the observed peak.
Given the pairs of observed and predicted peak times, the peak-timing error is their mean absolute difference.</p>
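      <p>The last two steps of this procedure can be sketched as follows. This is a minimal illustration, not the published implementation: it assumes that the observed peak indices have already been extracted (e.g., with the prominence and distance filters described above), and the names <monospace>peak_timing_error</monospace> and <monospace>half_window</monospace> are hypothetical. In particular, treating the search window as a number of steps on each side of the observed peak is an assumption.</p>
      <preformat>
```python
from typing import Sequence


def peak_timing_error(obs_peaks: Sequence[int],
                      sim: Sequence[float],
                      half_window: int) -> float:
    """Mean absolute lag (in time steps) between observed and simulated peaks.

    obs_peaks   -- indices of the observed peaks (assumed to be extracted already)
    sim         -- simulated discharge, aligned with the observations
    half_window -- hypothetical parameter: steps searched on each side of a peak
    """
    if not obs_peaks:
        raise ValueError("need at least one observed peak")
    lags = []
    for peak in obs_peaks:
        # Clip the search window to the bounds of the series.
        start = max(0, peak - half_window)
        stop = min(len(sim), peak + half_window + 1)
        # Index of the largest simulated value inside the window
        # (the first one wins if there are ties).
        sim_peak = max(range(start, stop), key=lambda i: sim[i])
        lags.append(abs(sim_peak - peak))
    return sum(lags) / len(lags)


# Toy hourly series: observed peaks at steps 5 and 20.
sim = [0.0] * 30
sim[7] = 3.0    # simulated peak arrives 2 h late
sim[19] = 2.5   # simulated peak arrives 1 h early
error = peak_timing_error([5, 20], sim, half_window=12)
print(error)
```
      </preformat>
      <p>In this toy example, the two lags are 2 and 1 steps, so the resulting peak-timing error is their mean, 1.5 h.</p>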
</app>

<app id="App1.Ch1.S2">
  <?xmltex \currentcnt{B}?><label>Appendix B</label><title>Negative results</title>
      <p id="d1e4990">Throughout our experiments, we found LSTMs to be a highly resilient architecture: we tried many different approaches to multi-timescale prediction, and a large fraction of them worked reasonably well, although not quite as well as the MTS-LSTM models presented in this paper.
Nevertheless, we believe it makes sense to report some of these “negative” results – models that turned out not to work as well as the ones we finally settled on.
Note, however, that the following reports are based on exploratory experiments with a few seeds and no extensive hyperparameter tuning.</p>
<sec id="App1.Ch1.S2.SS1">
  <label>B1</label><title>Delta prediction</title>
      <p id="d1e5000">Extending our final MTS-LSTM, we tried to facilitate hourly predictions for the LSTM by ingesting the corresponding day’s prediction into the hourly LSTM branch and only predicting each hour’s deviation from the daily mean.
If anything, this approach slightly deteriorated the prediction accuracy (and made the architecture more complicated).</p>
      <p id="d1e5003">We then experimented with predicting 24 weights for each day that distribute the daily streamflow across the 24 h.
This would have yielded the elegant side effect of guaranteed consistency across timescales: the mean hourly prediction would always be equal to the daily prediction.
Yet, the results were clearly worse, and, as we show in Sect. <xref ref-type="sec" rid="Ch1.S3.SS2"/>, we can achieve near-consistent results by incentive (regularization) rather than enforcement.
One possible reason for the reduced accuracy is that it may be harder for the LSTM to learn two different things – predicting hourly weights and daily streamflow – than to predict the same streamflow at two timescales.</p><?xmltex \hack{\newpage}?>
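      <p>To illustrate why such weights guarantee consistency, the following sketch distributes one daily value across 24 h. It is an illustration under stated assumptions rather than the model code we used: the softmax normalization and the names <monospace>distribute_daily</monospace> and <monospace>logits</monospace> are hypothetical choices; any set of non-negative weights that sum to one would have the same effect.</p>
      <preformat>
```python
import math
from typing import List, Sequence


def distribute_daily(daily_flow: float, logits: Sequence[float]) -> List[float]:
    """Spread one daily mean discharge across the hours of that day.

    logits -- unnormalized per-hour scores (hypothetical network output)
    """
    # Softmax turns the scores into positive weights that sum to one.
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Scaling by the number of hours keeps the hourly mean equal to the
    # daily value: mean_i(n * w_i * q) = q * sum_i(w_i) = q.
    n = len(weights)
    return [n * w * daily_flow for w in weights]


logits = [0.1 * h for h in range(24)]   # arbitrary example scores
hourly = distribute_daily(2.4, logits)
hourly_mean = sum(hourly) / len(hourly)
print(round(hourly_mean, 6))
```
      </preformat>
      <p>Whatever scores the network produces, the hourly mean reproduces the daily value up to floating-point error, which is the enforced consistency described above.</p>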
</sec>
<sec id="App1.Ch1.S2.SS2">
  <label>B2</label><title>Cross-timescale state exchange</title>
      <p id="d1e5017">Inspired by residual neural networks (ResNets) that use so-called skip connections to bypass layers of computation and allow for a better flow of gradients during training <xref ref-type="bibr" rid="bib1.bibx20" id="paren.60"/>, we devised a “ResNet-multi-timescale LSTM” where after each day we combine the hidden state of the hourly LSTM branch with the hidden state of the daily branch into the initial daily and hourly hidden states for the next day.
This way, we hoped, the daily LSTM branch might obtain more fine-grained information about the last few hours than it could infer from its daily inputs.
While the daily NSE remained roughly the same, the hourly predictions in this approach became much worse.</p>
</sec>
<sec id="App1.Ch1.S2.SS3">
  <label>B3</label><title>Multi-timescale input, single-timescale output</title>
      <p id="d1e5031">For both sMTS-LSTM and MTS-LSTM, we experimented with ingesting both daily and hourly data into the models, but only training them to predict hourly discharge.
In this setup, the models could fully focus on hourly predictions rather than trying to satisfy two possibly conflicting goals.
Interestingly, however, the hourly-only predictions were worse than the combined multi-timescale predictions.
One reason for this effect may be that the state summary that the daily LSTM branch passes to the hourly branch is worse, as the model obtains no training signal for its daily outputs.</p><?xmltex \hack{\clearpage}?>
</sec>
</app>

<?pagebreak page2058?><app id="App1.Ch1.S3">
  <?xmltex \currentcnt{C}?><label>Appendix C</label><title>Time-continuous prediction with ODE-LSTMs</title>
      <p id="d1e5044">As mentioned in the introduction, the combination of ordinary differential equations (ODEs) and LSTMs presents another approach to multi-timescale prediction – one that is more accurately characterized as <italic>time-continuous prediction</italic> (as opposed to our multi-objective learning approach that predicts at arbitrary but fixed timescales).
The ODE-LSTM passes each input time step through a normal LSTM, but then post-processes the resulting hidden state with an ODE that has its own learned weights.
In effect, the ODE component can adjust the LSTM's hidden state to the time step size.
We believe that our MTS-LSTM approach is better suited for operational streamflow prediction since ODE-LSTMs cannot directly process different input variables for different target timescales.
That said, from a scientific standpoint, we think that the idea of training a model that can then generalize to arbitrary granularity is of interest (e.g., toward a more comprehensive and interpretable understanding of hydrologic processes).</p>

<?xmltex \floatpos{h!}?><table-wrap id="App1.Ch1.S3.T8"><?xmltex \hack{\hsize\textwidth}?><?xmltex \currentcnt{C1}?><label>Table C1</label><caption><p id="d1e5054">Test period NSE for the MTS-LSTM and ODE-LSTM models trained on two and evaluated on three target timescales. <bold>(a)</bold> Training on daily and 12-hourly data, evaluation additionally on hourly predictions (hourly MTS-LSTM results obtained as 12-hourly predictions uniformly distributed across 12 h).
<bold>(b)</bold> Training on hourly and three-hourly data, evaluation additionally on daily predictions (daily MTS-LSTM predictions obtained as averaged three-hourly values).
The mean and median NSE are aggregated across the results for 10 basins; best values are highlighted in bold.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Timescale</oasis:entry>
         <oasis:entry colname="col3">Timescale used in training loss</oasis:entry>
         <oasis:entry rowsep="1" namest="col4" nameend="col5" align="center">Median NSE </oasis:entry>
         <oasis:entry rowsep="1" namest="col6" nameend="col7" align="center">Mean NSE </oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2"/>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col5">ODE-LSTM</oasis:entry>
         <oasis:entry colname="col6">MTS-LSTM</oasis:entry>
         <oasis:entry colname="col7">ODE-LSTM</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1"><bold>(a)</bold></oasis:entry>
         <oasis:entry colname="col2">Daily</oasis:entry>
         <oasis:entry colname="col3">yes</oasis:entry>
         <oasis:entry colname="col4"><bold>0.726</bold></oasis:entry>
         <oasis:entry colname="col5">0.720</oasis:entry>
         <oasis:entry colname="col6"><bold>0.664</bold></oasis:entry>
         <oasis:entry colname="col7">0.651</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">12-hourly</oasis:entry>
         <oasis:entry colname="col3">yes</oasis:entry>
         <oasis:entry colname="col4"><bold>0.734</bold></oasis:entry>
         <oasis:entry colname="col5">0.706</oasis:entry>
         <oasis:entry colname="col6"><bold>0.672</bold></oasis:entry>
         <oasis:entry colname="col7">0.638</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Hourly</oasis:entry>
         <oasis:entry colname="col3">no</oasis:entry>
         <oasis:entry colname="col4"><bold>0.706</bold></oasis:entry>
         <oasis:entry colname="col5">0.639</oasis:entry>
         <oasis:entry colname="col6"><bold>0.634</bold></oasis:entry>
         <oasis:entry colname="col7">0.592</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"><bold>(b)</bold></oasis:entry>
         <oasis:entry colname="col2">Daily</oasis:entry>
         <oasis:entry colname="col3">no</oasis:entry>
         <oasis:entry colname="col4"><bold>0.746</bold></oasis:entry>
         <oasis:entry colname="col5">0.587</oasis:entry>
         <oasis:entry colname="col6"><bold>0.718</bold></oasis:entry>
         <oasis:entry colname="col7">0.546</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Three-hourly</oasis:entry>
         <oasis:entry colname="col3">yes</oasis:entry>
         <oasis:entry colname="col4"><bold>0.728</bold></oasis:entry>
         <oasis:entry colname="col5">0.675</oasis:entry>
         <oasis:entry colname="col6"><bold>0.672</bold></oasis:entry>
         <oasis:entry colname="col7">0.593</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Hourly</oasis:entry>
         <oasis:entry colname="col3">yes</oasis:entry>
         <oasis:entry colname="col4"><bold>0.700</bold></oasis:entry>
         <oasis:entry colname="col5">0.677</oasis:entry>
         <oasis:entry colname="col6"><bold>0.633</bold></oasis:entry>
         <oasis:entry colname="col7">0.586</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e5289"><?xmltex \hack{\newpage}?>Although the idea of time-continuous predictions seemed promising, in our exploratory experiments it was better to use an MTS-LSTM and aggregate (or dis-aggregate) its predictions to the desired target temporal resolution.
Note that due to the slow training of ODE-LSTMs, we carried out the following experiments on 10 basins (training one model per basin; we used smaller hidden sizes, higher learning rates, and trained for more epochs to adjust the LSTMs to this setting).
Table <xref ref-type="table" rid="App1.Ch1.S3.T8"/> gives examples for predictions at untrained timescales. Table <xref ref-type="table" rid="App1.Ch1.S3.T8"/>a shows the mean and median NSE values across the 10 basins when we trained the models on daily and 12-hourly target data but then generated hourly predictions (for the MTS-LSTM, we obtained hourly predictions by uniformly spreading the 12-hourly prediction across 12 h).
Table <xref ref-type="table" rid="App1.Ch1.S3.T8"/>b shows the results when we trained on hourly and three-hourly target data but then predicted daily values (for the MTS-LSTM, we aggregated eight three-hourly predictions into one daily time step).
These initial results show that, in almost all cases, it is better to (dis-)aggregate MTS-LSTM predictions than to use an ODE-LSTM.</p><?xmltex \hack{\clearpage}?>
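      <p>The (dis-)aggregation applied to the MTS-LSTM predictions in Table C1 can be sketched as below. The sketch assumes that predictions are mean discharge rates per time step, so that uniform spreading amounts to repeating a value and aggregation amounts to averaging; the function names are hypothetical.</p>
      <preformat>
```python
from typing import List, Sequence


def disaggregate_uniform(coarse: Sequence[float], factor: int) -> List[float]:
    """Repeat each coarse value `factor` times (e.g. 12-hourly to hourly)."""
    return [value for value in coarse for _ in range(factor)]


def aggregate_mean(fine: Sequence[float], factor: int) -> List[float]:
    """Average consecutive groups of `factor` values (e.g. 3-hourly to daily)."""
    if len(fine) % factor != 0:
        raise ValueError("series length must be a multiple of factor")
    return [sum(fine[i:i + factor]) / factor
            for i in range(0, len(fine), factor)]


# Two 12-hourly mean discharges spread uniformly over 24 hourly steps.
hourly = disaggregate_uniform([1.0, 3.0], factor=12)

# Eight three-hourly predictions averaged into one daily value.
daily = aggregate_mean([1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0], factor=8)
print(len(hourly), daily)
```
      </preformat>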
</app>

<?pagebreak page2059?><app id="App1.Ch1.S4">
  <?xmltex \currentcnt{D}?><label>Appendix D</label><title>Hyperparameter tuning</title>
      <p id="d1e5308">We performed a two-stage hyperparameter tuning for each of the multi-timescale models.
In the first stage, we tuned architectural model parameters (regularization, hidden size, sequence length, dropout), training each configuration for 30 epochs at a batch size of 512 with a learning rate that starts at <inline-formula><mml:math id="M177" display="inline"><mml:mn mathvariant="normal">0.001</mml:mn></mml:math></inline-formula>, reduces to <inline-formula><mml:math id="M178" display="inline"><mml:mn mathvariant="normal">0.0005</mml:mn></mml:math></inline-formula> after 10 epochs, and to <inline-formula><mml:math id="M179" display="inline"><mml:mn mathvariant="normal">0.0001</mml:mn></mml:math></inline-formula> after 20 epochs.
We selected the configuration with the best median NSE (we considered the average of daily and hourly median NSE) and, in the second stage, tuned the learning rate and batch size. Table <xref ref-type="table" rid="App1.Ch1.S4.T9"/> lists the parameter combinations we explored.</p>
      <p id="d1e5334">We did not tune architectural parameters of the naive LSTM models, since the architecture has already been extensively tuned by <xref ref-type="bibr" rid="bib1.bibx29" id="text.61"/>.
The 24 times larger training set of the naive hourly model did, however, require additional tuning of learning rate and batch size, and we only trained for one epoch.
As the extremely long input sequences greatly increase training time, we were only able to evaluate a relatively small parameter grid for the naive hourly model.</p>

<?xmltex \floatpos{h!}?><table-wrap id="App1.Ch1.S4.T9"><?xmltex \hack{\hsize\textwidth}?><?xmltex \currentcnt{D1}?><label>Table D1</label><caption><p id="d1e5344">Hyperparameter tuning grid. The upper table section lists the architectural parameters tuned in stage one, the lower section lists the parameters tuned in stage two. In both stages, we trained three seeds for each parameter combination and averaged their NSEs. Bold values denote the best hyperparameter combination for each model.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="2" colname="col2" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="3" colname="col3" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="4" colname="col4" align="justify" colwidth="3cm"/>
     <oasis:colspec colnum="5" colname="col5" align="justify" colwidth="3cm"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Naive (daily)</oasis:entry>
         <oasis:entry colname="col3">Naive (hourly)</oasis:entry>
         <oasis:entry colname="col4">sMTS-LSTM</oasis:entry>
         <oasis:entry colname="col5">MTS-LSTM</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Loss</oasis:entry>
         <oasis:entry colname="col2">NSE</oasis:entry>
         <oasis:entry colname="col3">NSE</oasis:entry>
         <oasis:entry colname="col4">NSE</oasis:entry>
         <oasis:entry colname="col5">NSE</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Regularization (see <?xmltex \hack{\hfill\break}?>Sect. <xref ref-type="sec" rid="Ch1.S2.SS3.SSS2"/>)</oasis:entry>
         <oasis:entry colname="col2">–</oasis:entry>
         <oasis:entry colname="col3">–</oasis:entry>
         <oasis:entry colname="col4"><bold>yes</bold>, no</oasis:entry>
         <oasis:entry colname="col5"><bold>yes</bold>, no</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Hidden size</oasis:entry>
         <oasis:entry colname="col2">256</oasis:entry>
         <oasis:entry colname="col3">256</oasis:entry>
         <oasis:entry colname="col4">64, <bold>128</bold>, 256, 512, 1024</oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M186" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">32</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M187" display="inline"><mml:mrow><mml:mn mathvariant="bold">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="bold">64</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M188" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">128</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M189" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">256</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M190" display="inline"><mml:mrow><mml:mn mathvariant="normal">2</mml:mn><mml:mo>×</mml:mo><mml:mn mathvariant="normal">512</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Sequence length</oasis:entry>
         <oasis:entry colname="col2">365 d</oasis:entry>
         <oasis:entry colname="col3">4320 h</oasis:entry>
         <oasis:entry colname="col4"><bold>365 d</bold> + 72, 168, <bold>336 h</bold></oasis:entry>
         <oasis:entry colname="col5"><bold>365 d</bold> + 72, 168, <bold>336 h</bold></oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Dropout</oasis:entry>
         <oasis:entry colname="col2">0.4</oasis:entry>
         <oasis:entry colname="col3">0.4</oasis:entry>
         <oasis:entry colname="col4">0.2, <bold>0.4</bold>, 0.6</oasis:entry>
         <oasis:entry colname="col5">0.2, <bold>0.4</bold>, 0.6</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Learning rate<inline-formula><mml:math id="M191" display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col2">(1: <inline-formula><mml:math id="M192" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M193" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M194" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M195" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, <bold>1<italic>e</italic></bold><inline-formula><mml:math id="M196" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>4</bold></oasis:entry>
         <oasis:entry colname="col4">(1: <inline-formula><mml:math id="M197" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M198" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M199" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>), (1: <inline-formula><mml:math id="M200" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M201" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M202" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>), <inline-formula><mml:math id="M203" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="bold">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="bold">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="bold">10</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="bold">4</mml:mn></mml:mrow></mml:math></inline-formula>, <inline-formula><mml:math id="M204" display="inline"><mml:mrow><mml:mn mathvariant="bold">25</mml:mn><mml:mo>:</mml:mo><mml:mn 
mathvariant="bold">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="bold">5</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, (1: <inline-formula><mml:math id="M205" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M206" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M207" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col5">(1: <inline-formula><mml:math id="M208" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M209" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M210" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>), (1: <inline-formula><mml:math id="M211" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M212" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M213" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>), <inline-formula><mml:math id="M214" display="inline"><mml:mrow><mml:mo>(</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="bold">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="bold">4</mml:mn><mml:mo>,</mml:mo><mml:mn mathvariant="bold">10</mml:mn><mml:mo>:</mml:mo><mml:mn mathvariant="bold">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="bold">4</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula> <inline-formula><mml:math id="M215" display="inline"><mml:mrow><mml:mn mathvariant="bold">25</mml:mn><mml:mo>:</mml:mo><mml:mn 
mathvariant="bold">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="bold">5</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, (1: <inline-formula><mml:math id="M216" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">4</mml:mn></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M217" display="inline"><mml:mrow><mml:mn mathvariant="normal">5</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>, 25: <inline-formula><mml:math id="M218" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mi>e</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">5</mml:mn></mml:mrow></mml:math></inline-formula>)</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">Batch size</oasis:entry>
         <oasis:entry colname="col2">256</oasis:entry>
         <oasis:entry colname="col3"><bold>256</bold>, 512</oasis:entry>
         <oasis:entry colname="col4"><bold>128</bold>, 256, 512, 1024</oasis:entry>
         <oasis:entry colname="col5">128, <bold>256</bold>, 512, 1024</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Number of<?xmltex \hack{\hfill\break}?>combinations</oasis:entry>
         <oasis:entry colname="col2">1</oasis:entry>
         <oasis:entry colname="col3">4</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M219" display="inline"><mml:mrow><mml:mn mathvariant="normal">90</mml:mn><mml:mo>+</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5"><inline-formula><mml:math id="M220" display="inline"><mml:mrow><mml:mn mathvariant="normal">90</mml:mn><mml:mo>+</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table><table-wrap-foot><p id="d1e5347"><inline-formula><mml:math id="M180" display="inline"><mml:msup><mml:mi/><mml:mo>*</mml:mo></mml:msup></mml:math></inline-formula> (1: <inline-formula><mml:math id="M181" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, 5: <inline-formula><mml:math id="M182" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, 10: <inline-formula><mml:math id="M183" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula>, …) denotes a learning rate of <inline-formula><mml:math id="M184" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> for epochs 1 to 4, of <inline-formula><mml:math id="M185" display="inline"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula> for epochs 5 to 10, etc.</p></table-wrap-foot></table-wrap>
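      <p>As a reading aid for Table D1, the following sketch resolves the learning-rate schedule notation from the table footnote and recounts the grid sizes for the sMTS-LSTM column. It is illustrative only: the helper <monospace>learning_rate</monospace> is a hypothetical name, and the epochs queried in the example are arbitrary.</p>
      <preformat>
```python
from itertools import product
from typing import Dict


def learning_rate(schedule: Dict[int, float], epoch: int) -> float:
    """Rate for a 1-based epoch: the entry with the largest start epoch
    that does not exceed `epoch`."""
    starts = [start for start in schedule if epoch >= start]
    if not starts:
        raise ValueError("epoch precedes the first schedule entry")
    return schedule[max(starts)]


# Bold (selected) schedule from Table D1 for the multi-timescale models.
best = {1: 5e-4, 10: 1e-4, 25: 5e-5}
rates = [learning_rate(best, e) for e in (1, 9, 10, 24, 25)]

# Stage one: architectural grid of the sMTS-LSTM column.
stage_one = list(product(
    [True, False],                # regularization
    [64, 128, 256, 512, 1024],    # hidden size
    [72, 168, 336],               # hourly sequence length (h)
    [0.2, 0.4, 0.6],              # dropout
))
# Stage two: four learning-rate schedules times four batch sizes.
stage_two = list(product(range(4), [128, 256, 512, 1024]))
print(len(stage_one), len(stage_two))
```
      </preformat>
      <p>The two counts correspond to the 90 + 16 combinations listed in the last table row.</p>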

<?xmltex \hack{\clearpage}?>
</app>
  </app-group><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d1e6156">We trained all our machine learning models with the <monospace>neuralhydrology</monospace> Python library (<ext-link xlink:href="https://doi.org/10.5281/zenodo.4688003" ext-link-type="DOI">10.5281/zenodo.4688003</ext-link>, <xref ref-type="bibr" rid="bib1.bibx27" id="altparen.62"/>).
All code to reproduce our models and analyses is available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.4687991" ext-link-type="DOI">10.5281/zenodo.4687991</ext-link> (<xref ref-type="bibr" rid="bib1.bibx11" id="altparen.63"/>).
The trained models and their predictions are available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.4071885" ext-link-type="DOI">10.5281/zenodo.4071885</ext-link> <xref ref-type="bibr" rid="bib1.bibx13" id="paren.64"/>.
Hourly NLDAS forcings and observed streamflow are available at <ext-link xlink:href="https://doi.org/10.5281/zenodo.4072700" ext-link-type="DOI">10.5281/zenodo.4072700</ext-link> <xref ref-type="bibr" rid="bib1.bibx14" id="paren.65"/>.
The CAMELS static attributes are accessible at <ext-link xlink:href="https://doi.org/10.5065/D6G73C3Q" ext-link-type="DOI">10.5065/D6G73C3Q</ext-link> (<xref ref-type="bibr" rid="bib1.bibx2" id="altparen.66"/>). The CAMELS forcing data are accessible at <ext-link xlink:href="https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip">https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip</ext-link> (<xref ref-type="bibr" rid="bib1.bibx37" id="altparen.67"/>). Note, however, that the Maurer forcings distributed with this dataset should be replaced with their updated version, available at <ext-link xlink:href="https://doi.org/10.4211/hs.17c896843cf940339c3c3496d0c1c077" ext-link-type="DOI">10.4211/hs.17c896843cf940339c3c3496d0c1c077</ext-link> (<xref ref-type="bibr" rid="bib1.bibx28" id="altparen.68"/>).</p>
  </notes><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e6209">MG, FK, and DK designed all experiments.
MG conducted all experiments and analyzed the results, together with FK and DK.
GN supervised the manuscript from the hydrologic perspective, and JL and SH from the machine learning perspective.
All authors worked on the manuscript.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e6215">The authors declare that they have no conflict of interest.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e6221">This research was undertaken thanks in part to funding from the Canada First Research Excellence Fund and the Global Water Futures Program, and enabled by computational resources provided by Compute Ontario and Compute Canada. We acknowledge the support of Microsoft Canada under the AI for Social Good program of the Waterloo Artificial Intelligence Institute. The ELLIS Unit Linz, the LIT AI Lab, and the
Institute for Machine Learning are supported by the Federal State Upper Austria. We thank the projects AI-MOTION, DeepToxGen, AI-SNN, DeepFlood, Medical Cognitive Computing Center (MC3), PRIMAL, S3AI, DL for granular flow, ELISE (H2020-ICT-2019-3 ID: 951847), and AIDD. Further, we thank Janssen Pharmaceutica, UCB Biopharma SRL, Merck Healthcare KGaA, Audi.JKU Deep Learning Center, TGW Logistics Group GmbH, Silicon Austria Labs (SAL), FILL Gesellschaft mbH, Anyline GmbH, Google (Faculty Research Award), ZF Friedrichshafen AG, Robert Bosch GmbH, Software Competence Center Hagenberg GmbH, TÜV Austria,
and the NVIDIA Corporation.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e6226">This research has been supported by the Global Water Futures project, the Bundesministerium für Bildung, Wissenschaft und Forschung (grant nos. LIT-2018-6-YOU-212, LIT-2017-3-YOU-003, LIT-2018-6-YOU-214, and LIT-2019-8-YOU-213), the Österreichische Forschungsförderungsgesellschaft (grant nos. FFG-873979, FFG-872172, and FFG-871302), the European Commission's Horizon 2020 Framework Programme (AIDD (grant no. 956832)), Janssen Pharmaceuticals, UCB Biopharma SRL, the Merck Healthcare KGaA, the Audi.JKU Deep Learning Center, the TGW Logistics Group GmbH, Silicon Austria Labs (SAL), the FILL Gesellschaft mbH, the Anyline GmbH, Google, the ZF Friedrichshafen AG, the Robert Bosch GmbH, the Software Competence Center Hagenberg GmbH, TÜV Austria, the NVIDIA Corporation, and Microsoft Canada.</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e6232">This paper was edited by Fabrizio Fenicia and reviewed by Thomas Lees and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><?xmltex \def\ref@label{{Addor et~al.(2017a)Addor, Newman, Mizukami, and Clark}}?><label>Addor et al.(2017a)Addor, Newman, Mizukami, and Clark</label><?label Addor2017?><mixed-citation>Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, <ext-link xlink:href="https://doi.org/10.5194/hess-21-5293-2017" ext-link-type="DOI">10.5194/hess-21-5293-2017</ext-link>, 2017a.</mixed-citation></ref>
      <ref id="bib1.bibx2"><?xmltex \def\ref@label{{Addor et~al.(2017b)Addor, Newman, Mizukami, and Clark}}?><label>Addor et al.(2017b)Addor, Newman, Mizukami, and Clark</label><?label Addor2017b?><mixed-citation>Addor, N., Newman, A., Mizukami, N., and Clark, M. P.: Catchment attributes for large-sample studies [data set], Boulder, CO, UCAR/NCAR, <ext-link xlink:href="https://doi.org/10.5065/D6G73C3Q" ext-link-type="DOI">10.5065/D6G73C3Q</ext-link> (last access: 14 April 2021), 2017b.</mixed-citation></ref>
      <ref id="bib1.bibx3"><?xmltex \def\ref@label{{Addor et~al.(2018)Addor, Nearing, Prieto, Newman, Le~Vine, and
Clark}}?><label>Addor et al.(2018)Addor, Nearing, Prieto, Newman, Le Vine, and
Clark</label><?label Addor2018?><mixed-citation>Addor, N., Nearing, G., Prieto, C., Newman, A. J., Le Vine, N., and Clark,
M. P.: A Ranking of Hydrological Signatures Based on Their Predictability in
Space, Water Resour. Res., 54, 8792–8812, <ext-link xlink:href="https://doi.org/10.1029/2018WR022606" ext-link-type="DOI">10.1029/2018WR022606</ext-link>,
2018.</mixed-citation></ref>
      <ref id="bib1.bibx4"><?xmltex \def\ref@label{{Araya et~al.(2019)Araya, Valle, and Allende}}?><label>Araya et al.(2019)Araya, Valle, and Allende</label><?label Araya2019?><mixed-citation>Araya, I. A., Valle, C., and Allende, H.: A Multi-Scale Model based on the
Long Short-Term Memory for day ahead hourly wind speed forecasting, Pattern
Recognition Letters, 136, 333–340, <ext-link xlink:href="https://doi.org/10.1016/j.patrec.2019.10.011" ext-link-type="DOI">10.1016/j.patrec.2019.10.011</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx5"><?xmltex \def\ref@label{{Bengio et~al.(1994)Bengio, Simard, and Frasconi}}?><label>Bengio et al.(1994)Bengio, Simard, and Frasconi</label><?label Bengio1994?><mixed-citation>Bengio, Y., Simard, P., and Frasconi, P.: Learning long-term dependencies with
gradient descent is difficult, IEEE Transactions on Neural Networks, 5,
157–166, <ext-link xlink:href="https://doi.org/10.1109/72.279181" ext-link-type="DOI">10.1109/72.279181</ext-link>, 1994.</mixed-citation></ref>
      <ref id="bib1.bibx6"><?xmltex \def\ref@label{{Chung et~al.(2016)Chung, Ahn, and Bengio}}?><label>Chung et al.(2016)Chung, Ahn, and Bengio</label><?label Chung2016?><mixed-citation>
Chung, J., Ahn, S., and Bengio, Y.: Hierarchical Multiscale Recurrent Neural
Networks, arXiv preprint, arXiv:1609.01704, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx7"><?xmltex \def\ref@label{{Clausen and Biggs(2000)}}?><label>Clausen and Biggs(2000)</label><?label Clausen2000?><mixed-citation>Clausen, B. and Biggs, B. J. F.: Flow variables for ecological studies in
temperate streams: groupings based on covariance, J. Hydrol., 237,
184–197, <ext-link xlink:href="https://doi.org/10.1016/S0022-1694(00)00306-1" ext-link-type="DOI">10.1016/S0022-1694(00)00306-1</ext-link>, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx8"><?xmltex \def\ref@label{{Cosgrove and Klemmer(2019)}}?><label>Cosgrove and Klemmer(2019)</label><?label NOAANWM?><mixed-citation>Cosgrove, B. and Klemmer, C.: The National Water Model, available at:
<uri>https://water.noaa.gov/about/nwm</uri> (last access: 25 January 2021), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx9"><?xmltex \def\ref@label{{Court(1962)}}?><label>Court(1962)</label><?label Court1962?><mixed-citation>Court, A.: Measures of streamflow timing, J. Geophys. Res.
(1896–1977), 67, 4335–4339, <ext-link xlink:href="https://doi.org/10.1029/JZ067i011p04335" ext-link-type="DOI">10.1029/JZ067i011p04335</ext-link>, 1962.</mixed-citation></ref>
      <ref id="bib1.bibx10"><?xmltex \def\ref@label{{Frame et~al.(2020)Frame, Nearing, Kratzert, and Rahman}}?><label>Frame et al.(2020)Frame, Nearing, Kratzert, and Rahman</label><?label Frame2020?><mixed-citation>Frame, J., Nearing, G., Kratzert, F., and Rahman, M.: Post processing the
U.S. National Water Model with a Long Short-Term Memory network, EarthArXiv,
<ext-link xlink:href="https://doi.org/10.31223/osf.io/4xhac" ext-link-type="DOI">10.31223/osf.io/4xhac</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx11"><?xmltex \def\ref@label{{Gauch(2021)}}?><label>Gauch(2021)</label><?label Gauch2021?><mixed-citation>Gauch, M.: Code for “Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network”, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.4687991" ext-link-type="DOI">10.5281/zenodo.4687991</ext-link> (last access: 14 April 2021), 2021.</mixed-citation></ref>
      <ref id="bib1.bibx12"><?xmltex \def\ref@label{{Gauch and Lin(2020)}}?><label>Gauch and Lin(2020)</label><?label Gauch2020?><mixed-citation>
Gauch, M. and Lin, J.: A Data Scientist's Guide to Streamflow Prediction, arXiv
preprint, arXiv:2006.12975, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx13"><?xmltex \def\ref@label{Gauch et al.(2020a)}?><label>Gauch et al.(2020a)</label><?label gauch2020a?><mixed-citation>Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Lin, J., and Hochreiter, S.: Models and Predictions for “Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network” [data set], Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.4095485" ext-link-type="DOI">10.5281/zenodo.4095485</ext-link>, 2020a.</mixed-citation></ref>
      <ref id="bib1.bibx14"><?xmltex \def\ref@label{Gauch et al.(2020b)}?><label>Gauch et al.(2020b)</label><?label gauch2020b?><mixed-citation>Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Lin, J., and Hochreiter, S.: Data for “Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network” [data set], Zenodo, <ext-link xlink:href="https://doi.org/10.5281/zenodo.4072701" ext-link-type="DOI">10.5281/zenodo.4072701</ext-link>, 2020b.</mixed-citation></ref>
      <ref id="bib1.bibx15"><?xmltex \def\ref@label{{Gers et~al.(1999)Gers, Schmidhuber, and Cummins}}?><label>Gers et al.(1999)Gers, Schmidhuber, and Cummins</label><?label Gers1999?><mixed-citation>
Gers, F. A., Schmidhuber, J., and Cummins, F.: Learning to forget: continual
prediction with LSTM, IET Conference Proceedings, pp. 850–855, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx16"><?xmltex \def\ref@label{{Gochis et~al.(2020)Gochis, Barlage, Cabell, Casali, Dugger,
FitzGerald, McAllister, McCreight, RafieeiNasab, Read, Sampson, Yates, and
Zhang}}?><label>Gochis et al.(2020)Gochis, Barlage, Cabell, Casali, Dugger,
FitzGerald, McAllister, McCreight, RafieeiNasab, Read, Sampson, Yates, and
Zhang</label><?label Gochis2020?><mixed-citation>Gochis, D. J., Barlage, M., Cabell, R., Casali, M., Dugger, A., FitzGerald, K.,
McAllister, M., McCreight, J., RafieeiNasab, A., Read, L., Sampson, K.,
Yates, D., and Zhang, Y.: The WRF-Hydro<sup>®</sup> modeling system
technical description, available at:
<uri>https://ral.ucar.edu/sites/default/files/public/projects/Technical%20Description%20%26amp%3B%20User%20Guides/wrfhydrov511technicaldescription.pdf</uri> (last access: 14 April 2021),
2020.</mixed-citation></ref>
      <ref id="bib1.bibx17"><?xmltex \def\ref@label{{Graves et~al.(2007)Graves, Fern{\'{a}}ndez, and
Schmidhuber}}?><label>Graves et al.(2007)Graves, Fernández, and
Schmidhuber</label><?label Graves2007mdrnn?><mixed-citation>
Graves, A., Fernández, S., and Schmidhuber, J.: Multi-dimensional Recurrent
Neural Networks, in: Artificial Neural Networks – ICANN 2007, edited by:
de Sá, J. M., Alexandre, L. A., Duch, W., and Mandic, D., pp. 549–558,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx18"><?xmltex \def\ref@label{{Greff et~al.(2017)Greff, Srivastava, Koutn\'{i}k, Steunebrink, and
Schmidhuber}}?><label>Greff et al.(2017)Greff, Srivastava, Koutník, Steunebrink, and
Schmidhuber</label><?label Greff2017?><mixed-citation>Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., and
Schmidhuber, J.: LSTM: A Search Space Odyssey, IEEE Transactions on Neural
Networks and Learning Systems, 28, 2222–2232,
<ext-link xlink:href="https://doi.org/10.1109/TNNLS.2016.2582924" ext-link-type="DOI">10.1109/TNNLS.2016.2582924</ext-link>, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx19"><?xmltex \def\ref@label{{Gupta et~al.(2009)Gupta, Kling, Yilmaz, and Martinez}}?><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and Martinez</label><?label Gupta2009?><mixed-citation>Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of
the mean squared error and NSE performance criteria: implications for
improving hydrological modelling, J. Hydrol., 377, 80–91,
<ext-link xlink:href="https://doi.org/10.1016/j.jhydrol.2009.08.003" ext-link-type="DOI">10.1016/j.jhydrol.2009.08.003</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx20"><?xmltex \def\ref@label{{He et~al.(2016)He, Zhang, Ren, and Sun}}?><label>He et al.(2016)He, Zhang, Ren, and Sun</label><?label He2016resnet?><mixed-citation>
He, K., Zhang, X., Ren, S., and Sun, J.: Deep Residual Learning for Image
Recognition, in: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), June 2016, Las Vegas, Nevada, 770–778, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx21"><?xmltex \def\ref@label{{Hochreiter and Schmidhuber(1997)}}?><label>Hochreiter and Schmidhuber(1997)</label><?label Hochreiter1997?><mixed-citation>Hochreiter, S. and Schmidhuber, J.: Long Short-Term Memory, Neural
Computation, 9, 1735–1780, <ext-link xlink:href="https://doi.org/10.1162/neco.1997.9.8.1735" ext-link-type="DOI">10.1162/neco.1997.9.8.1735</ext-link>, 1997.</mixed-citation></ref>
      <ref id="bib1.bibx22"><?xmltex \def\ref@label{{Hoedt et~al.(2021)Hoedt, Kratzert, Klotz, Halmich, Holzleitner,
Nearing, Hochreiter, and Klambauer}}?><label>Hoedt et al.(2021)Hoedt, Kratzert, Klotz, Halmich, Holzleitner,
Nearing, Hochreiter, and Klambauer</label><?label Hoedt2021mclstm?><mixed-citation>Hoedt, P.-J., Kratzert, F., Klotz, D., Halmich, C., Holzleitner, M., Nearing,
G., Hochreiter, S., and Klambauer, G.: MC-LSTM: Mass-Conserving LSTM, available at: <uri>https://arxiv.org/abs/2101.05186</uri>,
2021.</mixed-citation></ref>
      <ref id="bib1.bibx23"><?xmltex \def\ref@label{{Jozefowicz et~al.(2015)Jozefowicz, Zaremba, and
Sutskever}}?><label>Jozefowicz et al.(2015)Jozefowicz, Zaremba, and
Sutskever</label><?label Jozefowicz2015?><mixed-citation>
Jozefowicz, R., Zaremba, W., and Sutskever, I.: An Empirical Exploration of
Recurrent Network Architectures, in: Proceedings of the 32nd International
Conference on Machine Learning, edited by: Bach, F. and Blei, D., vol. 37 of
Proceedings of Machine Learning Research, pp. 2342–2350, PMLR,
Lille, France, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx24"><?xmltex \def\ref@label{{Klotz et~al.(2021)Klotz, Kratzert, Gauch, Sampson, Klambauer,
Hochreiter, and Nearing}}?><label>Klotz et al.(2021)Klotz, Kratzert, Gauch, Sampson, Klambauer,
Hochreiter, and Nearing</label><?label Klotz2020uncertainty?><mixed-citation>Klotz, D., Kratzert, F., Gauch, M., Keefe Sampson, A., Brandstetter, J., Klambauer, G., Hochreiter, S., and Nearing, G.: Uncertainty Estimation with Deep Learning for Rainfall–Runoff Modelling, Hydrol. Earth Syst. Sci. Discuss. [preprint], <ext-link xlink:href="https://doi.org/10.5194/hess-2021-154" ext-link-type="DOI">10.5194/hess-2021-154</ext-link>, in review, 2021.</mixed-citation></ref>
      <ref id="bib1.bibx25"><?xmltex \def\ref@label{{Koutn\'{i}k et~al.(2014)Koutn\'{i}k, Greff, Gomez, and
Schmidhuber}}?><label>Koutník et al.(2014)Koutník, Greff, Gomez, and
Schmidhuber</label><?label Koutnik2014?><mixed-citation>
Koutník, J., Greff, K., Gomez, F., and Schmidhuber, J.: A Clockwork RNN,
arXiv preprint, arXiv:1402.3511, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx26"><?xmltex \def\ref@label{{Kratzert et~al.(2018)Kratzert, Klotz, Brenner, Schulz, and
Herrnegger}}?><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and
Herrnegger</label><?label Kratzert2018?><mixed-citation>Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <ext-link xlink:href="https://doi.org/10.5194/hess-22-6005-2018" ext-link-type="DOI">10.5194/hess-22-6005-2018</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx27"><?xmltex \def\ref@label{Kratzert et al.(2020)}?><label>Kratzert et al.(2020)</label><?label Kratzertetal2020?><mixed-citation>Kratzert, F., Gauch, M., and Klotz, D.: NeuralHydrology Python Library, Zenodo [code], <ext-link xlink:href="https://doi.org/10.5281/zenodo.4688003" ext-link-type="DOI">10.5281/zenodo.4688003</ext-link> (last access: 14 April 2021), 2020.</mixed-citation></ref>
      <ref id="bib1.bibx28"><?xmltex \def\ref@label{{Kratzert(2019)}}?><label>Kratzert(2019)</label><?label Kratzertdata2019?><mixed-citation>Kratzert, F.: CAMELS Extended Maurer Forcing Data, HydroShare [data set], <ext-link xlink:href="https://doi.org/10.4211/hs.17c896843cf940339c3c3496d0c1c077" ext-link-type="DOI">10.4211/hs.17c896843cf940339c3c3496d0c1c077</ext-link> (last access: 14 April 2021), 2019.</mixed-citation></ref>
      <ref id="bib1.bibx29"><?xmltex \def\ref@label{{Kratzert et~al.(2019)Kratzert, Klotz, Shalev, Klambauer, Hochreiter,
and Nearing}}?><label>Kratzert et al.(2019)Kratzert, Klotz, Shalev, Klambauer, Hochreiter,
and Nearing</label><?label Kratzert2019?><mixed-citation>Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <ext-link xlink:href="https://doi.org/10.5194/hess-23-5089-2019" ext-link-type="DOI">10.5194/hess-23-5089-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx30"><?xmltex \def\ref@label{{Kratzert et~al.(2020)Kratzert, Klotz, Hochreiter, and
Nearing}}?><label>Kratzert et al.(2020)Kratzert, Klotz, Hochreiter, and
Nearing</label><?label Kratzert2020?><mixed-citation>Kratzert, F., Klotz, D., Hochreiter, S., and Nearing, G. S.: A note on leveraging synergy in multiple meteorological datasets with deep learning for rainfall-runoff modeling, Hydrol. Earth Syst. Sci. Discuss. [preprint], <ext-link xlink:href="https://doi.org/10.5194/hess-2020-221" ext-link-type="DOI">10.5194/hess-2020-221</ext-link>, in review, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx31"><?xmltex \def\ref@label{{Ladson et~al.(2013)Ladson, Brown, Neal, and Nathan}}?><label>Ladson et al.(2013)Ladson, Brown, Neal, and Nathan</label><?label Ladson2013?><mixed-citation>Ladson, T. R., Brown, R., Neal, B., and Nathan, R.: A Standard Approach to
Baseflow Separation Using The Lyne and Hollick Filter, Australasian
J. Water Res., 17, 25–34, available at: <uri>https://www.tandfonline.com/doi/ref/10.7158/13241583.2013.11465417</uri> (last access: 14 April 2021),
2013.</mixed-citation></ref>
      <ref id="bib1.bibx32"><?xmltex \def\ref@label{{Lechner and Hasani(2020)}}?><label>Lechner and Hasani(2020)</label><?label Lechner2020odelstm?><mixed-citation>
Lechner, M. and Hasani, R.: Learning Long-Term Dependencies in
Irregularly-Sampled Time Series, arXiv preprint, arXiv:2006.04418, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx33"><?xmltex \def\ref@label{{Mozer(1991)}}?><label>Mozer(1991)</label><?label Mozer1991?><mixed-citation>
Mozer, M.: Induction of Multiscale Temporal Structure, in: Advances in Neural
Information Processing Systems 4, edited by: Moody, J. E., Hanson, S. J., and
Lippmann, R., pp. 275–282, Morgan Kaufmann, 1991.</mixed-citation></ref>
      <ref id="bib1.bibx34"><?xmltex \def\ref@label{{Nash and Sutcliffe(1970)}}?><label>Nash and Sutcliffe(1970)</label><?label Nash1970?><mixed-citation>Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual
models part I – A discussion of principles, J. Hydrol., 10,
282–290, <ext-link xlink:href="https://doi.org/10.1016/0022-1694(70)90255-6" ext-link-type="DOI">10.1016/0022-1694(70)90255-6</ext-link>, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx35"><?xmltex \def\ref@label{{Neil et~al.(2016)Neil, Pfeiffer, and Liu}}?><label>Neil et al.(2016)Neil, Pfeiffer, and Liu</label><?label Neil2016?><mixed-citation>
Neil, D., Pfeiffer, M., and Liu, S.-C.: Phased LSTM: Accelerating Recurrent
Network Training for Long or Event-based Sequences, in: Advances in Neural
Information Processing Systems 29, edited by: Lee, D. D., Sugiyama, M.,
Luxburg, U. V., Guyon, I., and Garnett, R., pp. 3882–3890, Curran
Associates, Inc., 2016.</mixed-citation></ref>
      <ref id="bib1.bibx36"><?xmltex \def\ref@label{{Newman et~al.(2014a)Newman, Sampson, Clark, Bock, Viger, and
Blodgett}}?><label>Newman et al.(2014a)Newman, Sampson, Clark, Bock, Viger, and
Blodgett</label><?label Newman2014?><mixed-citation>Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R., and Blodgett, D.: A
large-sample watershed-scale hydrometeorological dataset for the contiguous
USA, UCAR/NCAR [data set], <ext-link xlink:href="https://doi.org/10.5065/d6mw2f4d" ext-link-type="DOI">10.5065/d6mw2f4d</ext-link>, 2014a.</mixed-citation></ref>
      <ref id="bib1.bibx37"><?xmltex \def\ref@label{{Newman et~al.(2014b)}}?><label>Newman et al.(2014b)</label><?label Newman2014b?><mixed-citation>Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R., and Blodgett, D.: CAMELS: Catchment Attributes and Meteorology for Large-sample Studies [data set], Boulder, CO, UCAR/NCAR, <ext-link xlink:href="https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip">https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip</ext-link> (last access: 14 April 2021), 2014b.</mixed-citation></ref>
      <ref id="bib1.bibx38"><?xmltex \def\ref@label{{Newman et~al.(2017)Newman, Mizukami, Clark, Wood, Nijssen, and
Nearing}}?><label>Newman et al.(2017)Newman, Mizukami, Clark, Wood, Nijssen, and
Nearing</label><?label Newman2017?><mixed-citation>Newman, A., Mizukami, N., Clark, M. P., Wood, A. W., Nijssen, B., and Nearing,
G.: Benchmarking of a Physically Based Hydrologic Model, J.
Hydrometeorol., 18, 2215–2225, <ext-link xlink:href="https://doi.org/10.1175/JHM-D-16-0284.1" ext-link-type="DOI">10.1175/JHM-D-16-0284.1</ext-link>, 2017.</mixed-citation></ref>
      <?pagebreak page2062?><ref id="bib1.bibx39"><?xmltex \def\ref@label{{Olah(2015)}}?><label>Olah(2015)</label><?label Olah2015?><mixed-citation>Olah, C.: Understanding LSTM Networks, colah's blog, available at:
<uri>https://colah.github.io/posts/2015-08-Understanding-LSTMs/</uri> (last access: 14 April 2021), 2015.</mixed-citation></ref>
      <ref id="bib1.bibx40"><?xmltex \def\ref@label{{Olden and Poff(2003)}}?><label>Olden and Poff(2003)</label><?label Olden2003?><mixed-citation>Olden, J. D. and Poff, N. L.: Redundancy and the choice of hydrologic indices
for characterizing streamflow regimes, River Res. Appl., 19,
101–121, <ext-link xlink:href="https://doi.org/10.1002/rra.700" ext-link-type="DOI">10.1002/rra.700</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx41"><?xmltex \def\ref@label{{Salas et~al.(2018)Salas, Somos-Valenzuela, Dugger, Maidment, Gochis,
David, Yu, Ding, Clark, and Noman}}?><label>Salas et al.(2018)Salas, Somos-Valenzuela, Dugger, Maidment, Gochis,
David, Yu, Ding, Clark, and Noman</label><?label Salas2018nwm?><mixed-citation>Salas, F. R., Somos-Valenzuela, M. A., Dugger, A., Maidment, D. R., Gochis,
D. J., David, C. H., Yu, W., Ding, D., Clark, E. P., and Noman, N.: Towards
Real-Time Continental Scale Streamflow Simulation in Continuous and Discrete
Space, J. Am. Water Resour. Assoc., 54, 7–27,
<ext-link xlink:href="https://doi.org/10.1111/1752-1688.12586" ext-link-type="DOI">10.1111/1752-1688.12586</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx42"><?xmltex \def\ref@label{{Sankarasubramanian et~al.(2001)Sankarasubramanian, Vogel, and
Limbrunner}}?><label>Sankarasubramanian et al.(2001)Sankarasubramanian, Vogel, and
Limbrunner</label><?label Sankarasubramanian2001?><mixed-citation>Sankarasubramanian, A., Vogel, R. M., and Limbrunner, J. F.: Climate elasticity
of streamflow in the United States, Water Resour. Res., 37,
1771–1781, <ext-link xlink:href="https://doi.org/10.1029/2000WR900330" ext-link-type="DOI">10.1029/2000WR900330</ext-link>, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx43"><?xmltex \def\ref@label{{Sawicz et~al.(2011)Sawicz, Wagener, Sivapalan, Troch, and
Carrillo}}?><label>Sawicz et al.(2011)Sawicz, Wagener, Sivapalan, Troch, and
Carrillo</label><?label Sawicz2011?><mixed-citation>Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, <ext-link xlink:href="https://doi.org/10.5194/hess-15-2895-2011" ext-link-type="DOI">10.5194/hess-15-2895-2011</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx44"><?xmltex \def\ref@label{{Schmidhuber(1991)}}?><label>Schmidhuber(1991)</label><?label Schmidhuber1991?><mixed-citation>Schmidhuber, J.: Neural Sequence Chunkers, Tech. rep. FKI-148-91, Technische Universität München, Institut für Informatik, 1991.</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx45"><?xmltex \def\ref@label{{{United States Geological Survey}(2021)}}?><label>United States Geological Survey(2021)</label><?label USGSIV?><mixed-citation>United States Geological Survey: USGS Instantaneous Values Web Service, available at:
<uri>https://waterservices.usgs.gov/rest/IV-Service.html</uri> (last access:
15 October 2020), 2021.</mixed-citation></ref>
      <ref id="bib1.bibx46"><?xmltex \def\ref@label{{Westerberg and McMillan(2015)}}?><label>Westerberg and McMillan(2015)</label><?label Westerberg2015?><mixed-citation>Westerberg, I. K. and McMillan, H. K.: Uncertainty in hydrological signatures, Hydrol. Earth Syst. Sci., 19, 3951–3968, <ext-link xlink:href="https://doi.org/10.5194/hess-19-3951-2015" ext-link-type="DOI">10.5194/hess-19-3951-2015</ext-link>, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx47"><?xmltex \def\ref@label{{Xia et~al.(2012)Xia, Mitchell, Ek, Sheffield, Cosgrove, Wood, Luo,
Alonge, Wei, Meng, Livneh, Lettenmaier, Koren, Duan, Mo, Fan, and
Mocko}}?><label>Xia et al.(2012)Xia, Mitchell, Ek, Sheffield, Cosgrove, Wood, Luo,
Alonge, Wei, Meng, Livneh, Lettenmaier, Koren, Duan, Mo, Fan, and
Mocko</label><?label Xia2012?><mixed-citation>Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B., Wood, E., Luo, L.,
Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D., Koren, V., Duan,
Q., Mo, K., Fan, Y., and Mocko, D.: Continental-scale water and energy flux
analysis and validation for the North American Land Data Assimilation
System project phase 2 (NLDAS-2): 1. Intercomparison and application of
model products, J. Geophys. Res.-Atmos., 117, D03109,
<ext-link xlink:href="https://doi.org/10.1029/2011JD016048" ext-link-type="DOI">10.1029/2011JD016048</ext-link>, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx48"><?xmltex \def\ref@label{{Yilmaz et~al.(2008)Yilmaz, Gupta, and Wagener}}?><label>Yilmaz et al.(2008)Yilmaz, Gupta, and Wagener</label><?label Yilmaz2008?><mixed-citation>Yilmaz, K. K., Gupta, H. V., and Wagener, T.: A process-based diagnostic
approach to model evaluation: Application to the NWS distributed hydrologic
model, Water Resour. Res., 44, W09417, <ext-link xlink:href="https://doi.org/10.1029/2007WR006716" ext-link-type="DOI">10.1029/2007WR006716</ext-link>, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx49"><?xmltex \def\ref@label{{Zamir et~al.(2020)Zamir, Sax, Cheerla, Suri, Cao, Malik, and
Guibas}}?><label>Zamir et al.(2020)Zamir, Sax, Cheerla, Suri, Cao, Malik, and
Guibas</label><?label Zamir2020?><mixed-citation>
Zamir, A. R., Sax, A., Cheerla, N., Suri, R., Cao, Z., Malik, J., and Guibas,
L. J.: Robust Learning Through Cross-Task Consistency, in: The IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), June 2020 (online), 11197–11206, 2020.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Rainfall–runoff prediction at multiple timescales with a single Long Short-Term Memory network</article-title-html>
<abstract-html><p>Long Short-Term Memory (LSTM) networks have been applied to daily discharge prediction with remarkable success.
Many practical applications, however, require predictions at more granular timescales.
For instance, accurate prediction of short but extreme flood peaks can make a lifesaving difference, yet such peaks may escape the coarse temporal resolution of daily predictions.
Naively training an LSTM on hourly data, however, entails very long input sequences that make learning difficult and computationally expensive.
In this study, we propose two multi-timescale LSTM (MTS-LSTM) architectures that jointly predict multiple timescales within one model, as they process long-past inputs at a different temporal resolution than more recent inputs.
In a benchmark on 516 basins across the continental United States, these models achieved significantly higher Nash–Sutcliffe efficiency (NSE) values than the US National Water Model.
Compared to naive prediction with distinct LSTMs per timescale, the multi-timescale architectures are computationally more efficient with no loss in accuracy.
Beyond prediction quality, the multi-timescale LSTM can process different input variables at different timescales, which is especially relevant to operational applications where the lead time of meteorological forcings depends on their temporal resolution.</p></abstract-html>
<ref-html id="bib1.bib1"><label>Addor et al.(2017a)Addor, Newman, Mizukami, and Clark</label><mixed-citation>
Addor, N., Newman, A. J., Mizukami, N., and Clark, M. P.: The CAMELS data set: catchment attributes and meteorology for large-sample studies, Hydrol. Earth Syst. Sci., 21, 5293–5313, <a href="https://doi.org/10.5194/hess-21-5293-2017" target="_blank">https://doi.org/10.5194/hess-21-5293-2017</a>, 2017a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Addor et al.(2017b)Addor, Newman, Mizukami, and Clark</label><mixed-citation>
Addor, N., Newman, A., Mizukami, N., and Clark, M. P.: Catchment attributes for large-sample studies [data set], Boulder, CO, UCAR/NCAR, <a href="https://doi.org/10.5065/D6G73C3Q" target="_blank">https://doi.org/10.5065/D6G73C3Q</a> (last access: 14 April 2021), 2017b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Addor et al.(2018)Addor, Nearing, Prieto, Newman, Le Vine, and
Clark</label><mixed-citation>
Addor, N., Nearing, G., Prieto, C., Newman, A. J., Le Vine, N., and Clark,
M. P.: A Ranking of Hydrological Signatures Based on Their Predictability in
Space, Water Resour. Res., 54, 8792–8812, <a href="https://doi.org/10.1029/2018WR022606" target="_blank">https://doi.org/10.1029/2018WR022606</a>,
2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Araya et al.(2019)Araya, Valle, and Allende</label><mixed-citation>
Araya, I. A., Valle, C., and Allende, H.: A Multi-Scale Model based on the
Long Short-Term Memory for day ahead hourly wind speed forecasting, Pattern
Recognition Letters, 136, 333–340, <a href="https://doi.org/10.1016/j.patrec.2019.10.011" target="_blank">https://doi.org/10.1016/j.patrec.2019.10.011</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Bengio et al.(1994)Bengio, Simard, and Frasconi</label><mixed-citation>
Bengio, Y., Simard, P., and Frasconi, P.: Learning long-term dependencies with
gradient descent is difficult, IEEE Transactions on Neural Networks, 5,
157–166, <a href="https://doi.org/10.1109/72.279181" target="_blank">https://doi.org/10.1109/72.279181</a>, 1994.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Chung et al.(2016)Chung, Ahn, and Bengio</label><mixed-citation>
Chung, J., Ahn, S., and Bengio, Y.: Hierarchical Multiscale Recurrent Neural
Networks, arXiv preprint, arXiv:1609.01704, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Clausen and Biggs(2000)</label><mixed-citation>
Clausen, B. and Biggs, B. J. F.: Flow variables for ecological studies in
temperate streams: groupings based on covariance, J. Hydrol., 237,
184–197, <a href="https://doi.org/10.1016/S0022-1694(00)00306-1" target="_blank">https://doi.org/10.1016/S0022-1694(00)00306-1</a>, 2000.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Cosgrove and Klemmer(2019)</label><mixed-citation>
Cosgrove, B. and Klemmer, C.: The National Water Model, available at:
<a href="https://water.noaa.gov/about/nwm" target="_blank">https://water.noaa.gov/about/nwm</a> (last access: 25 January 2021), 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Court(1962)</label><mixed-citation>
Court, A.: Measures of streamflow timing, J. Geophys. Res.
(1896–1977), 67, 4335–4339, <a href="https://doi.org/10.1029/JZ067i011p04335" target="_blank">https://doi.org/10.1029/JZ067i011p04335</a>, 1962.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Frame et al.(2020)Frame, Nearing, Kratzert, and Rahman</label><mixed-citation>
Frame, J., Nearing, G., Kratzert, F., and Rahman, M.: Post processing the
U.S. National Water Model with a Long Short-Term Memory network, EarthArXiv,
<a href="https://doi.org/10.31223/osf.io/4xhac" target="_blank">https://doi.org/10.31223/osf.io/4xhac</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Gauch(2021)</label><mixed-citation>
Gauch, M.: Code for “Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network”, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.4687991" target="_blank">https://doi.org/10.5281/zenodo.4687991</a> (last access: 14 April 2021), 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Gauch and Lin(2020)</label><mixed-citation>
Gauch, M. and Lin, J.: A Data Scientist's Guide to Streamflow Prediction, arXiv
preprint, arXiv:2006.12975, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Gauch et al.(2020a)</label><mixed-citation>
Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Lin, J., and Hochreiter, S.: Models and Predictions for “Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network” [data set], Zenodo, <a href="https://doi.org/10.5281/zenodo.4095485" target="_blank">https://doi.org/10.5281/zenodo.4095485</a>, 2020a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Gauch et al.(2020b)</label><mixed-citation>
Gauch, M., Kratzert, F., Klotz, D., Nearing, G., Lin, J., and Hochreiter, S.: Data for “Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network” [data set], Zenodo, <a href="https://doi.org/10.5281/zenodo.4072701" target="_blank">https://doi.org/10.5281/zenodo.4072701</a>, 2020b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Gers et al.(1999)Gers, Schmidhuber, and Cummins</label><mixed-citation>
Gers, F. A., Schmidhuber, J., and Cummins, F.: Learning to forget: continual
prediction with LSTM, IET Conference Proceedings, pp. 850–855, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Gochis et al.(2020)Gochis, Barlage, Cabell, Casali, Dugger,
FitzGerald, McAllister, McCreight, RafieeiNasab, Read, Sampson, Yates, and
Zhang</label><mixed-citation>
Gochis, D. J., Barlage, M., Cabell, R., Casali, M., Dugger, A., FitzGerald, K.,
McAllister, M., McCreight, J., RafieeiNasab, A., Read, L., Sampson, K.,
Yates, D., and Zhang, Y.: The WRF-Hydro<span style="position:relative; bottom:0.5em; " class="text">®</span> modeling system
technical description, available at:
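<a href="https://ral.ucar.edu/sites/default/files/public/projects/Technical%20Description%20%26amp%3B%20User%20Guides/wrfhydrov511technicaldescription.pdf" target="_blank">https://ral.ucar.edu/sites/default/files/public/projects/Technical%20Description%20%26amp%3B%20User%20Guides/wrfhydrov511technicaldescription.pdf</a> (last access: 14 April 2021),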
2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Graves et al.(2007)Graves, Fernández, and
Schmidhuber</label><mixed-citation>
Graves, A., Fernández, S., and Schmidhuber, J.: Multi-dimensional Recurrent
Neural Networks, in: Artificial Neural Networks – ICANN 2007, edited by:
de Sá, J. M., Alexandre, L. A., Duch, W., and Mandic, D., pp. 549–558,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Greff et al.(2017)Greff, Srivastava, Koutník, Steunebrink, and
Schmidhuber</label><mixed-citation>
Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., and
Schmidhuber, J.: LSTM: A Search Space Odyssey, IEEE Transactions on Neural
Networks and Learning Systems, 28, 2222–2232,
<a href="https://doi.org/10.1109/TNNLS.2016.2582924" target="_blank">https://doi.org/10.1109/TNNLS.2016.2582924</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and Martinez</label><mixed-citation>
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of
the mean squared error and NSE performance criteria: implications for
improving hydrological modelling, J. Hydrol., 377, 80–91,
<a href="https://doi.org/10.1016/j.jhydrol.2009.08.003" target="_blank">https://doi.org/10.1016/j.jhydrol.2009.08.003</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>He et al.(2016)He, Zhang, Ren, and Sun</label><mixed-citation>
He, K., Zhang, X., Ren, S., and Sun, J.: Deep Residual Learning for Image
Recognition, in: Proceedings of the IEEE Conference on Computer Vision and
Pattern Recognition (CVPR), June 2016, Las Vegas, Nevada, 770–778, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Hochreiter and Schmidhuber(1997)</label><mixed-citation>
Hochreiter, S. and Schmidhuber, J.: Long Short-Term Memory, Neural
Computation, 9, 1735–1780, <a href="https://doi.org/10.1162/neco.1997.9.8.1735" target="_blank">https://doi.org/10.1162/neco.1997.9.8.1735</a>, 1997.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Hoedt et al.(2021)Hoedt, Kratzert, Klotz, Halmich, Holzleitner,
Nearing, Hochreiter, and Klambauer</label><mixed-citation>
Hoedt, P.-J., Kratzert, F., Klotz, D., Halmich, C., Holzleitner, M., Nearing,
G., Hochreiter, S., and Klambauer, G.: MC-LSTM: Mass-Conserving LSTM, available at: <a href="https://arxiv.org/abs/2101.05186" target="_blank">https://arxiv.org/abs/2101.05186</a>,
2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Jozefowicz et al.(2015)Jozefowicz, Zaremba, and
Sutskever</label><mixed-citation>
Jozefowicz, R., Zaremba, W., and Sutskever, I.: An Empirical Exploration of
Recurrent Network Architectures, in: Proceedings of the 32nd International
Conference on Machine Learning, edited by: Bach, F. and Blei, D., vol. 37 of
Proceedings of Machine Learning Research, pp. 2342–2350, PMLR,
Lille, France, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Klotz et al.(2021)Klotz, Kratzert, Gauch, Sampson, Klambauer,
Hochreiter, and Nearing</label><mixed-citation>
Klotz, D., Kratzert, F., Gauch, M., Keefe Sampson, A., Brandstetter, J., Klambauer, G., Hochreiter, S., and Nearing, G.: Uncertainty Estimation with Deep Learning for Rainfall–Runoff Modelling, Hydrol. Earth Syst. Sci. Discuss. [preprint], <a href="https://doi.org/10.5194/hess-2021-154" target="_blank">https://doi.org/10.5194/hess-2021-154</a>, in review, 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Koutník et al.(2014)Koutník, Greff, Gomez, and
Schmidhuber</label><mixed-citation>
Koutník, J., Greff, K., Gomez, F., and Schmidhuber, J.: A Clockwork RNN,
arXiv preprint, arXiv:1402.3511, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Kratzert et al.(2018)Kratzert, Klotz, Brenner, Schulz, and
Herrnegger</label><mixed-citation>
Kratzert, F., Klotz, D., Brenner, C., Schulz, K., and Herrnegger, M.: Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks, Hydrol. Earth Syst. Sci., 22, 6005–6022, <a href="https://doi.org/10.5194/hess-22-6005-2018" target="_blank">https://doi.org/10.5194/hess-22-6005-2018</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>Kratzert et al.(2020)</label><mixed-citation>
Kratzert, F., Gauch, M., and Klotz, D.: NeuralHydrology Python Library, Zenodo [code], <a href="https://doi.org/10.5281/zenodo.4688003" target="_blank">https://doi.org/10.5281/zenodo.4688003</a> (last access: 14 April 2021), 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Kratzert(2019)</label><mixed-citation>
Kratzert, F.: CAMELS Extended Maurer Forcing Data, HydroShare [data set], <a href="https://doi.org/10.4211/hs.17c896843cf940339c3c3496d0c1c077" target="_blank">https://doi.org/10.4211/hs.17c896843cf940339c3c3496d0c1c077</a> (last access: 14 April 2021), 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Kratzert et al.(2019)Kratzert, Klotz, Shalev, Klambauer, Hochreiter,
and Nearing</label><mixed-citation>
Kratzert, F., Klotz, D., Shalev, G., Klambauer, G., Hochreiter, S., and Nearing, G.: Towards learning universal, regional, and local hydrological behaviors via machine learning applied to large-sample datasets, Hydrol. Earth Syst. Sci., 23, 5089–5110, <a href="https://doi.org/10.5194/hess-23-5089-2019" target="_blank">https://doi.org/10.5194/hess-23-5089-2019</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Kratzert et al.(2020)Kratzert, Klotz, Hochreiter, and
Nearing</label><mixed-citation>
Kratzert, F., Klotz, D., Hochreiter, S., and Nearing, G. S.: A note on leveraging synergy in multiple meteorological datasets with deep learning for rainfall-runoff modeling, Hydrol. Earth Syst. Sci. Discuss. [preprint], <a href="https://doi.org/10.5194/hess-2020-221" target="_blank">https://doi.org/10.5194/hess-2020-221</a>, in review, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Ladson et al.(2013)Ladson, Brown, Neal, and Nathan</label><mixed-citation>
Ladson, T. R., Brown, R., Neal, B., and Nathan, R.: A Standard Approach to
Baseflow Separation Using The Lyne and Hollick Filter, Australasian
J. Water Res., 17, 25–34, available at: <a href="https://www.tandfonline.com/doi/ref/10.7158/13241583.2013.11465417" target="_blank">https://www.tandfonline.com/doi/ref/10.7158/13241583.2013.11465417</a> (last access: 14 April 2021),
2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Lechner and Hasani(2020)</label><mixed-citation>
Lechner, M. and Hasani, R.: Learning Long-Term Dependencies in
Irregularly-Sampled Time Series, arXiv preprint, arXiv:2006.04418, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Mozer(1991)</label><mixed-citation>
Mozer, M.: Induction of Multiscale Temporal Structure, in: Advances in Neural
Information Processing Systems 4, edited by: Moody, J. E., Hanson, S. J., and
Lippmann, R., pp. 275–282, Morgan Kaufmann, 1991.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Nash and Sutcliffe(1970)</label><mixed-citation>
Nash, J. E. and Sutcliffe, J. V.: River flow forecasting through conceptual
models part I – A discussion of principles, J. Hydrol., 10,
282–290, <a href="https://doi.org/10.1016/0022-1694(70)90255-6" target="_blank">https://doi.org/10.1016/0022-1694(70)90255-6</a>, 1970.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Neil et al.(2016)Neil, Pfeiffer, and Liu</label><mixed-citation>
Neil, D., Pfeiffer, M., and Liu, S.-C.: Phased LSTM: Accelerating Recurrent
Network Training for Long or Event-based Sequences, in: Advances in Neural
Information Processing Systems 29, edited by: Lee, D. D., Sugiyama, M.,
Luxburg, U. V., Guyon, I., and Garnett, R., pp. 3882–3890, Curran
Associates, Inc., 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Newman et al.(2014a)Newman, Sampson, Clark, Bock, Viger, and
Blodgett</label><mixed-citation>
Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R., and Blodgett, D.: A
large-sample watershed-scale hydrometeorological dataset for the contiguous
USA, UCAR/NCAR [data set], <a href="https://doi.org/10.5065/d6mw2f4d" target="_blank">https://doi.org/10.5065/d6mw2f4d</a>, 2014a.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Newman et al.(2014b)</label><mixed-citation>
Newman, A., Sampson, K., Clark, M. P., Bock, A., Viger, R., and Blodgett, D.: CAMELS: Catchment Attributes and Meteorology for Large-sample Studies [data set], Boulder, CO, UCAR/NCAR, <a href="https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip" target="_blank">https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip</a> (last access: 14 April 2021), 2014b.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Newman et al.(2017)Newman, Mizukami, Clark, Wood, Nijssen, and
Nearing</label><mixed-citation>
Newman, A., Mizukami, N., Clark, M. P., Wood, A. W., Nijssen, B., and Nearing,
G.: Benchmarking of a Physically Based Hydrologic Model, J.
Hydrometeorol., 18, 2215–2225, <a href="https://doi.org/10.1175/JHM-D-16-0284.1" target="_blank">https://doi.org/10.1175/JHM-D-16-0284.1</a>, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Olah(2015)</label><mixed-citation>
Olah, C.: Understanding LSTM Networks, colah's blog, available at:
<a href="https://colah.github.io/posts/2015-08-Understanding-LSTMs/" target="_blank">https://colah.github.io/posts/2015-08-Understanding-LSTMs/</a> (last access: 14 April 2021), 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Olden and Poff(2003)</label><mixed-citation>
Olden, J. D. and Poff, N. L.: Redundancy and the choice of hydrologic indices
for characterizing streamflow regimes, River Res. Appl., 19,
101–121, <a href="https://doi.org/10.1002/rra.700" target="_blank">https://doi.org/10.1002/rra.700</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Salas et al.(2018)Salas, Somos-Valenzuela, Dugger, Maidment, Gochis,
David, Yu, Ding, Clark, and Noman</label><mixed-citation>
Salas, F. R., Somos-Valenzuela, M. A., Dugger, A., Maidment, D. R., Gochis,
D. J., David, C. H., Yu, W., Ding, D., Clark, E. P., and Noman, N.: Towards
Real-Time Continental Scale Streamflow Simulation in Continuous and Discrete
Space, J. Am. Water Resour. Assoc., 54, 7–27,
<a href="https://doi.org/10.1111/1752-1688.12586" target="_blank">https://doi.org/10.1111/1752-1688.12586</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Sankarasubramanian et al.(2001)Sankarasubramanian, Vogel, and
Limbrunner</label><mixed-citation>
Sankarasubramanian, A., Vogel, R. M., and Limbrunner, J. F.: Climate elasticity
of streamflow in the United States, Water Resour. Res., 37,
1771–1781, <a href="https://doi.org/10.1029/2000WR900330" target="_blank">https://doi.org/10.1029/2000WR900330</a>, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Sawicz et al.(2011)Sawicz, Wagener, Sivapalan, Troch, and
Carrillo</label><mixed-citation>
Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, <a href="https://doi.org/10.5194/hess-15-2895-2011" target="_blank">https://doi.org/10.5194/hess-15-2895-2011</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Schmidhuber(1991)</label><mixed-citation>
Schmidhuber, J.: Neural Sequence Chunkers, Tech. rep. FKI 148 91, Technische Universität München, Institut für Informatik, 1991.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>United States Geological Survey(2021)</label><mixed-citation>
United States Geological Survey: USGS Instantaneous Values Web Service, available at:
<a href="https://waterservices.usgs.gov/rest/IV-Service.html" target="_blank">https://waterservices.usgs.gov/rest/IV-Service.html</a> (last access:
15 October 2020), 2021.
</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Westerberg and McMillan(2015)</label><mixed-citation>
Westerberg, I. K. and McMillan, H. K.: Uncertainty in hydrological signatures, Hydrol. Earth Syst. Sci., 19, 3951–3968, <a href="https://doi.org/10.5194/hess-19-3951-2015" target="_blank">https://doi.org/10.5194/hess-19-3951-2015</a>, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Xia et al.(2012)Xia, Mitchell, Ek, Sheffield, Cosgrove, Wood, Luo,
Alonge, Wei, Meng, Livneh, Lettenmaier, Koren, Duan, Mo, Fan, and
Mocko</label><mixed-citation>
Xia, Y., Mitchell, K., Ek, M., Sheffield, J., Cosgrove, B., Wood, E., Luo, L.,
Alonge, C., Wei, H., Meng, J., Livneh, B., Lettenmaier, D., Koren, V., Duan,
Q., Mo, K., Fan, Y., and Mocko, D.: Continental-scale water and energy flux
analysis and validation for the North American Land Data Assimilation
System project phase 2 (NLDAS-2): 1. Intercomparison and application of
model products, J. Geophys. Res.-Atmos., 117, D03109,
<a href="https://doi.org/10.1029/2011JD016048" target="_blank">https://doi.org/10.1029/2011JD016048</a>, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Yilmaz et al.(2008)Yilmaz, Gupta, and Wagener</label><mixed-citation>
Yilmaz, K. K., Gupta, H. V., and Wagener, T.: A process-based diagnostic
approach to model evaluation: Application to the NWS distributed hydrologic
model, Water Resour. Res., 44, W09417, <a href="https://doi.org/10.1029/2007WR006716" target="_blank">https://doi.org/10.1029/2007WR006716</a>, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Zamir et al.(2020)Zamir, Sax, Cheerla, Suri, Cao, Malik, and
Guibas</label><mixed-citation>
Zamir, A. R., Sax, A., Cheerla, N., Suri, R., Cao, Z., Malik, J., and Guibas,
L. J.: Robust Learning Through Cross-Task Consistency, in: The IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), June 2020 (online), 11197–11206, 2020.
</mixed-citation></ref-html>--></article>
