<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing with OASIS Tables v3.0 20080202//EN" "journalpub-oasis3.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:oasis="http://docs.oasis-open.org/ns/oasis-exchange/table" xml:lang="en" dtd-version="3.0"><?xmltex \makeatother\@nolinetrue\makeatletter?>
  <front>
    <journal-meta><journal-id journal-id-type="publisher">HESS</journal-id><journal-title-group>
    <journal-title>Hydrology and Earth System Sciences</journal-title>
    <abbrev-journal-title abbrev-type="publisher">HESS</abbrev-journal-title><abbrev-journal-title abbrev-type="nlm-ta">Hydrol. Earth Syst. Sci.</abbrev-journal-title>
  </journal-title-group><issn pub-type="epub">1607-7938</issn><publisher>
    <publisher-name>Copernicus Publications</publisher-name>
    <publisher-loc>Göttingen, Germany</publisher-loc>
  </publisher></journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.5194/hess-25-2997-2021</article-id><title-group><article-title>Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds</article-title><alt-title>Evaluation of random forests for short-term daily streamflow forecasting</alt-title>
      </title-group><?xmltex \runningtitle{Evaluation of random forests for short-term daily streamflow forecasting}?><?xmltex \runningauthor{L. T. Pham et al.}?>
      <contrib-group>
        <contrib contrib-type="author" corresp="yes" rid="aff1">
          <name><surname>Pham</surname><given-names>Leo Triet</given-names></name>
          <email>phamleo@msu.edu</email>
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff2">
          <name><surname>Luo</surname><given-names>Lifeng</given-names></name>
          
        </contrib>
        <contrib contrib-type="author" corresp="no" rid="aff1 aff2">
          <name><surname>Finley</surname><given-names>Andrew</given-names></name>
          
        </contrib>
        <aff id="aff1"><label>1</label><institution>Department of Forestry, Michigan State University, East Lansing, Michigan, USA</institution>
        </aff>
        <aff id="aff2"><label>2</label><institution>Department of Geography, Environment, and Spatial Sciences, Michigan State University, East Lansing, Michigan, USA</institution>
        </aff>
      </contrib-group>
      <author-notes><corresp id="corr1">Leo Triet Pham (phamleo@msu.edu)</corresp></author-notes><pub-date><day>3</day><month>June</month><year>2021</year></pub-date>
      
      <volume>25</volume>
      <issue>6</issue>
      <fpage>2997</fpage><lpage>3015</lpage>
      <history>
        <date date-type="received"><day>18</day><month>June</month><year>2020</year></date>
           <date date-type="rev-request"><day>23</day><month>June</month><year>2020</year></date>
           <date date-type="rev-recd"><day>6</day><month>April</month><year>2021</year></date>
           <date date-type="accepted"><day>15</day><month>April</month><year>2021</year></date>
      </history>
      <permissions>
        <copyright-statement>Copyright: © 2021 Leo Triet Pham et al.</copyright-statement>
        <copyright-year>2021</copyright-year>
      <license license-type="open-access"><license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this licence, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p></license></permissions><self-uri xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021.html">This article is available from https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021.html</self-uri><self-uri xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021.pdf">The full text article is available as a PDF file from https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021.pdf</self-uri>
      <abstract><title>Abstract</title>
    <p id="d1e105">In the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. The popularity of ML models for such applications stems from, among other qualities, their relative ease of implementation, less strict distributional assumptions, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1 d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of center-of-annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds than in rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.</p>
  </abstract>
    </article-meta>
  </front>
<body>
      

<sec id="Ch1.S1" sec-type="intro">
  <label>1</label><title>Introduction</title>
      <p id="d1e117">Nearly all aspects of water resource management, risk assessment, and early-warning systems for floods rely on accurate streamflow forecasts. Yet streamflow forecasting remains a challenging task due to the dynamic nature of runoff in response to spatial and temporal variability in rainfall and catchment characteristics. Therefore, the development of skillful and robust streamflow models is an active area of study in hydrology and related engineering disciplines.</p>
      <p id="d1e120">While physical models remain a common and powerful tool for predicting streamflow, ML models are gaining popularity due to some of their unique qualities and potential advantages. In contrast to the often labor-intensive and computationally expensive task of parameterizing a physical model <xref ref-type="bibr" rid="bib1.bibx69 bib1.bibx5" id="paren.1"/>, ML models are data-driven and can identify patterns in the input–output relationship without explicit knowledge of the physical processes and without onerous computational demand. Although their ability to provide interpretation of the underlying mechanisms is limited, ML models often require less calibration data than physical models, have demonstrated high accuracy in their predictive performance, are computationally efficient, and can be used in real-time forecasting <xref ref-type="bibr" rid="bib1.bibx1 bib1.bibx44" id="paren.2"/>. ML models are particularly useful when accurate prediction is the central inferential goal <xref ref-type="bibr" rid="bib1.bibx16" id="paren.3"/>, whereas a conceptual rainfall–runoff model can provide a better understanding of hydrologic phenomena and catchment yields and responses <xref ref-type="bibr" rid="bib1.bibx65" id="paren.4"/>. Artificial neural networks (ANNs), neuro-fuzzy methods (a combination of ANNs and fuzzy logic), support vector machines (SVMs), and<?pagebreak page2998?> decision trees (DTs) are reported to be among the most popular and effective methods for both short-term and long-term flood forecasting <xref ref-type="bibr" rid="bib1.bibx44" id="paren.5"/>. For example, <xref ref-type="bibr" rid="bib1.bibx14" id="text.6"/> provided flood risk estimation at ungauged sites using an ANN at catchments across the United Kingdom. <xref ref-type="bibr" rid="bib1.bibx57" id="text.7"/> predicted streamflow at lead times of 1–7 d with local observations and climate indices using three ML methods: Bayesian neural network (BNN), SVM, and Gaussian process (GP). 
They found that BNN outperformed multiple linear regression (MLR) and the other two ML models. Their study also found that models trained using climate indices yielded improved forecasts at longer lead times (e.g., 5–7 d). <xref ref-type="bibr" rid="bib1.bibx70" id="text.8"/> forecasted daily streamflow in four rivers in the United States with support vector regression (SVR), ANN, and random forests (RF) coupled with a baseflow separation method (i.e., separating streamflow into its baseflow and surface flow components). <xref ref-type="bibr" rid="bib1.bibx47" id="text.9"/> compared eight parametric, semi-parametric, and non-parametric ML algorithms to forecast urban reservoir levels in Atlanta, Georgia. Their results showed that RF yielded the most accurate forecasts.</p>
      <p id="d1e151">Despite the promising results reported in the existing literature, most ML streamflow forecast applications are limited to watersheds in which rainfall is the major contributor. In many settings, particularly non-arid mountainous regions in the western USA, a combination of rainfall and spring snowmelt can drive streamflow <xref ref-type="bibr" rid="bib1.bibx28 bib1.bibx32" id="paren.10"/>. The amount of snow accumulation and its contribution to discharge also vary among the watersheds <xref ref-type="bibr" rid="bib1.bibx31" id="paren.11"/>. Both watershed-scale hydrologic and statistical models have been used to assess the current and future stream hydrology and associated flood risks <xref ref-type="bibr" rid="bib1.bibx61 bib1.bibx77 bib1.bibx68 bib1.bibx49" id="paren.12"/>. <xref ref-type="bibr" rid="bib1.bibx60" id="text.13"/> simulated streamflows in 217 watersheds at annual and seasonal timescales using the variable infiltration capacity (VIC) model at <inline-formula><mml:math id="M1" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">16</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M2" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">20</mml:mn><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:mrow></mml:math></inline-formula> spatial resolutions. The study found that the model was able to capture the hydrologic behavior of the studied watersheds with reasonable accuracy. Yet the authors recommend that careful site-specific model calibration, using not only streamflow but also snow water equivalent (SWE) data,  would be expected to improve model performance and reduce model bias. 
<xref ref-type="bibr" rid="bib1.bibx49" id="text.14"/> applied <inline-formula><mml:math id="M3" display="inline"><mml:mi>Z</mml:mi></mml:math></inline-formula>-score regression to daily SWE from Snow Telemetry (SNOTEL) stations and year-to-date precipitation data to predict seasonal streamflow volume in unregulated streams in the western US. The authors reported that the skill of these forecasts is comparable to that of the official published outlooks. A natural question is whether ML models can perform comparably in watersheds in which streamflow comes from a mixture of snowmelt and rainfall or in which snowmelt is the dominant source. Considering the prominent role of snowpack in water management and the contribution of rapid snowmelt to flood events, such a question is worth exploring. To this end, we evaluate the potential of RF in making short-term streamflow forecasts at 1 d of lead time across 86 watersheds in the Pacific Northwest Hydrologic Region (Fig. <xref ref-type="fig" rid="Ch1.F1"/>). The <xref ref-type="bibr" rid="bib1.bibx72" id="text.15"/> defines this region as hydrologic region 17 or HUC-17. HUC-17 consists of sub-basins and watersheds of the Columbia River that span varying hydrologic regimes. The selected watersheds have long-term records of unregulated streamflow and different streamflow contributions of rainfall and snowmelt. Drainage basin factors such as topography, vegetation, and soil can affect the response time and mechanisms of runoff <xref ref-type="bibr" rid="bib1.bibx17" id="paren.16"/>. Few studies have attempted to account for or report these effects on model performance. Without such consideration, it is difficult to determine whether a data-driven model can be generalized to watersheds not included in the given study. 
Therefore, our objectives are (1) to examine and compare the performance of RF in a number of watersheds across hydrologic regimes and (2) to explore the role in model performance of catchment characteristics, which have been overlooked in previous studies.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F1" specific-use="star"><?xmltex \currentcnt{1}?><?xmltex \def\figurename{Figure}?><label>Figure 1</label><caption><p id="d1e216"><bold>(a)</bold> Elevation (m) shading map showing the Pacific Northwest Hydrologic Unit, 86 selected stream gauges (triangles), and their drainage area (cyan delineation lines), as well as SNOTEL stations (brown squares). Examples of annual hydrographs of <bold>(b)</bold> rainfall-dominated, <bold>(c)</bold> transient, and <bold>(d)</bold> snowmelt-dominated  watersheds. Panels <bold>(b–d)</bold> are based on 2009–2018 daily flow data at three sites: 12043300 (48.2<inline-formula><mml:math id="M4" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> N, 124.4<inline-formula><mml:math id="M5" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W), 12048000 (48<inline-formula><mml:math id="M6" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> N, 123.1<inline-formula><mml:math id="M7" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W), and 10396000 (42.7<inline-formula><mml:math id="M8" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> N, 118.9<inline-formula><mml:math id="M9" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula> W), respectively.</p></caption>
        <?xmltex \igopts{width=341.433071pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f01.png"/>

      </fig>

      <p id="d1e294">In practice, RF can be trained to forecast streamflow at various timescales depending on the input variables provided. <xref ref-type="bibr" rid="bib1.bibx57" id="text.17"/> forecasted streamflow at 1–7 d lead times using three ML models and data from combinations of climate indices and local meteo-hydrologic observations. The authors concluded that models with local observations as predictors were generally best at shorter lead times, while models with local observations plus climate indices were best at longer lead times of 5–7 d. In addition, the skill of all three models decreased with increasing lead time. In our study, we focused on 1 d lead time forecasting and therefore did not include long-term climate information. At longer lead times, changes in weather conditions would likely exert much greater control on runoff and the performance of the model.</p>
      <?pagebreak page2999?><p id="d1e300">We select RF to forecast streamflow for two reasons. First, RF has been reported to deliver high performance in short-term streamflow forecasts <xref ref-type="bibr" rid="bib1.bibx44 bib1.bibx52 bib1.bibx36 bib1.bibx63" id="paren.18"/>, making it a good candidate for our study. Second, RF allows for some level of interpretability. This is delivered through two measures of the predictive contribution of variables: mean decrease in accuracy (MDA) and mean decrease in node impurity (MDI). These two measures have been widely used as a means of variable selection in classification and regression studies in bioinformatics <xref ref-type="bibr" rid="bib1.bibx11" id="paren.19"/>, remote sensing classification <xref ref-type="bibr" rid="bib1.bibx50" id="paren.20"/>, and flood hazard risk assessment <xref ref-type="bibr" rid="bib1.bibx76" id="paren.21"/>. The interpretability of an ML model, however, can be a controversial subject and remains an active area of study <xref ref-type="bibr" rid="bib1.bibx59 bib1.bibx9" id="paren.22"/>. Both model-agnostic interpretation methods, such as permutation-based feature importance <xref ref-type="bibr" rid="bib1.bibx6" id="paren.23"/>, and model-specific interpretation methods, such as Gini-based importance for RF <xref ref-type="bibr" rid="bib1.bibx7" id="paren.24"/> and gradient-based methods for ANNs <xref ref-type="bibr" rid="bib1.bibx64" id="paren.25"/>, can provide useful insights into how ML models make their predictions. While this interpretability does not translate directly into an interpretation of the physical processes, it can provide insight into relationships among predictors and streamflow response.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F2" specific-use="star"><?xmltex \currentcnt{2}?><?xmltex \def\figurename{Figure}?><label>Figure 2</label><caption><p id="d1e330">Structure of an RF and relevant parameters.</p></caption>
        <?xmltex \igopts{width=497.923228pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f02.png"/>

      </fig>

      <p id="d1e339">The remainder of the paper is arranged as follows. Section <xref ref-type="sec" rid="Ch1.S2"/> provides a brief introduction to RF, relevant parameters (which can also be referred to as “hyper-parameters” in the ML literature), and selected evaluation criteria. Section <xref ref-type="sec" rid="Ch1.S3"/> describes the study area, datasets, and predictor selection. Results and discussion are given in Sect. <xref ref-type="sec" rid="Ch1.S4"/> along with limitations and recommendations for future research. A summary and indications for future work are provided in Sect. <xref ref-type="sec" rid="Ch1.S5"/>.</p>
</sec>
<sec id="Ch1.S2">
  <label>2</label><title>Methodology</title>
<sec id="Ch1.S2.SS1">
  <label>2.1</label><title>Random forests</title>
      <p id="d1e365">Proposed by <xref ref-type="bibr" rid="bib1.bibx6" id="text.26"/>, RF is a supervised, non-parametric algorithm within the decision tree family that comprises an ensemble of decorrelated trees to yield predictions for classification and regression tasks. Non-parametric methods such as RF do not assume any particular family for the distribution of the data <xref ref-type="bibr" rid="bib1.bibx2" id="paren.27"/>. Because a single decision tree can have high variance and is prone to noise <xref ref-type="bibr" rid="bib1.bibx27" id="paren.28"/>, RF generates multiple trees, with each tree built on a bootstrapped sample of the training data (Fig. <xref ref-type="fig" rid="Ch1.F2"/>, Algorithm 1). Each time a binary split is made in a tree (also<?pagebreak page3000?> known as a split node), a random subset of predictors (drawn without replacement) from the full set of predictor variables is considered (Fig. <xref ref-type="fig" rid="Ch1.F2"/>). One predictor from these candidates is used to make the split such that the expected sum of variances of the response variable in the two resulting nodes is minimized (Algorithm 1, Step 3). The randomization process in generating the subset of features prevents one or more particularly strong predictors from being repeatedly chosen at each split, which would otherwise result in highly correlated trees <xref ref-type="bibr" rid="bib1.bibx6" id="paren.29"/>. After all the trees are grown, the forest makes a prediction on a new data point by passing its predictor values through every tree. In the end, the forest casts a majority vote on a class label for the classification task or produces a value for the regression task by averaging all predictions. <xref ref-type="bibr" rid="bib1.bibx6" id="text.30"/> provided full details on RF and its merits. The <monospace>randomForest</monospace> package in R developed by <xref ref-type="bibr" rid="bib1.bibx37" id="text.31"/> was used for model training and validation in our study. 
The step-by-step process of building a regression RF follows Algorithm 1.</p>
      <p id="d1e394"><?xmltex \hack{\newpage}?><?xmltex \igopts{width=227.622047pt}?><inline-graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-g01.png"/></p>
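      <p>The logic of Algorithm 1 can be illustrated with a minimal Python sketch that uses only the standard library. It is not the <monospace>randomForest</monospace> implementation used in this study: the function names (<monospace>fit_forest</monospace>, <monospace>grow_tree</monospace>, <monospace>best_split</monospace>), the <monospace>min_node</monospace> stopping rule, and the synthetic data are assumptions made purely for illustration. Each tree is grown on a bootstrap sample, each split considers a random subset of <monospace>mtry</monospace> predictors, and the forest prediction is the average over trees.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import random


def avg(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (regression node impurity)."""
    m = avg(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, rows, features):
    """Return (score, feature, threshold, left, right) minimizing the summed child SSE."""
    best = None
    for j in features:
        # Candidate thresholds: every distinct value except the largest,
        # so both child nodes are guaranteed to be non-empty.
        for t in sorted({X[i][j] for i in rows})[:-1]:
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    return best


def grow_tree(X, y, rows, mtry, min_node, rng):
    """Recursively grow a regression tree; leaves are floats, split nodes are dicts."""
    if len(rows) <= min_node:
        return avg([y[i] for i in rows])
    # Random subset of predictors, drawn without replacement, at every split.
    features = rng.sample(range(len(X[0])), mtry)
    split = best_split(X, y, rows, features)
    if split is None:  # all candidate predictors are constant in this node
        return avg([y[i] for i in rows])
    _, j, t, left, right = split
    return {"feature": j, "threshold": t,
            "left": grow_tree(X, y, left, mtry, min_node, rng),
            "right": grow_tree(X, y, right, mtry, min_node, rng)}


def predict_tree(node, x):
    """Follow the splits until a leaf value is reached."""
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node


def fit_forest(X, y, ntree=50, mtry=1, min_node=5, seed=0):
    """Grow ntree trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(ntree):
        rows = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree(X, y, rows, mtry, min_node, rng))
    return forest


def predict_forest(forest, x):
    """Regression forecast: average of the individual tree predictions."""
    return avg([predict_tree(tree, x) for tree in forest])


# Synthetic example (hypothetical data): the response depends on predictor 0 only.
data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(120)]
y = [10.0 * x[0] + data_rng.gauss(0, 0.1) for x in X]
forest = fit_forest(X, y, ntree=25, mtry=1, min_node=5, seed=42)
low = predict_forest(forest, [0.1, 0.5])
high = predict_forest(forest, [0.9, 0.5])
```
]]></preformat>
      <p>In this sketch the summed squared error of the two child nodes stands in for the variance criterion of Algorithm 1, Step 3. Because every leaf stores a mean of observed responses, the averaged forecast always lies within the range of the training responses.</p>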
      <p id="d1e401">Due to sampling with replacement, some observations may not be selected during the bootstrap. These are referred to as out-of-bag (OOB) observations and are used to estimate the error of the tree on unseen data. It has been estimated that approximately 37 % of samples constitute OOB data <xref ref-type="bibr" rid="bib1.bibx24" id="paren.32"/>. An average OOB error is calculated for each subsequently added tree to provide an estimate of the performance gain. The OOB error can be particularly sensitive to the number of random predictors used at each split (<monospace>mtry</monospace>) and the number of trees (<monospace>ntree</monospace>) <xref ref-type="bibr" rid="bib1.bibx24" id="paren.33"/>. Generally, the<?pagebreak page3001?> predictive performance improves (or OOB error decreases) as <monospace>ntree</monospace> increases. However, recent research has shown that, depending on the dataset, there is a number of trees beyond which growing additional trees does not improve performance <xref ref-type="bibr" rid="bib1.bibx48" id="paren.34"/>. It has been advised that <monospace>mtry</monospace> be set to no larger than <inline-formula><mml:math id="M10" display="inline"><mml:mrow><mml:mn mathvariant="normal">1</mml:mn><mml:mo>/</mml:mo><mml:mn mathvariant="normal">3</mml:mn></mml:mrow></mml:math></inline-formula> of the total number of predictors for optimal regression prediction <xref ref-type="bibr" rid="bib1.bibx37" id="paren.35"/>, which is also the default value in the <monospace>randomForest</monospace> function in R that is widely adopted in the literature. Nevertheless, <xref ref-type="bibr" rid="bib1.bibx24" id="text.36"/> found that this value is dataset-dependent and could be tuned to improve the performance of RF. <xref ref-type="bibr" rid="bib1.bibx4" id="text.37"/> argued that the number of relevant predictors strongly influences the optimal <monospace>mtry</monospace> value. 
In this study, we select the optimal <monospace>mtry</monospace> using an exhaustive search strategy, in which all possible values of <monospace>mtry</monospace> are considered, using the R package <monospace>caret</monospace> <xref ref-type="bibr" rid="bib1.bibx33" id="paren.38"/>. While all RF parameters might have an effect on performance, we chose to focus on two parameters, <monospace>ntree</monospace> and <monospace>mtry</monospace>, for a number of reasons. First, these two parameters were originally introduced by <xref ref-type="bibr" rid="bib1.bibx6" id="text.39"/> in the development of the RF algorithm. Second, <monospace>ntree</monospace> is a parameter that can be tuned but does not need to be optimized; it should simply be set sufficiently high <xref ref-type="bibr" rid="bib1.bibx48 bib1.bibx55" id="paren.40"/> for RF to achieve good performance. It has been theoretically proven that more trees are always better <xref ref-type="bibr" rid="bib1.bibx55" id="paren.41"/>. In other words, an optimal <monospace>ntree</monospace> value can go to infinity. The reduction in error, however, becomes negligible after a sufficiently large number of trees. Third, empirical results provided in previous works suggest that <monospace>mtry</monospace> is the most influential of the parameters in RF <xref ref-type="bibr" rid="bib1.bibx4 bib1.bibx73 bib1.bibx55" id="paren.42"/>. Figure <xref ref-type="fig" rid="Ch1.F2"/> illustrates the step-by-step operating principle of growing an RF and its relevant parameters.</p>
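      <p>The figure of roughly 37 % OOB observations follows from the bootstrap itself: the probability that a given observation is never drawn in <italic>n</italic> draws with replacement is (1 − 1/<italic>n</italic>)<sup><italic>n</italic></sup>, which approaches e<sup>−1</sup> ≈ 0.368 as <italic>n</italic> grows. The short Python sketch below checks this by simulation; the function name <monospace>oob_fraction</monospace> and the sample size are illustrative assumptions, not part of the study's workflow.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
import random


def oob_fraction(n, seed=0):
    """Fraction of n observations never drawn in one bootstrap sample of size n."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n) for _ in range(n)}  # distinct indices that were drawn
    return 1 - len(in_bag) / n


n = 10_000
simulated = oob_fraction(n)      # one simulated bootstrap sample
expected = (1 - 1 / n) ** n      # exact probability of being left out
limit = math.exp(-1)             # large-sample limit, about 0.368
```
]]></preformat>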
</sec>
<sec id="Ch1.S2.SS2">
  <label>2.2</label><title>Variable importance in random forests</title>
      <p id="d1e505">In addition to assessing a model's overall predictive ability, there is also interest in understanding the contribution of each predictor variable to model performance. There are two built-in measures for assessing variable importance in RF: mean decrease in accuracy (MDA) and mean decrease in node impurity (MDI). Both were developed by Breiman <xref ref-type="bibr" rid="bib1.bibx7 bib1.bibx6" id="paren.43"/>. After all trees are grown, the OOB data set aside during training are used to compute the first measure. For each tree, the mean squared error (MSE) between predicted and observed values is calculated. Then the values of one of the <inline-formula><mml:math id="M11" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> predictors are randomly permuted, with the other predictor variables left unchanged, and the MSE is recomputed. The difference between the previous and new MSE is averaged over all trees. This is considered the predictor variable's MDA <xref ref-type="bibr" rid="bib1.bibx37" id="paren.44"/>, and values are reported as the percent difference in MSE. The procedure is repeated for each predictor variable. If there is a strong association between a predictor and the response variable, breaking that association by permutation would potentially result in a large prediction error (i.e., large MDA). The MDA value can be negative when a predictor has no predictive power and adds noise to the model. <xref ref-type="bibr" rid="bib1.bibx67" id="text.45"/>, however, expressed caution that permutation-based measures such as MDA could show a bias towards correlated predictor variables by overestimating their importance, particularly in high-dimensional datasets.</p>
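      <p>The permutation idea behind MDA can be sketched in a few lines of Python. The example below is a simplified illustration under stated assumptions: it scores a single hypothetical predictor function (<monospace>predict</monospace>) on one held-out dataset and expresses the change as a percentage of the baseline MSE, whereas an RF would repeat the calculation on each tree's OOB observations and average over trees. The function names and synthetic data are not taken from the <monospace>randomForest</monospace> package.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import random


def mse(predict, X, y):
    """Mean squared error of a predictor function on rows X with responses y."""
    return sum((predict(x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, j, rng):
    """Percent increase in MSE after shuffling column j; other columns are unchanged."""
    base = mse(predict, X, y)
    shuffled = [x[j] for x in X]
    rng.shuffle(shuffled)
    X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, shuffled)]
    return 100.0 * (mse(predict, X_perm, y) - base) / base


def predict(x):
    """Hypothetical stand-in for a fitted tree: uses predictor 0 and ignores predictor 1."""
    return 3.0 * x[0]


rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(500)]
y = [3.0 * x[0] + rng.gauss(0, 0.2) for x in X]   # synthetic response with noise
importance = [permutation_importance(predict, X, y, j, rng) for j in range(2)]
```
]]></preformat>
      <p>Permuting the informative predictor breaks its link to the response and inflates the error, while permuting a predictor that the model never uses leaves the error unchanged.</p>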
      <p id="d1e524">The second method, MDI, accumulates the reduction in node impurity each time a predictor is selected to make a split during training. It is based on the principle that a binary split only occurs when the residual errors (or impurity) of the two descendant nodes are less than that of their parent node. The MDI of a predictor is the sum of all gains across all trees divided by the number of trees. Because the scale of MDI depends on the values of the response variable, raw MDI is difficult to interpret. Following <xref ref-type="bibr" rid="bib1.bibx76" id="text.46"/>, we computed relative MDI for each variable, which in our case is calculated by dividing each predictor variable's MDI by the sum of MDI from all predictors at each watershed. When scaled by 100, this relative MDI is a percentage and can be interpreted as the relative contribution of each predictor to the total reduction in node impurities. If a predictor makes no contribution during the splitting, its relative MDI would be effectively zero. For both measures, the larger the value, the more important the predictor.</p>
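      <p>As a small worked example of the relative MDI calculation described above, the Python sketch below normalizes raw MDI values so that they sum to 100 % within one watershed. The predictor names and raw values are hypothetical and serve only to show the arithmetic.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def relative_mdi(raw_mdi):
    """Convert raw MDI per predictor into percentages of the total for one watershed."""
    total = sum(raw_mdi.values())
    if total == 0:
        # No predictor reduced impurity; report zero contribution for all of them.
        return {name: 0.0 for name in raw_mdi}
    return {name: 100.0 * value / total for name, value in raw_mdi.items()}


# Hypothetical raw MDI values for three predictors at a single watershed.
raw = {"streamflow": 120.0, "precipitation": 60.0, "swe": 20.0}
rel = relative_mdi(raw)
```
]]></preformat>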
</sec>
<sec id="Ch1.S2.SS3">
  <label>2.3</label><title>Benchmark models</title>
      <p id="d1e538">We benchmark the performance of RF during the validation period against multiple linear regression (MLR) and simple naïve models using the calculated Pearson correlation coefficient (<inline-formula><mml:math id="M12" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula>) between forecasted and observed values for each model. In the naïve model, we assume a “minimal-information” scenario in which the best estimate of the streamflow for the next day is the observed value from the current day <xref ref-type="bibr" rid="bib1.bibx22" id="paren.47"/>. Its <inline-formula><mml:math id="M13" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula>, in this case, is the 1 d autocorrelation coefficient of the time series and measures the strength of persistence. We train and verify the MLR model using the same datasets and predictors supplied to the RF model.</p>
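      <p>The naïve benchmark amounts to shifting the observed series by one day. The Python sketch below illustrates this with a short, hypothetical flow record; the function names are assumptions for illustration. The correlation between the shifted series and the observations is the 1 d autocorrelation referred to above.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((z - mean_b) ** 2 for z in b)
    return cov / math.sqrt(var_a * var_b)


def naive_forecast(flow):
    """Persistence forecast: the forecast for day t+1 is the observation on day t."""
    return flow[:-1]


# Hypothetical daily streamflow record (arbitrary units).
flow = [12.0, 15.0, 30.0, 22.0, 18.0, 16.0, 15.0, 14.0]
forecast = naive_forecast(flow)   # forecasts for days 2..N
observed = flow[1:]               # observations on days 2..N
r = pearson_r(forecast, observed)
```
]]></preformat>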
</sec>
<sec id="Ch1.S2.SS4">
  <label>2.4</label><title>Performance evaluation criteria</title>
      <?pagebreak page3002?><p id="d1e566">Different model performance criteria exist, and each provides unique insight into the correspondence between forecasted and observed streamflow values. While <italic>r</italic> and its square, namely the coefficient of determination (<inline-formula><mml:math id="M14" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>), are often used, <xref ref-type="bibr" rid="bib1.bibx34" id="text.48"/> discussed the limitations of these two measures, reporting that they are oversensitive to extreme values or outliers. The authors suggest that absolute error measures (i.e., root mean squared error, RMSE, or mean absolute error, MAE) and goodness-of-fit measures, such as the Nash–Sutcliffe efficiency (NSE), could provide a more reliable and conservative assessment of the models. Kling–Gupta efficiency (KGE) is a relatively new metric that was developed based on a decomposition of NSE <xref ref-type="bibr" rid="bib1.bibx23" id="paren.49"/>. This goodness-of-fit measure is gaining popularity as a benchmark metric for hydrologic models because it addresses several shortcomings diagnosed in NSE. For these reasons, we selected the following four criteria to evaluate RF performance: <inline-formula><mml:math id="M15" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, RMSE, MAE, and KGE. These criteria cover various aspects of model performance and are intuitive to interpret, as explained in the remainder of this section.</p>
      <p id="d1e600"><inline-formula><mml:math id="M16" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> can be interpreted as the proportion of the variance in the observed values that can be explained by the model. Values are in the range between 0 and 1; a value of 1 indicates the model is able to explain all variation in the observed dataset:
            <disp-formula id="Ch1.E1" content-type="numbered"><label>1</label><mml:math id="M17" display="block"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mfenced close=")" open="("><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>)</mml:mo><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msqrt><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:msqrt><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn 
mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>-</mml:mo><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:mstyle></mml:mfenced><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M18" display="inline"><mml:mi>N</mml:mi></mml:math></inline-formula> is the total number of observations during the validation period, and <inline-formula><mml:math id="M19" display="inline"><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:math></inline-formula> and <inline-formula><mml:math id="M20" display="inline"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> are the forecasted and observed values at day <inline-formula><mml:math id="M21" display="inline"><mml:mi>i</mml:mi></mml:math></inline-formula>, respectively, with
            <disp-formula id="Ch1.E2" content-type="numbered"><label>2</label><mml:math id="M22" display="block"><mml:mrow><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mtext>and</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mover accent="true"><mml:mrow><mml:msub><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal">‾</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mn mathvariant="normal">1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p>
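      <p>As a minimal sketch of Eqs. (1) and (2), the squared correlation can be computed directly from two equal-length series. The function name <monospace>r_squared</monospace> and the argument names <monospace>forecast</monospace> and <monospace>observed</monospace> are illustrative assumptions and are not taken from the code used in this study; the sketch relies only on the Python standard library and does not handle series with zero variance.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def r_squared(forecast, observed):
    """Squared Pearson correlation between forecasts and observations (Eq. 1)."""
    n = len(observed)
    if n == 0 or n != len(forecast):
        raise ValueError("series must be non-empty and of equal length")
    # Means of the forecasted and observed series (Eq. 2).
    mean_f = sum(forecast) / n
    mean_o = sum(observed) / n
    # Numerator of Eq. (1): sum of products of deviations from the means.
    cross = sum((f - mean_f) * (o - mean_o) for f, o in zip(forecast, observed))
    # Denominator of Eq. (1): product of the square roots of the sums of squares.
    ss_f = sum((f - mean_f) ** 2 for f in forecast)
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    return (cross / (math.sqrt(ss_f) * math.sqrt(ss_o))) ** 2
```
]]></preformat>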
      <p id="d1e887">MAE provides an average magnitude of the errors in the model's predictions without considering the direction (underestimation or overestimation).
            <disp-formula id="Ch1.E3" content-type="numbered"><label>3</label><mml:math id="M23" display="block"><mml:mrow><mml:mi mathvariant="normal">MAE</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>|</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula></p>
      <p id="d1e936">RMSE is the standard deviation of the residuals between the predictions and observations. It is more sensitive to large errors than MAE because the residuals are squared. Both MAE and RMSE scores range between 0 and <inline-formula><mml:math id="M24" display="inline"><mml:mi mathvariant="normal">∞</mml:mi></mml:math></inline-formula>; a score of 0 indicates a perfect match between predicted and observed data. The standardization of streamflow measurements (described in Sect. 3) allows comparison of MAE and RMSE across gauges.
            <disp-formula id="Ch1.E4" content-type="numbered"><label>4</label><mml:math id="M25" display="block"><mml:mrow><mml:mi mathvariant="normal">RMSE</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:munderover><mml:mo movablelimits="false">∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>-</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle></mml:msqrt></mml:mrow></mml:math></disp-formula></p>
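      <p>The difference in sensitivity between Eqs. (3) and (4) can be illustrated with a short sketch. The function names <monospace>mae</monospace> and <monospace>rmse</monospace> are illustrative assumptions rather than the implementation used in this study. Two error patterns with the same MAE are compared: four errors of 1 and a single error of 4.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def mae(forecast, observed):
    """Mean absolute error (Eq. 3)."""
    return sum(abs(f - o) for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root mean square error (Eq. 4)."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))


observed = [0.0, 0.0, 0.0, 0.0]
even_errors = [1.0, 1.0, 1.0, 1.0]   # four errors of magnitude 1
single_error = [0.0, 0.0, 0.0, 4.0]  # one error of magnitude 4

# Both patterns give MAE = 1, but RMSE is 1 for the even errors
# and 2 for the single large error.
print(mae(even_errors, observed), rmse(even_errors, observed))
print(mae(single_error, observed), rmse(single_error, observed))
```
]]></preformat>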
      <p id="d1e997">The KGE metric ranges between <inline-formula><mml:math id="M26" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mi mathvariant="normal">∞</mml:mi></mml:mrow></mml:math></inline-formula> and 1. Although there is currently no definitive KGE scale, <xref ref-type="bibr" rid="bib1.bibx30" id="text.50"/> showed that KGE values in the range between <inline-formula><mml:math id="M27" display="inline"><mml:mrow><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">0.41</mml:mn></mml:mrow></mml:math></inline-formula> and <inline-formula><mml:math id="M28" display="inline"><mml:mn mathvariant="normal">1</mml:mn></mml:math></inline-formula> indicate that the model improves upon the mean flow benchmark, which assumes that the predicted streamflow values are equal to the mean of all observations. A KGE value of 1 indicates that the model reproduces the observations perfectly. KGE is calculated as follows:
            <disp-formula id="Ch1.E5" content-type="numbered"><label>5</label><mml:math id="M29" display="block"><mml:mrow><mml:mi mathvariant="normal">KGE</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:mo>-</mml:mo><mml:msqrt><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="italic">α</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mo>(</mml:mo><mml:mi mathvariant="italic">β</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn><mml:msup><mml:mo>)</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M30" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> is the Pearson correlation coefficient, <inline-formula><mml:math id="M31" display="inline"><mml:mi mathvariant="italic">α</mml:mi></mml:math></inline-formula> is a measure of relative variability in the forecasted and observed values, and <inline-formula><mml:math id="M32" display="inline"><mml:mi mathvariant="italic">β</mml:mi></mml:math></inline-formula> represents the bias:
            <disp-formula id="Ch1.E6" content-type="numbered"><label>6</label><mml:math id="M33" display="block"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mspace linebreak="nobreak" width="0.25em"/><mml:mtext>and</mml:mtext><mml:mspace width="0.25em" linebreak="nobreak"/><mml:mi mathvariant="italic">β</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M34" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover></mml:msub></mml:mrow></mml:math></inline-formula> is the standard deviation in forecasted values, <inline-formula><mml:math id="M35" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">σ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the standard deviation in observations, <inline-formula><mml:math id="M36" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mover accent="true"><mml:mi>y</mml:mi><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover></mml:msub></mml:mrow></mml:math></inline-formula> is the forecasted mean, and <inline-formula><mml:math id="M37" display="inline"><mml:mrow><mml:msub><mml:mi mathvariant="italic">μ</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is the observation mean.</p>
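      <p>The three components of Eqs. (5) and (6) can be sketched as follows. The function name <monospace>kge</monospace> is an illustrative assumption, and population standard deviations are assumed (the choice does not affect the variability ratio as long as both series are treated the same way). The sketch does not handle a constant series or an observation mean of zero, for which the correlation or the bias ratio is undefined.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def kge(forecast, observed):
    """Kling-Gupta efficiency (Eq. 5) from correlation, variability ratio, and bias ratio."""
    n = len(observed)
    mu_f = sum(forecast) / n
    mu_o = sum(observed) / n
    # Population standard deviations of forecasts and observations (assumption).
    sd_f = math.sqrt(sum((f - mu_f) ** 2 for f in forecast) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in observed) / n)
    # Pearson correlation coefficient r.
    cov = sum((f - mu_f) * (o - mu_o) for f, o in zip(forecast, observed)) / n
    r = cov / (sd_f * sd_o)
    alpha = sd_f / sd_o  # relative variability (Eq. 6)
    beta = mu_f / mu_o   # bias (Eq. 6)
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```
]]></preformat>
      <p>For example, a forecast that is exactly twice the observations is perfectly correlated with them but doubles both the variability and the mean, so the two ratio terms each contribute 1 under the square root and the sketch returns 1 minus the square root of 2, or approximately <inline-formula><mml:math display="inline"><mml:mrow><mml:mi mathvariant="normal">−</mml:mi><mml:mn mathvariant="normal">0.41</mml:mn></mml:mrow></mml:math></inline-formula>.</p>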
      <p id="d1e1219">In a hydrological forecast, one might be interested in the ability of the model to capture more extreme events rather than the overall performance. This is particularly relevant in flood risk assessment and flood forecasting, wherein floods are associated with discharge exceeding a high percentile (typically <inline-formula><mml:math id="M38" display="inline"><mml:mo>≥</mml:mo></mml:math></inline-formula> 90th) <xref ref-type="bibr" rid="bib1.bibx10" id="paren.51"/>. The definition of “extreme” depends on the objective of the study. Here, we adopt the peak-over-threshold method. For the validation period, we calculated the 90th, 95th, and 99th percentile streamflow values at each watershed and used them as thresholds. An observed daily streamflow that exceeds a given threshold is considered an extreme event. We measure the ability of RF to capture these events using two additional criteria: probability of detection (POD) and false alarm rate (FAR). The calculation follows <xref ref-type="bibr" rid="bib1.bibx29" id="text.52"/>:
            <disp-formula id="Ch1.E7" content-type="numbered"><label>7</label><mml:math id="M39" display="block"><mml:mrow><mml:mi mathvariant="normal">POD</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false" mathvariant="normal">^</mml:mo></mml:mover><mml:mo>&gt;</mml:mo><mml:mi mathvariant="italic">ω</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mi mathvariant="italic">ω</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&gt;</mml:mo><mml:mi mathvariant="italic">ω</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow></mml:math></disp-formula>
          and
            <disp-formula id="Ch1.E8" content-type="numbered"><label>8</label><mml:math id="M40" display="block"><mml:mrow><mml:mi mathvariant="normal">FAR</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true"><mml:mfrac style="display"><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:mover accent="true"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo mathvariant="normal" stretchy="false">^</mml:mo></mml:mover><mml:mo>&gt;</mml:mo><mml:mi mathvariant="italic">ω</mml:mi><mml:mo>|</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ω</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&lt;</mml:mo><mml:mi mathvariant="italic">ω</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>
          where <inline-formula><mml:math id="M41" display="inline"><mml:mi mathvariant="italic">ω</mml:mi></mml:math></inline-formula> is a specified threshold.</p>
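      <p>The peak-over-threshold evaluation can be sketched with simple counts. This sketch assumes one particular reading of the two criteria: POD is taken as the fraction of observed exceedances that are also forecasted to exceed the threshold, and the false alarm rate as the fraction of observed non-exceedances that are forecasted to exceed it. The function names <monospace>percentile</monospace> and <monospace>pod_far</monospace> and the linear-interpolation percentile rule are illustrative assumptions, not the procedure used in this study.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math


def percentile(values, q):
    """Linearly interpolated q-th percentile of values, with 0 <= q <= 100."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def pod_far(forecast, observed, omega):
    """Return (POD, false alarm rate) for threshold omega, using event counts."""
    pairs = list(zip(forecast, observed))
    events = sum(1 for _, o in pairs if o > omega)
    hits = sum(1 for f, o in pairs if o > omega and f > omega)
    non_events = sum(1 for _, o in pairs if o < omega)
    false_alarms = sum(1 for f, o in pairs if o < omega and f > omega)
    # Return NaN when a ratio is undefined (no events or no non-events).
    pod = hits / events if events else float("nan")
    far = false_alarms / non_events if non_events else float("nan")
    return pod, far


observed = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
forecast = [1, 2, 3, 4, 5, 6, 8, 7, 9, 10]
pod, far = pod_far(forecast, observed, omega=7.5)
```
]]></preformat>
      <p>In this toy series, three observations exceed the threshold of 7.5 and two of them are forecasted to exceed it, while one of the seven observations below the threshold is forecasted above it.</p>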
</sec>
</sec>
<sec id="Ch1.S3">
  <label>3</label><title>Study area and data</title>
<sec id="Ch1.S3.SS1">
  <label>3.1</label><title>Watersheds in the Pacific Northwest Hydrologic Region</title>
      <p id="d1e1379">In this study, we focus on watersheds in the Pacific Northwest hydrologic region (Fig. <xref ref-type="fig" rid="Ch1.F1"/>). This region covers an area of 836 517 km<inline-formula><mml:math id="M42" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula> and encompasses all of Washington, six other states, and British Columbia, Canada. For the purpose of maintaining consistency in monitoring protocol and data, we only consider watersheds in US territory. The Columbia River and its tributaries make up the majority of the drainage area, traveling more than 2000 km with an extensive network of more than 100 hydroelectric dams and reservoirs built along these river channels. Hydropower in the Columbia<?pagebreak page3003?> River Basin supplies approximately 70 % of Pacific Northwest energy <xref ref-type="bibr" rid="bib1.bibx53" id="paren.53"/>. Flood control is also an important aspect of reservoir operation in this region.</p>
      <p id="d1e1396">The north–south-running Cascade Mountain Range divides the region into eastern and western parts and strongly influences the regional climate. The windward (west) side of the mountain receives an ample amount of winter precipitation compared to the leeward (east) side. When temperature falls near the freezing point, precipitation comes in the form of snow and provides water storage for dry summer months. Summers tend to be cool and comparatively dry. East of the Cascades, summer rainfall results from rapidly developing thunderstorm and convective events that can produce flash floods <xref ref-type="bibr" rid="bib1.bibx41" id="paren.54"/>. For this region, proximity to the ocean creates a more moderate climate with a narrower seasonal temperature range compared to the inland areas, particularly in the winter. Spatial trends and variations in annual mean temperature, total precipitation, drainage area, and elevation of the watersheds are shown in Fig. <xref ref-type="fig" rid="Ch1.F3"/>.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F3" specific-use="star"><?xmltex \currentcnt{3}?><?xmltex \def\figurename{Figure}?><label>Figure 3</label><caption><p id="d1e1406">Gauge locations with a color gradient indicating variations in <bold>(a)</bold> drainage area (km<inline-formula><mml:math id="M43" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula>), <bold>(b)</bold> watershed mean elevation (m), <bold>(c)</bold> annual precipitation (cm), and <bold>(d)</bold> annual mean temperature (<inline-formula><mml:math id="M44" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C).</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f03.png"/>

        </fig>

</sec>
<sec id="Ch1.S3.SS2">
  <label>3.2</label><title>Data</title>
<sec id="Ch1.S3.SS2.SSS1">
  <label>3.2.1</label><title>Streamflow</title>
      <p id="d1e1457">Our analysis uses streamflow data available through the USGS National Water Information System (NWIS) (<uri>https://waterdata.usgs.gov/nwis/sw</uri>, last access: 6 May 2021). From NWIS, we selected daily streamflow time series for gauges using the following criteria: (1) continuous operation during the 10-year period between 2009 and 2018, (2) less than 10 % missing data, and (3) location in watersheds with “natural” flow that is minimally interrupted by anthropogenic intervention. The third criterion was met using the classification in the GAGES-II (Geospatial Attributes of Gages for Evaluating Streamflow) dataset <xref ref-type="bibr" rid="bib1.bibx19" id="paren.55"/> to identify watersheds with the least-disturbed hydrologic conditions representing natural flow. We performed additional screening by computing the correlation coefficient between the respective gauge and mean basin streamflow and removed those gauges with a correlation of less than 0.5. We also excluded small creeks with a drainage area of less than 50 km<inline-formula><mml:math id="M45" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula>. In total, 86 watersheds were selected (Fig. <xref ref-type="fig" rid="Ch1.F1"/>).</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T1" specific-use="star"><?xmltex \currentcnt{1}?><label>Table 1</label><caption><p id="d1e1480">Number of USGS gauges used in the study for each flow regime, mean watershed elevation, drainage area,  annual precipitation, and annual mean temperature ranges.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="6">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Hydrologic regime</oasis:entry>
         <oasis:entry colname="col2">Number of</oasis:entry>
         <oasis:entry colname="col3">Mean watershed</oasis:entry>
         <oasis:entry colname="col4">Drainage area</oasis:entry>
         <oasis:entry colname="col5">Mean annual</oasis:entry>
         <oasis:entry colname="col6">Mean annual</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">gauges</oasis:entry>
         <oasis:entry colname="col3">elevation (m)</oasis:entry>
         <oasis:entry colname="col4">(km<inline-formula><mml:math id="M46" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col5">precipitation (cm)</oasis:entry>
         <oasis:entry colname="col6">temperature (<inline-formula><mml:math id="M47" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C)</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Rainfall-dominated</oasis:entry>
         <oasis:entry colname="col2">33</oasis:entry>
         <oasis:entry colname="col3">239–1207</oasis:entry>
         <oasis:entry colname="col4">58–703</oasis:entry>
         <oasis:entry colname="col5">122.0–367.0</oasis:entry>
         <oasis:entry colname="col6">5.4–11.5</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Transient</oasis:entry>
         <oasis:entry colname="col2">28</oasis:entry>
         <oasis:entry colname="col3">813–1477</oasis:entry>
         <oasis:entry colname="col4">58–1855</oasis:entry>
         <oasis:entry colname="col5">63.2–314.0</oasis:entry>
         <oasis:entry colname="col6">4.16–8.42</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Snowmelt-dominated</oasis:entry>
         <oasis:entry colname="col2">25</oasis:entry>
         <oasis:entry colname="col3">1349–2509</oasis:entry>
         <oasis:entry colname="col4">51–3355</oasis:entry>
         <oasis:entry colname="col5">58.0–177.0</oasis:entry>
         <oasis:entry colname="col6">0.4–6.62</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e1634">Following the methodology proposed by <xref ref-type="bibr" rid="bib1.bibx77" id="text.56"/>, the watersheds were further grouped into three classes of hydrologic regimes based on the timing of center-of-annual flow, which is defined as the date on which half of the total annual flow volume is exceeded. The annual flow calculations follow a water-year calendar that begins 1 October and ends 30 September. These three hydrologic regimes include “early” streams with flow time <inline-formula><mml:math id="M48" display="inline"><mml:mo>&lt;</mml:mo></mml:math></inline-formula> 150 (27 February), “late” streams with flow time <inline-formula><mml:math id="M49" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 200 (18 April), and “intermediate” streams with flow time between 150 and 200. These hydrologic regimes correspond to rainfall-dominated, snowmelt-dominated, and transient or transitional (mixture of rain and snowmelt) hydrographs, respectively. This classification and its variants have been used in various studies related to water resources in this region <xref ref-type="bibr" rid="bib1.bibx40 bib1.bibx18 bib1.bibx74" id="paren.57"/>. We adopted this partition in our study for two reasons. First, as <xref ref-type="bibr" rid="bib1.bibx58" id="text.58"/> pointed out, the classification provides a summary of information about the type and timing of precipitation, the timing of snowmelt, and the contribution of these hydro-climatic variables to streamflow. This helps us assess model performance in consideration of sources of runoff. Second, the classification provides a basis to generalize the results to other watersheds that are not part of the study.</p>
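      <p>The regime classification can be sketched for a single water year as follows. The function names <monospace>center_of_flow_day</monospace> and <monospace>classify_regime</monospace> are illustrative assumptions, as is the use of one water year of daily flows ordered from 1 October; how multiple years are combined is not specified here and is left out of the sketch.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def center_of_flow_day(daily_flow):
    """Day of the water year (1 = 1 October) on which cumulative flow
    first exceeds half of the total annual flow volume."""
    total = sum(daily_flow)
    if not daily_flow or total <= 0:
        raise ValueError("flow series must be non-empty with positive total flow")
    running = 0.0
    for day, flow in enumerate(daily_flow, start=1):
        running += flow
        if running > total / 2.0:
            return day


def classify_regime(flow_time):
    """Map the center-of-annual-flow day to one of the three hydrologic regimes."""
    if flow_time < 150:
        return "rainfall-dominated"   # "early" streams
    if flow_time > 200:
        return "snowmelt-dominated"   # "late" streams
    return "transient"                # "intermediate" streams


# Synthetic one-year hydrographs (arbitrary units) for illustration only.
winter_peak = [10.0] * 100 + [1.0] * 265
spring_peak = [1.0] * 200 + [10.0] * 100 + [1.0] * 65
uniform = [1.0] * 365
```
]]></preformat>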
      <p id="d1e1661">On average, records at these watersheds have less than 3 % missing data during the 2009–2018 period. The drainage area of the watersheds ranges between 51 and 3355 km<inline-formula><mml:math id="M50" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:math></inline-formula>, and the mean elevation ranges from 239 to 2509 m, as estimated from a 30 m resolution digital elevation model (Table <xref ref-type="table" rid="Ch1.T1"/>).</p>
</sec>
<sec id="Ch1.S3.SS2.SSS2">
  <label>3.2.2</label><title>Precipitation</title>
      <p id="d1e1683">Daily precipitation observations were obtained from the AN81d PRISM dataset <xref ref-type="bibr" rid="bib1.bibx15" id="paren.59"/>. This gridded dataset has a resolution of 4 km, covers the entire continental US from January 1981 to present, and is updated every 6 months. The best-estimate gridded value is derived by using all the available data from the numerous station networks ingested by the PRISM Climate Group. A combination of climatologically aided interpolation (CAI) and radar interpolation was used to develop the PRISM dataset. In our study, watershed daily precipitation time series were constructed by computing the arithmetic mean of the precipitation values of all grid points that fall within the given watershed.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS3">
  <label>3.2.3</label><title>Snow water equivalent and temperatures</title>
      <p id="d1e1697">SWE is defined as the depth of water that would be obtained if a column of snow were completely melted <xref ref-type="bibr" rid="bib1.bibx51" id="paren.60"/>. Daily SWE data were retrieved from 201 SNOTEL stations in HUC-17. These stations are part of the network of over 800 sites located in remote, high-elevation mountain watersheds in the western US. The elevation of these stations ranges from 128 to 3142 m. At SNOTEL sites, SWE is measured by a snow pillow – a pressure-sensitive pad that weighs the snowpack and records the reading via a pressure transducer. Because a shift in temperature is the primary trigger for snowmelt, daily maximum temperature (TMAX) and minimum temperature (TMIN) from SNOTEL sensors were also retrieved and included as predictors for streamflow. The obtained data reflected the last measurement recorded for the respective day at each site. We only supplied the last measurement from SNOTEL stations because not all predictors have
sub-daily values. The dataset is mostly complete, with 99.6 %, 99.6 %, and 99.9 % of the observations available for the three variables TMAX, TMIN, and SWE, respectively. Because of the sparse coverage of SNOTEL sites, daily<?pagebreak page3004?> average values were calculated at the USGS basin level (six-digit hydrological unit), similar to the currently reported snow observations from the National Water and Climate Center (<uri>https://www.wcc.nrcs.usda.gov/snow/snow_map.html</uri>, last access: 6 May 2021), and subsequently applied to the watersheds located in that basin. There are 15 basins in total, each containing between 6 and 30 SNOTEL stations (Table S2 in the Supplement). Note that the in situ data from these stations cannot capture the spatial variability of snow accumulation, and computing an area-averaged snowpack value from observations remains a challenging task <xref ref-type="bibr" rid="bib1.bibx45" id="paren.61"/>. The SNOTEL averages therefore represent first-order estimates of snow coverage and temperature conditions.</p>
</sec>
<sec id="Ch1.S3.SS2.SSS4">
  <label>3.2.4</label><title>Predictor selection</title>
      <p id="d1e1717">Future daily mean streamflow (<inline-formula><mml:math id="M51" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>) is the response variable in our study. We attempt to explain the variability in <inline-formula><mml:math id="M52" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> using eight relevant predictors from the three datasets (Table <xref ref-type="table" rid="Ch1.T2"/>). The selection of predictors is based on a thorough review of the literature from previous studies and our understanding of the hydrology of this region. Specifically, precipitation (<inline-formula><mml:math id="M53" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) is intuitively a driver of streamflow. SWE<inline-formula><mml:math id="M54" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> provides storage information on the amount of accumulated snow available for runoff and is influenced by changes in temperature (TMAX<inline-formula><mml:math id="M55" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> and TMIN<inline-formula><mml:math id="M56" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula>). Given that there
is high temporal correlation in daily temperatures, TMIN and TMAX data can provide a useful signal for
our streamflow forecast. Previous-day streamflow (<inline-formula><mml:math id="M57" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) is particularly important due to the high degree of persistence in the time series. A hydrological year consists of 73 pentads; each comprises 5 consecutive days and the observation for each day is indexed with a pentad value between 1 and 73. Data preprocessing showed moderate to strong nonlinear temporal correlation between daily streamflow and the pentad at each gauge. We also derived two variables from available data: the sum of 3 d precipitation (<inline-formula><mml:math id="M58" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:msub><mml:mn mathvariant="normal">3</mml:mn><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and snowmelt (SD<inline-formula><mml:math id="M59" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula>). Inclusion of 3 d precipitation was to account for large winter storms that can last for several days, which often result in surges in streamflow. SD<inline-formula><mml:math id="M60" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> was calculated as the difference between SWE at day <inline-formula><mml:math id="M61" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula> and <inline-formula><mml:math id="M62" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula>. A positive value of SD<inline-formula><mml:math id="M63" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> indicates snow accumulation, and a negative value indicates melt.</p>
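      <p>The derived predictors described above can be sketched as follows. The function names are illustrative assumptions. The sketch also assumes that the pentad index is computed from a 1-based day-of-year count and that any 366th day is assigned to pentad 73; the handling of leap days is not stated above. Days without enough history are returned as <monospace>None</monospace>.</p>
      <preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def three_day_precip(precip):
    """Sum of precipitation on days t, t-1, and t-2; None for the first two days."""
    return [
        None if t < 2 else precip[t] + precip[t - 1] + precip[t - 2]
        for t in range(len(precip))
    ]


def swe_change(swe):
    """SWE at day t minus SWE at day t-1; positive values indicate accumulation,
    negative values indicate melt. None for the first day."""
    return [None if t == 0 else swe[t] - swe[t - 1] for t in range(len(swe))]


def pentad_index(day_of_year):
    """Pentad (1 to 73) containing the given 1-based day; a 366th day maps to 73."""
    return min((day_of_year - 1) // 5 + 1, 73)


# Toy daily series for illustration only (mm).
precip = [1, 2, 3, 4]
swe = [10, 12, 9]
p3 = three_day_precip(precip)
sd = swe_change(swe)
```
]]></preformat>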

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T2" specific-use="star"><?xmltex \currentcnt{2}?><label>Table 2</label><caption><p id="d1e1867">List of predictors.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="5">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="left"/>
     <oasis:colspec colnum="4" colname="col4" align="left"/>
     <oasis:colspec colnum="5" colname="col5" align="left"/>
     <oasis:thead>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1">No.</oasis:entry>
         <oasis:entry colname="col2">Predictors</oasis:entry>
         <oasis:entry colname="col3">Index</oasis:entry>
         <oasis:entry colname="col4">Unit</oasis:entry>
         <oasis:entry colname="col5">Source</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">1</oasis:entry>
         <oasis:entry colname="col2">Streamflow at day <inline-formula><mml:math id="M64" display="inline"><mml:mi>t</mml:mi></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M65" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">m<inline-formula><mml:math id="M66" display="inline"><mml:msup><mml:mi/><mml:mn mathvariant="normal">3</mml:mn></mml:msup></mml:math></inline-formula> s<inline-formula><mml:math id="M67" display="inline"><mml:msup><mml:mi/><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col5">USGS</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">2</oasis:entry>
         <oasis:entry colname="col2">Precipitation</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M68" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">mm</oasis:entry>
         <oasis:entry colname="col5">PRISM</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">3</oasis:entry>
         <oasis:entry colname="col2">Sum of 3 d precipitation (<inline-formula><mml:math id="M69" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M70" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:msub><mml:mn mathvariant="normal">3</mml:mn><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">mm</oasis:entry>
         <oasis:entry colname="col5">Derived from PRISM</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">4</oasis:entry>
         <oasis:entry colname="col2">Snow water equivalent</oasis:entry>
         <oasis:entry colname="col3">SWE<inline-formula><mml:math id="M71" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">mm</oasis:entry>
         <oasis:entry colname="col5">SNOTEL</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">5</oasis:entry>
         <oasis:entry colname="col2">Maximum temperature</oasis:entry>
         <oasis:entry colname="col3">TMAX<inline-formula><mml:math id="M72" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M73" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C</oasis:entry>
         <oasis:entry colname="col5">SNOTEL</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">6</oasis:entry>
         <oasis:entry colname="col2">Minimum temperature</oasis:entry>
         <oasis:entry colname="col3">TMIN<inline-formula><mml:math id="M74" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M75" display="inline"><mml:msup><mml:mi/><mml:mo>∘</mml:mo></mml:msup></mml:math></inline-formula>C</oasis:entry>
         <oasis:entry colname="col5">SNOTEL</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">7</oasis:entry>
         <oasis:entry colname="col2">Snowmelt (SWE<inline-formula><mml:math id="M76" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> – SWE<inline-formula><mml:math id="M77" display="inline"><mml:msub><mml:mi/><mml:mrow><mml:mi>t</mml:mi><mml:mo>-</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>)</oasis:entry>
         <oasis:entry colname="col3">SD<inline-formula><mml:math id="M78" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">mm</oasis:entry>
         <oasis:entry colname="col5">Derived from SNOTEL</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">8</oasis:entry>
         <oasis:entry colname="col2">Pentad</oasis:entry>
         <oasis:entry colname="col3">PEN<inline-formula><mml:math id="M79" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">–</oasis:entry>
         <oasis:entry colname="col5">–</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e2224">Soil moisture is also a relevant variable in streamflow modeling because it controls the partitioning of precipitation between infiltration and runoff <xref ref-type="bibr" rid="bib1.bibx3" id="paren.62"/>. However, soil<?pagebreak page3005?> moisture data are often limited and incomplete, especially at a daily interval, and are therefore not included in this study. The data were divided into two sets: a training set of 7 years (2009–2015) and a validation set of 3 years (2016–2018). We standardized the training and validation data at each gauge using min–max scaling. First, we computed the min and max values of each predictor and response variable from the training dataset at each watershed. These min and max values were then used to standardize both the training and validation datasets. The training data, which were used to compute the min–max values, therefore lie between 0 and 1, whereas validation values can fall outside this range. A flowchart representing the input–output model using RF is shown in Fig. <xref ref-type="fig" rid="Ch1.F4"/>.</p>
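      <p>The scaling step described above can be illustrated with a minimal sketch. It is not the code used in this study; the function names and the streamflow values are hypothetical and serve only to show that the min and max are taken from the training period and then reused for the validation period.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def fit_min_max(train_values):
    """Return (min, max) computed from the training values only."""
    return min(train_values), max(train_values)


def apply_min_max(values, lo, hi):
    """Scale values with the training min and max; a constant series maps to 0.0."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


# Hypothetical streamflow values (m3 s-1), for illustration only.
train_q = [2.0, 4.0, 10.0, 6.0]
valid_q = [1.0, 5.0, 12.0]

lo, hi = fit_min_max(train_q)                   # statistics from training data
train_scaled = apply_min_max(train_q, lo, hi)   # always within [0, 1]
valid_scaled = apply_min_max(valid_q, lo, hi)   # may leave [0, 1]
```
]]></preformat>
      <p>In this toy example the validation series contains one flow below the training minimum and one above the training maximum, so its scaled values are negative and greater than 1, respectively.</p>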

      <?xmltex \floatpos{t}?><fig id="Ch1.F4" specific-use="star"><?xmltex \currentcnt{4}?><?xmltex \def\figurename{Figure}?><label>Figure 4</label><caption><p id="d1e2235">Flowchart showing the input–output model using RF.</p></caption>
            <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f04.png"/>

          </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F5"><?xmltex \currentcnt{5}?><?xmltex \def\figurename{Figure}?><label>Figure 5</label><caption><p id="d1e2246">Out-of-bag mean absolute error plotted against <monospace>mtry</monospace> during an optimal parameter search at the Carbon River Watershed (USGS site 12094000).</p></caption>
            <?xmltex \igopts{width=199.169291pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f05.png"/>

          </fig>

</sec>
</sec>
</sec>
<sec id="Ch1.S4">
  <label>4</label><title>Results and discussion</title>
<sec id="Ch1.S4.SS1">
  <label>4.1</label><title>Parameter tuning</title>
      <p id="d1e2275">As mentioned in Sect. <xref ref-type="sec" rid="Ch1.S2"/>, the error rate in RF can be sensitive to two parameters: the number of trees <monospace>ntree</monospace> and the number of randomly selected predictors available for splitting at each node <monospace>mtry</monospace>. We tested RF on the training datasets of 30 randomly chosen watersheds and observed that the reduction in the out-of-bag MAE is negligible beyond 2000 trees. We therefore set <monospace>ntree</monospace> <inline-formula><mml:math id="M80" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 2000 for all 86 watersheds; <monospace>mtry</monospace>, on the other hand, was tuned empirically using a combination of an exhaustive search approach and cross-validation.</p>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T3"><?xmltex \currentcnt{3}?><label>Table 3</label><caption><p id="d1e2303">The optimized parameter <monospace>mtry</monospace> obtained using an exhaustive search strategy (<monospace>mtry</monospace> values of 1, 2, 6, 7, and 8 were considered but were not optimal at any gauge).</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="3">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1"><monospace>mtry</monospace></oasis:entry>
         <oasis:entry colname="col2">Number of</oasis:entry>
         <oasis:entry colname="col3">Median</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">gauges</oasis:entry>
         <oasis:entry colname="col3">MAE</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">3</oasis:entry>
         <oasis:entry colname="col2">29</oasis:entry>
         <oasis:entry colname="col3">0.0127</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">4</oasis:entry>
         <oasis:entry colname="col2">44</oasis:entry>
         <oasis:entry colname="col3">0.0116</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">5</oasis:entry>
         <oasis:entry colname="col2">13</oasis:entry>
         <oasis:entry colname="col3">0.0079</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e2389">The goal of tuning is to select the <monospace>mtry</monospace> value that optimizes the performance of the model. The candidates were evaluated based on their OOB mean absolute error (MAE). At each watershed, eight candidate values of <monospace>mtry</monospace> (1–8) were analyzed by three repetitions of 10-fold cross-validation on the training dataset. Averaging the MAE over repetitions of the cross-validation procedure can provide more reliable results because the variance of the estimate is reduced <xref ref-type="bibr" rid="bib1.bibx62" id="paren.63"/>. To illustrate, in Fig. <xref ref-type="fig" rid="Ch1.F5"/>, the lowest cross-validation MAE is obtained at <monospace>mtry</monospace> <inline-formula><mml:math id="M81" display="inline"><mml:mo>=</mml:mo></mml:math></inline-formula> 3 at the Carbon River Watershed (USGS site 12094000). The results of tuning for all gauges (Table <xref ref-type="table" rid="Ch1.T3"/>) show that the optimal <monospace>mtry</monospace> values are {3, 4, 5} with a median MAE of 0.0127, 0.0116, and 0.0079, respectively. The optimal <monospace>mtry</monospace> at each gauge was then used in both training and validating the model.
Because the number of predictors in our study is relatively small, the computational burden of the exhaustive search was manageable. As the number of candidates grows, a random search strategy <xref ref-type="bibr" rid="bib1.bibx55" id="paren.64"/>, in which values are drawn randomly from a specified space, can be more computationally efficient.</p>
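      <p>The tuning procedure, an exhaustive search over candidate values scored by repeated 10-fold cross-validation, can be sketched generically as follows. This is an illustration under stated assumptions rather than the implementation used here: the helper names are hypothetical, and a simple nearest-neighbour average stands in for the random forest so that the example runs with the Python standard library alone. The tuned value plays the role of <monospace>mtry</monospace>.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def repeated_cv_mae(x, y, param, fit_predict, k=10, repeats=3, seed=0):
    """Average the fold MAE over `repeats` repetitions of k-fold cross-validation."""
    rng = random.Random(seed)  # same seed, hence same folds, for every candidate
    fold_maes = []
    for _ in range(repeats):
        for fold in k_fold_indices(len(x), k, rng):
            held_out = set(fold)
            train = [i for i in range(len(x)) if i not in held_out]
            preds = fit_predict([x[i] for i in train], [y[i] for i in train],
                                [x[i] for i in fold], param)
            errors = [abs(p - y[i]) for p, i in zip(preds, fold)]
            fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / len(fold_maes)


def exhaustive_search(x, y, candidates, fit_predict, **cv_kwargs):
    """Score every candidate and return (best_param, {param: mean MAE})."""
    scores = {c: repeated_cv_mae(x, y, c, fit_predict, **cv_kwargs)
              for c in candidates}
    return min(scores, key=scores.get), scores


def knn_mean(train_x, train_y, test_x, param):
    """Stand-in model (NOT a random forest): mean of the `param` nearest neighbours."""
    preds = []
    for xq in test_x:
        nearest = sorted(range(len(train_x)),
                         key=lambda i: abs(train_x[i] - xq))[:param]
        preds.append(sum(train_y[i] for i in nearest) / len(nearest))
    return preds


# Synthetic data, for illustration only.
data_rng = random.Random(1)
x = [i / 10 for i in range(60)]
y = [xi ** 2 + data_rng.gauss(0, 0.5) for xi in x]

best, scores = exhaustive_search(x, y, range(1, 9), knn_mean,
                                 k=10, repeats=3, seed=0)
```
]]></preformat>
      <p>Replacing the stand-in with a function that fits a random forest on the training folds and predicts the held-out fold would give the same search structure, with one averaged MAE per candidate value.</p>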

      <?xmltex \floatpos{t}?><fig id="Ch1.F6" specific-use="star"><?xmltex \currentcnt{6}?><?xmltex \def\figurename{Figure}?><label>Figure 6</label><caption><p id="d1e2428">Box plots for the Pearson correlation coefficient between forecasted and observed values for three models across three flow regimes: RF, naïve, and MLR. Two-sample Wilcoxon rank-sum significance tests are performed, and <inline-formula><mml:math id="M82" display="inline"><mml:mi>p</mml:mi></mml:math></inline-formula> values (in black) are included for each pair of models.</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f06.png"/>

        </fig>

      <?xmltex \floatpos{t}?><fig id="Ch1.F7" specific-use="star"><?xmltex \currentcnt{7}?><?xmltex \def\figurename{Figure}?><label>Figure 7</label><caption><p id="d1e2446">Pairwise scatter plots of the Pearson correlation coefficient between forecasted and observed values for <bold>(a)</bold> RF vs. the naïve model, <bold>(b)</bold> RF vs. MLR, and <bold>(c)</bold> MLR vs. the naïve model. Each dot represents a watershed (<inline-formula><mml:math id="M83" display="inline"><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">86</mml:mn></mml:mrow></mml:math></inline-formula>).</p></caption>
          <?xmltex \igopts{width=455.244094pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f07.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS2">
  <label>4.2</label><?xmltex \opttitle{Benchmark RF against MLR and na\"{i}ve models}?><title>Benchmark RF against MLR and naïve models</title>
      <p id="d1e2485">Figure <xref ref-type="fig" rid="Ch1.F6"/> shows the distributions of the Pearson correlation coefficient (<inline-formula><mml:math id="M84" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula>) between forecasted and observed values obtained from the three models: RF, naïve, and MLR. Non-parametric, two-sample Wilcoxon rank-sum significance tests <xref ref-type="bibr" rid="bib1.bibx78" id="paren.65"/>, which are used to assess whether the values obtained for two separate groups are systematically different from one another, suggest that the pair-wise differences in <inline-formula><mml:math id="M85" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> values between RF and the other two models are statistically significant (<inline-formula><mml:math id="M86" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula>) in two of the three flow regimes. RF is observed to outperform both the naïve and MLR models in rainfall-driven and transient watersheds. Among snowmelt-driven watersheds, the three models yield similar correlation coefficients (<inline-formula><mml:math id="M87" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&gt;</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula>). In Fig. <xref ref-type="fig" rid="Ch1.F7"/>a, we observe that most points lie to the left of the 1-to-1 line, suggesting that RF outperforms the naïve model at most individual watersheds in the rainfall-driven and transient regimes. We also discern that large improvements, defined as positive differences in <inline-formula><mml:math id="M88" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> values between RF and the naïve model, tend to occur where persistence is lower (lower <inline-formula><mml:math id="M89" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> values from the naïve model). 
This suggests that application of RF would be<?pagebreak page3006?> most beneficial at watersheds in which next-day streamflow is less dependent on the condition of the current day. Among snowmelt-driven watersheds, the data points lie on the 1-to-1 line, indicating that the models differ only marginally in <inline-formula><mml:math id="M90" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> values. As <xref ref-type="bibr" rid="bib1.bibx43" id="text.66"/> pointed out, the choice of reference can affect the perceived performance of the forecast system. Our pair-wise comparisons highlight the fact that data-driven models should be evaluated in consideration of the autocorrelation structure in the data <xref ref-type="bibr" rid="bib1.bibx25" id="paren.67"/>. Without accounting for persistence, it would be unjustified to conclude that RF gives better performance in snowmelt-driven watersheds. Nevertheless, we observe that RF outperformed MLR in all rainfall-dominated and transient watersheds and in 19 out of 25 snowmelt-dominated watersheds. The median <inline-formula><mml:math id="M91" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula> values for RF in the three groups are 0.88, 0.89, and 0.98 compared to 0.85, 0.87, and 0.98 for MLR. This may reflect RF's better ability to capture the nonlinear relationship between streamflow and other variables.</p>
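      <p>For readers unfamiliar with the rank-sum test, the following minimal sketch shows how the statistic and a two-sided p value can be obtained with the large-sample normal approximation. It is an illustration only and is not the routine used in this study; the function name and the two sets of correlation values are hypothetical, and no tie or continuity correction is applied to the variance.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def rank_sum_test(a, b):
    """Two-sided Wilcoxon rank-sum test via the normal approximation.

    Tied values receive average ranks. Returns (W, z, p), where W is the
    sum of the ranks of sample `a` in the pooled data.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for m in range(i, j + 1):
            ranks[m] = avg
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w, z, p


# Hypothetical per-watershed correlation values for two models.
r_model_a = [0.90, 0.88, 0.91, 0.87, 0.93, 0.89]
r_model_b = [0.80, 0.82, 0.79, 0.85, 0.81, 0.84]

w, z, p = rank_sum_test(r_model_a, r_model_b)
```
]]></preformat>
      <p>The normal approximation is crude for groups this small; statistical packages typically provide exact or tie-corrected variants, which would be preferable in practice.</p>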
</sec>
<sec id="Ch1.S4.SS3">
  <label>4.3</label><title>Evaluation of RF overall performance</title>
      <p id="d1e2577">We next evaluated the overall performance of RF across the three flow regimes using four criteria: <inline-formula><mml:math id="M92" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, KGE, MAE, and RMSE (Table <xref ref-type="table" rid="Ch1.T4"/>, Fig. <xref ref-type="fig" rid="Ch1.F8"/>). Here, we observe a trend in the <inline-formula><mml:math id="M93" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, KGE, MAE, and RMSE scores similar to the <inline-formula><mml:math id="M94" display="inline"><mml:mi>r</mml:mi></mml:math></inline-formula>-value trend in Fig. <xref ref-type="fig" rid="Ch1.F6"/>: RF performs better in snowmelt-dominated than in rainfall-dominated watersheds (higher <inline-formula><mml:math id="M95" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> and KGE, lower MAE and RMSE). Snowmelt-dominated watersheds have the smallest range of <inline-formula><mml:math id="M96" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values across the three groups. This may suggest that there is less variability in flow behaviors at individual gauges in this group, which is consistent with<?pagebreak page3007?> the observed data, in which hydrographs of snowmelt-driven watersheds tend to be less flashy than those of rainfall-driven watersheds. Not surprisingly, the transient group has the largest spread in <inline-formula><mml:math id="M97" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula> values as watersheds in this group share characteristics of the other two groups.</p>

      <?xmltex \floatpos{t}?><fig id="Ch1.F8" specific-use="star"><?xmltex \currentcnt{8}?><?xmltex \def\figurename{Figure}?><label>Figure 8</label><caption><p id="d1e2649">Daily streamflow forecast scores computed over the validation period for the RF model for four metrics: <inline-formula><mml:math id="M98" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, KGE, MAE, and RMSE.</p></caption>
          <?xmltex \igopts{width=369.885827pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f08.png"/>

        </fig>

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T4" specific-use="star"><?xmltex \currentcnt{4}?><label>Table 4</label><caption><p id="d1e2672">Descriptive statistics of the four criteria used to evaluate the overall performance of RF: <inline-formula><mml:math id="M99" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula>, KGE, MAE, and RMSE.</p></caption><oasis:table frame="topbot"><oasis:tgroup cols="7">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="left"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:colspec colnum="5" colname="col5" align="right"/>
     <oasis:colspec colnum="6" colname="col6" align="right"/>
     <oasis:colspec colnum="7" colname="col7" align="right"/>
     <oasis:thead>
       <oasis:row rowsep="1">

         <oasis:entry colname="col1">Metric</oasis:entry>

         <oasis:entry colname="col2">Flow regime</oasis:entry>

         <oasis:entry colname="col3">Min</oasis:entry>

         <oasis:entry colname="col4"><inline-formula><mml:math id="M100" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">1</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col5">Median</oasis:entry>

         <oasis:entry colname="col6"><inline-formula><mml:math id="M101" display="inline"><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mn mathvariant="normal">3</mml:mn></mml:msub></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col7">Max</oasis:entry>

       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2"><inline-formula><mml:math id="M102" display="inline"><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn mathvariant="normal">2</mml:mn></mml:msup></mml:mrow></mml:math></inline-formula></oasis:entry>

         <oasis:entry colname="col2">Rainfall-dominated</oasis:entry>

         <oasis:entry colname="col3">0.59</oasis:entry>

         <oasis:entry colname="col4">0.71</oasis:entry>

         <oasis:entry colname="col5">0.77</oasis:entry>

         <oasis:entry colname="col6">0.81</oasis:entry>

         <oasis:entry colname="col7">0.87</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Transient</oasis:entry>

         <oasis:entry colname="col3">0.57</oasis:entry>

         <oasis:entry colname="col4">0.71</oasis:entry>

         <oasis:entry colname="col5">0.80</oasis:entry>

         <oasis:entry colname="col6">0.87</oasis:entry>

         <oasis:entry colname="col7">0.99</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">Snowmelt-dominated</oasis:entry>

         <oasis:entry colname="col3">0.88</oasis:entry>

         <oasis:entry colname="col4">0.95</oasis:entry>

         <oasis:entry colname="col5">0.97</oasis:entry>

         <oasis:entry colname="col6">0.98</oasis:entry>

         <oasis:entry colname="col7">0.99</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2">KGE</oasis:entry>

         <oasis:entry colname="col2">Rainfall-dominated</oasis:entry>

         <oasis:entry colname="col3">0.64</oasis:entry>

         <oasis:entry colname="col4">0.78</oasis:entry>

         <oasis:entry colname="col5">0.84</oasis:entry>

         <oasis:entry colname="col6">0.87</oasis:entry>

         <oasis:entry colname="col7">0.92</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Transient</oasis:entry>

         <oasis:entry colname="col3">0.62</oasis:entry>

         <oasis:entry colname="col4">0.77</oasis:entry>

         <oasis:entry colname="col5">0.86</oasis:entry>

         <oasis:entry colname="col6">0.91</oasis:entry>

         <oasis:entry colname="col7">0.99</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">Snowmelt-dominated</oasis:entry>

         <oasis:entry colname="col3">0.77</oasis:entry>

         <oasis:entry colname="col4">0.89</oasis:entry>

         <oasis:entry colname="col5">0.94</oasis:entry>

         <oasis:entry colname="col6">0.97</oasis:entry>

         <oasis:entry colname="col7">0.99</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry rowsep="1" colname="col1" morerows="2">MAE</oasis:entry>

         <oasis:entry colname="col2">Rainfall-dominated</oasis:entry>

         <oasis:entry colname="col3">0.0061</oasis:entry>

         <oasis:entry colname="col4">0.0096</oasis:entry>

         <oasis:entry colname="col5">0.0131</oasis:entry>

         <oasis:entry colname="col6">0.0161</oasis:entry>

         <oasis:entry colname="col7">0.0245</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Transient</oasis:entry>

         <oasis:entry colname="col3">0.0070</oasis:entry>

         <oasis:entry colname="col4">0.0097</oasis:entry>

         <oasis:entry colname="col5">0.0109</oasis:entry>

         <oasis:entry colname="col6">0.0143</oasis:entry>

         <oasis:entry colname="col7">0.0189</oasis:entry>

       </oasis:row>
       <oasis:row rowsep="1">

         <oasis:entry colname="col2">Snowmelt-dominated</oasis:entry>

         <oasis:entry colname="col3">0.0065</oasis:entry>

         <oasis:entry colname="col4">0.0087</oasis:entry>

         <oasis:entry colname="col5">0.0092</oasis:entry>

         <oasis:entry colname="col6">0.0114</oasis:entry>

         <oasis:entry colname="col7">0.0168</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col1" morerows="2">RMSE</oasis:entry>

         <oasis:entry colname="col2">Rainfall-dominated</oasis:entry>

         <oasis:entry colname="col3">0.0157</oasis:entry>

         <oasis:entry colname="col4">0.0241</oasis:entry>

         <oasis:entry colname="col5">0.0326</oasis:entry>

         <oasis:entry colname="col6">0.0395</oasis:entry>

         <oasis:entry colname="col7">0.0609</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Transient</oasis:entry>

         <oasis:entry colname="col3">0.0144</oasis:entry>

         <oasis:entry colname="col4">0.0227</oasis:entry>

         <oasis:entry colname="col5">0.0275</oasis:entry>

         <oasis:entry colname="col6">0.0331</oasis:entry>

         <oasis:entry colname="col7">0.0468</oasis:entry>

       </oasis:row>
       <oasis:row>

         <oasis:entry colname="col2">Snowmelt-dominated</oasis:entry>

         <oasis:entry colname="col3">0.0160</oasis:entry>

         <oasis:entry colname="col4">0.0218</oasis:entry>

         <oasis:entry colname="col5">0.0270</oasis:entry>

         <oasis:entry colname="col6">0.0315</oasis:entry>

         <oasis:entry colname="col7">0.0436</oasis:entry>

       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup></oasis:table></table-wrap>

      <p id="d1e3043">Because RMSE is more sensitive to large errors than MAE, the difference between the two scores represents the extent to which outliers are present in the error values <xref ref-type="bibr" rid="bib1.bibx34" id="paren.68"/>. In the rainfall-driven and transient groups, the shape of the box-plot distributions remains fairly consistent between the two error scores, suggesting that the distribution of large errors is similar to that of mean errors in these watersheds (Fig. <xref ref-type="fig" rid="Ch1.F8"/>). Among snowmelt-driven watersheds, the MAE scores are heavily skewed towards 0, while the RMSE scores are more evenly spread. In this group, we also observe a noticeably wider interquartile range (difference between the first quartile and third quartile) in the RMSE plot compared to the MAE plot. This indicates that RF can still be susceptible to underestimation or overestimation in watersheds in which the mean error is relatively low.</p>
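      <p>The different sensitivity of the two error scores can be seen in a small worked example. The sketch below is illustrative only, with hypothetical scaled flow values: two forecasts have the same MAE, but the one whose error is concentrated in a single large miss has twice the RMSE.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(obs, pred)) / len(obs))


# Hypothetical scaled flows, for illustration only.
obs = [0.10, 0.10, 0.10, 0.10]
even_errors = [0.12, 0.08, 0.12, 0.08]  # four errors of 0.02 each
one_outlier = [0.10, 0.10, 0.10, 0.18]  # a single error of 0.08

mae_even, rmse_even = mae(obs, even_errors), rmse(obs, even_errors)
mae_out, rmse_out = mae(obs, one_outlier), rmse(obs, one_outlier)
```
]]></preformat>
      <p>Both forecasts give an MAE of 0.02, whereas the RMSE is 0.02 for the evenly distributed errors and 0.04 for the single outlier, so the gap between RMSE and MAE widens as errors become more concentrated.</p>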
      <p id="d1e3051">In Table <xref ref-type="table" rid="Ch1.T4"/>, KGE scores are reported in a range of 0.62–0.99 for all watersheds. The median values for the rainfall-dominated, transient, and snowmelt-dominated regimes are 0.84, 0.86, and 0.94, respectively. As observed mean flow is used in the calculation of KGE, <xref ref-type="bibr" rid="bib1.bibx30" id="text.69"/> suggested that a KGE score greater than <inline-formula><mml:math id="M103" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.41</mml:mn></mml:mrow></mml:math></inline-formula> indicates that a hydrologic model improves upon the forecast with mean flow, independent of the basin. Therefore, RF can be seen to give a satisfactory performance at all watersheds in our study. Our results are comparable to findings in <xref ref-type="bibr" rid="bib1.bibx70" id="text.70"/> in which the authors compare the performance of RF, SVM, and ANN in simulating daily discharge with baseflow separation at four rivers in California and Washington. Although the authors did not classify these basins, it can<?pagebreak page3008?> be inferred that three of the rivers were rainfall-driven and one was snowmelt-driven. The RF model in their study produced KGE scores of 0.41, 0.81, and 0.92 for the rainfall-driven basins (without baseflow separation). However, our KGE scores for snowmelt-fed watersheds (with a median of 0.94) are higher than the 0.55 reported in their study.</p>
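      <p>The origin of the mean-flow benchmark value can be illustrated with a short sketch. It assumes the commonly used formulation of KGE in terms of the correlation, the ratio of standard deviations, and the ratio of means between simulated and observed flow; the formulation used in this study is given in the methods and may differ in detail. The function name and flow values are hypothetical, and setting the correlation to 0 for a constant forecast is a convention adopted here for the benchmark case.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math


def kge(obs, sim):
    """KGE = 1 - sqrt((r - 1)^2 + (alpha - 1)^2 + (beta - 1)^2).

    alpha is the ratio of standard deviations (sim / obs) and beta the
    ratio of means (sim / obs). Requires non-constant observations with
    a non-zero mean.
    """
    n = len(obs)
    mu_o, mu_s = sum(obs) / n, sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    # Assumed convention: r = 0 when the simulated series is constant.
    r = cov / (sd_o * sd_s) if sd_s > 0 else 0.0
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical observed flows, for illustration only.
obs = [1.0, 3.0, 2.0, 6.0, 3.0]
mean_flow = [sum(obs) / len(obs)] * len(obs)  # constant mean-flow forecast

kge_perfect = kge(obs, obs)
kge_mean_flow = kge(obs, mean_flow)
```
]]></preformat>
      <p>A perfect forecast gives a KGE of 1, while the constant mean-flow forecast has r = 0, alpha = 0, and beta = 1 and therefore a KGE of 1 minus the square root of 2, or approximately −0.41.</p>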

      <?xmltex \floatpos{t}?><fig id="Ch1.F9" specific-use="star"><?xmltex \currentcnt{9}?><?xmltex \def\figurename{Figure}?><label>Figure 9</label><caption><p id="d1e3074">The probability of detection (POD) plotted against the false alarm rate (FAR) for three extreme thresholds: 90th, 95th, and 99th percentiles. The thin black line connects values from the same watershed. The vertical axis indicates the number of times RF <italic>correctly</italic> forecasted events that exceeded the threshold divided by the total number of exceedances. The horizontal axis indicates the number of times RF <italic>incorrectly</italic> forecasted events that exceeded the threshold divided by the total number of non-exceedances. Note that the scales of the horizontal and vertical axes are not 1-to-1 in the plotted partial receiver operating characteristic (ROC) curve.</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f09.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS4">
  <label>4.4</label><title>RF performance on extreme streamflows</title>
      <?pagebreak page3009?><p id="d1e3097">We also examine the model's capacity to forecast extreme events because of their potentially high impact and associated flood risks in this region. The ability of RF to correctly detect extreme flows exceeding the 90th, 95th, and 99th percentile thresholds (defined as the POD) for each watershed is plotted against the FAR in Fig. <xref ref-type="fig" rid="Ch1.F9"/>. A threshold point falling below the no-skill line indicates that the model yields a higher FAR than POD and is considered to have no predictive power for that threshold. As expected, RF becomes less skillful in its forecasts as the magnitude of the events increases. The model tends to perform better among snowmelt-dominated watersheds (higher POD, lower FAR) than among those in the transient and rainfall-driven groups. At the 95th percentile threshold, RF can correctly forecast at least 50 % of the extreme events (POD <inline-formula><mml:math id="M104" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.5) at most watersheds. At the 99th percentile threshold, the difference in RF's ability to forecast extreme streamflow among the three flow regimes becomes less obvious. In snowmelt-driven watersheds, 8 out of 25 have POD <inline-formula><mml:math id="M105" display="inline"><mml:mo>&gt;</mml:mo></mml:math></inline-formula> 0.5, 9 have POD between 0.01 and 0.5, and 8 have a POD of 0. While few studies have examined complex diurnal hydrologic responses in high-elevation catchments <xref ref-type="bibr" rid="bib1.bibx20" id="paren.71"/>, our result suggests that large surges in streamflow sustained by spring and early summer snowmelt can be difficult to predict, even at 1 d of lead time, and remain an ongoing research subject <xref ref-type="bibr" rid="bib1.bibx56 bib1.bibx12" id="paren.72"/>. In our study, we observe that high POD is accompanied by low FAR for the same threshold. This may suggest that RF is skillful in its forecasts of extreme events.</p>
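      <p>The two scores follow the definitions given in the caption of Fig. <xref ref-type="fig" rid="Ch1.F9"/> and can be sketched as below. This is an illustration rather than the code used in this study: the function name and flow values are hypothetical, the same threshold is assumed to apply to observed and forecasted flows, and an exceedance is taken to mean a value strictly greater than the threshold.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def pod_far(obs, pred, threshold):
    """Return (POD, FAR) for exceedances of `threshold`.

    POD = hits / (hits + misses), i.e. relative to all observed exceedances.
    FAR = false alarms / (false alarms + correct negatives), i.e. relative
    to all observed non-exceedances.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for o, p in zip(obs, pred):
        if o > threshold:
            if p > threshold:
                hits += 1
            else:
                misses += 1
        elif p > threshold:
            false_alarms += 1
        else:
            correct_negatives += 1
    exceed = hits + misses
    non_exceed = false_alarms + correct_negatives
    pod = hits / exceed if exceed else float("nan")
    far = false_alarms / non_exceed if non_exceed else float("nan")
    return pod, far


# Hypothetical observed and forecasted flows, for illustration only.
obs = [1, 2, 3, 9, 10, 2, 1, 8, 2, 1]
pred = [1, 2, 9, 9, 3, 2, 1, 9, 2, 1]

pod, far = pod_far(obs, pred, threshold=5)
```
]]></preformat>
      <p>In this toy series, two of the three observed exceedances are forecasted (POD = 2/3) and one of the seven non-exceedances is wrongly forecasted as an exceedance (FAR = 1/7), which places the point above the no-skill line.</p>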

      <?xmltex \floatpos{t}?><fig id="Ch1.F10" specific-use="star"><?xmltex \currentcnt{10}?><?xmltex \def\figurename{Figure}?><label>Figure 10</label><caption><p id="d1e3124">Bar plots showing the importance of predictor variables using <bold>(a–c)</bold> MDA and <bold>(d–f)</bold> MDI criteria. The length of the blue bars indicates the median value across the watersheds for each flow regime, and the thin black bar represents the full range of the values.</p></caption>
          <?xmltex \igopts{width=426.791339pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f10.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS5">
  <label>4.5</label><title>Analysis of variable importance</title>
      <p id="d1e3148">Variable importance is a useful feature both for understanding the underlying process of a current model and for generating insights for the selection of variables in future studies <xref ref-type="bibr" rid="bib1.bibx38" id="paren.73"/>. RF quantifies variable importance through two measures: MDA and MDI (Fig. <xref ref-type="fig" rid="Ch1.F10"/>). In both measures, a higher value indicates that the variable contributes more to the model accuracy. Intuitively, streamflow from the previous day is shown to be the most important variable due to persistence. This is reflected across all three flow regimes and both measures. We also observe that the sum of 3 d precipitation tends to have more predictive power than 1 d precipitation. Maximum temperature and minimum temperature make similar contributions; minimum temperature tends to receive slightly higher scores. Among snowmelt-dominated watersheds (Fig. <xref ref-type="fig" rid="Ch1.F10"/>c and <xref ref-type="fig" rid="Ch1.F10"/>f), we anticipated that the snow indices (SD<inline-formula><mml:math id="M106" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula> and SWE<inline-formula><mml:math id="M107" display="inline"><mml:msub><mml:mi/><mml:mi>t</mml:mi></mml:msub></mml:math></inline-formula>) would contribute more to the prediction than precipitation, and this is indeed reflected in the results. Surprisingly, pentad comes third and fourth in MDI and MDA, respectively. This supports the long-term snowpack memory of daily streamflow <xref ref-type="bibr" rid="bib1.bibx79" id="paren.74"/> and can be useful in real-time prediction. Precipitation does not seem to have a significant contribution to the model's accuracy among the snowmelt-dominated watersheds. 
Although PRISM precipitation data include both rainfall and snowfall, it is likely that the majority of fallen precipitation in these high-altitude watersheds is stored as snow on the surface and does not immediately contribute to runoff. <xref ref-type="bibr" rid="bib1.bibx35" id="text.75"/> estimated that 37 % of the precipitation falls as snow in the western US, yet snowmelt is responsible for 70 % of the total runoff in mountainous areas. It is still very surprising to observe such a low contribution of the precipitation variable to RF model accuracy. Nevertheless, we observe general agreement between the two measures in ranking of the variables in the snowmelt-driven group.</p>
      <p id="d1e3185">In transient and rainfall-dominated groups, there is noticeable disagreement between the two criteria. Precipitation (<inline-formula><mml:math id="M108" display="inline"><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) and 3 d precipitation (<inline-formula><mml:math id="M109" display="inline"><mml:mrow><mml:mi>P</mml:mi><mml:msub><mml:mn mathvariant="normal">3</mml:mn><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula>) tend to rank lower in the MDA measure (Fig. <xref ref-type="fig" rid="Ch1.F10"/>a and <xref ref-type="fig" rid="Ch1.F10"/>b) compared to MDI (Fig. <xref ref-type="fig" rid="Ch1.F10"/>d and <xref ref-type="fig" rid="Ch1.F10"/>e). Specifically, in the rainfall-dominated group, 3 d precipitation and precipitation are placed second and third based on median MDI compared to fourth and seventh in MDA. Maximum and minimum temperatures, on the other hand, tend to be more important in MDA calculation compared to MDI. In <xref ref-type="bibr" rid="bib1.bibx63" id="text.76"/>, an RF model was used to predict streamflow at five rain-fed rivers in Ethiopia. Similarly calculated MDA in that study suggested that precipitation was less important (7.71 %) than temperature (12.74 %). A linear model in the same study, however, considered the coefficient for precipitation to be significant (<inline-formula><mml:math id="M110" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>≪</mml:mo><mml:mn mathvariant="normal">0.01</mml:mn></mml:mrow></mml:math></inline-formula>), while the temperature coefficient was not (<inline-formula><mml:math id="M111" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.08</mml:mn></mml:mrow></mml:math></inline-formula>). 
In <xref ref-type="bibr" rid="bib1.bibx47" id="text.77"/>, the authors predicted daily reservoir levels in three reservoirs in Indiana, Texas, and Atlanta using RF and other ML techniques. Precipitation was reported as the least important variable and ranked behind dew point temperature and humidity. Inspecting the probability density functions of our predictors, we suspect that for variables that are heavily skewed and zero-inflated (e.g., precipitation), permutation-based MDA may underestimate their importance compared to those that are more normally distributed such as maximum and minimum temperatures. In our precipitation data (both training and validation), at least 30 % of the daily observations are zeros across the watersheds. There is a high likelihood that a day with zero precipitation ends up with the same value after shuffling, thus potentially reducing the randomness introduced to compute MDA. While we did not perform additional simulations to further confirm whether MDA and MDI measures are sensitive to highly skewed and zero-inflated variables, this can be a topic of future research. <xref ref-type="bibr" rid="bib1.bibx67" id="text.78"/>, however, showed that RF variable importance measures can be unreliable in situations in which predictor variables vary in their scale of measurement. Note that the scale of measurement refers not only to the numeric range but also to the nature
of the data (e.g., ordinal vs. continuous). Among the eight predictors in our study, pentad is considered an ordinal variable. Also, the scales of measurement of the precipitation and temperature variables are slightly different. Precipitation is a flux variable and comprises discrete and continuous components: if it does not rain the amount of rainfall is discrete, whereas if it rains the amount is continuous. Temperature is a state variable and always continuous. The higher MDA received by temperature predictors may also be due to a known bias whereby permutation-based importance measures overestimate the true contribution of correlated variables <xref ref-type="bibr" rid="bib1.bibx21" id="paren.79"/>. In our study, temperature variables tend to be more strongly correlated with other predictors than the two precipitation variables are. This is likely because temperature controls both the form of precipitation (snowfall<?pagebreak page3011?> vs. rainfall) and the timing of snowmelt. There is also an ongoing discussion regarding the stability of both measures, since the two variable importance measures can yield noticeably different rankings in simulated datasets <xref ref-type="bibr" rid="bib1.bibx8 bib1.bibx46 bib1.bibx26" id="paren.80"/>. Although results from MDI make more sense in our case, we suggest that RF users exercise caution when interpreting outputs from these two measures.</p>
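      <p>The shuffling argument above can be illustrated with a minimal sketch on synthetic data. The series below are hypothetical stand-ins (an exponential wet-day amount with 30 % zeros for precipitation and a Gaussian series for temperature), not the PRISM data used in this study, and all function names are illustrative only. The sketch counts how many positions keep exactly the same value after one random permutation, which is the perturbation on which permutation-based MDA relies.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import random


def unchanged_fraction(values, seed=0):
    """Fraction of positions whose value is identical after one random shuffle."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    same = sum(1 for a, b in zip(values, shuffled) if a == b)
    return same / len(values)


def synthetic_precip(n, zero_share, seed=1):
    """Hypothetical zero-inflated series: dry days are 0.0, wet days are exponential."""
    rng = random.Random(seed)
    return [0.0 if rng.random() < zero_share else rng.expovariate(0.2)
            for _ in range(n)]


def synthetic_temp(n, seed=2):
    """Hypothetical continuous series drawn from a Gaussian distribution."""
    rng = random.Random(seed)
    return [rng.gauss(10.0, 8.0) for _ in range(n)]


precip = synthetic_precip(10000, 0.30)
temp = synthetic_temp(10000)

precip_same = unchanged_fraction(precip)
temp_same = unchanged_fraction(temp)

print(f"precipitation entries unchanged by shuffling: {precip_same:.3f}")
print(f"temperature entries unchanged by shuffling:   {temp_same:.3f}")
```
]]></preformat>
      <p>Under these assumptions, roughly 9 % of the precipitation entries (about 0.30 squared) are left unchanged by the shuffle, whereas almost none of the temperature entries are. This does not by itself confirm that MDA is biased; it only shows that a permutation perturbs a zero-inflated predictor less than a continuous one.</p>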

      <?xmltex \floatpos{t}?><fig id="Ch1.F11"><?xmltex \currentcnt{11}?><?xmltex \def\figurename{Figure}?><label>Figure 11</label><caption><p id="d1e3263">KGE scores plotted against <bold>(a)</bold> the average percent of slope and <bold>(b)</bold> the average percent of sand in soil at each watershed. Best-fit lines were determined using simple linear regression. Pearson correlation coefficients were computed with associated significance.</p></caption>
          <?xmltex \igopts{width=241.848425pt}?><graphic xlink:href="https://hess.copernicus.org/articles/25/2997/2021/hess-25-2997-2021-f11.png"/>

        </fig>

</sec>
<sec id="Ch1.S4.SS6">
  <label>4.6</label><title>Effects of watershed characteristics on model performance</title>
      <p id="d1e3286">To explore the role of catchment characteristics such as geology, topography, and land cover in the performance of the RF model, we perform a Pearson correlation test between the KGE scores and selected basin physical characteristics for each flow regime. These watershed characteristics were compiled as part of the GAGES-II dataset using national data sources including the US National Land Cover Database (NLCD) 2006 version, the 100 m resolution National Elevation Dataset (NED), and the Digital General Soil Map of the United States (STATSGO2) (Table S1 in the Supplement). The results are shown in Table <xref ref-type="table" rid="Ch1.T5"/>. There is a significant negative correlation (<inline-formula><mml:math id="M112" display="inline"><mml:mrow><mml:mi>p</mml:mi><mml:mo>&lt;</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula>) between KGE scores and watershed slopes among rainfall-dominated and transient watersheds (Fig. <xref ref-type="fig" rid="Ch1.F11"/>a). Because steeper hillslopes are often associated with faster surface and subsurface water movement during event-flow runoff, these watersheds can have shorter response times. We observe a similar trend between KGE scores and the percent of sand in the soil (Fig. <xref ref-type="fig" rid="Ch1.F11"/>b); the RF performs worse in watersheds with higher hydraulic conductivity (i.e., higher sand content). This could be a result of rapid subsurface flow from the soil profile enabled by soil macropores in mountainous forested areas <xref ref-type="bibr" rid="bib1.bibx66" id="paren.81"/>, where subsurface flow is the predominant mechanism. Without a quantification of the partition of discharge into surface flow and subsurface flow at individual watersheds, it is difficult to determine the relative importance of subsurface runoff mechanisms in regulating streamflow and how that may have affected the RF performance.
The findings, however, suggest that RF performance can deteriorate at watersheds with quick-response runoff when supplied with 1 d delayed observation data.</p>
      <p id="d1e3310">It appears that stream density and the amount of vegetation cover may also affect the performance of RF, but the relationships are not statistically significant at <inline-formula><mml:math id="M113" display="inline"><mml:mrow><mml:mi mathvariant="italic">α</mml:mi><mml:mo>=</mml:mo><mml:mn mathvariant="normal">0.05</mml:mn></mml:mrow></mml:math></inline-formula>. Aspect eastness, drainage area, and basin compactness are not determining factors for variability in the KGE scores. We also explored the impact of land use and land cover, which can be represented by the extent of impervious cover in each watershed. However, because we only selected unregulated watersheds that experienced minimal human disruption during the initial screening, most watersheds have very little impervious cover (less than 5 %). It is noted that these selected characteristics are not meant to be exhaustive, but rather representative of various types of factors that could help explain the variability in model performance. Furthermore, an alternative approach to Pearson's correlation is to use analysis of variance (ANOVA) to test for marginal significance of each catchment variable to KGE while accounting for their interaction. Because our objective is not to make inferences on KGE based on these variables and ANOVA can be complicated to interpret, we choose to compute the correlation coefficient.</p>
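      <p>The correlation analysis in this section can be sketched as follows. The slope and KGE values are invented for illustration and are not taken from Table <xref ref-type="table" rid="Ch1.T5"/>, and the function names are hypothetical. To stay within the Python standard library, the sketch estimates significance with a two-sided permutation test, which is a substitute for, not a reproduction of, the correlation test used in this study.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided p-value: share of shuffles with |r| at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical watersheds: mean slope (%) and the KGE score of the forecast.
slope_percent = [5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
kge_scores = [0.95, 0.93, 0.90, 0.91, 0.85, 0.84, 0.80, 0.78]

r = pearson_r(slope_percent, kge_scores)
p_value = permutation_p_value(slope_percent, kge_scores)

print(f"r = {r:.2f}, permutation p-value = {p_value:.4f}")
```
]]></preformat>
      <p>In this invented example the coefficient is strongly negative, mirroring the direction of the slope relationship reported for the rainfall-dominated and transient groups; the magnitude is a property of the made-up numbers only.</p>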

<?xmltex \floatpos{t}?><table-wrap id="Ch1.T5"><?xmltex \currentcnt{5}?><label>Table 5</label><caption><p id="d1e3328">Pearson correlation coefficient between KGE scores and selected basin physical characteristics. Bold values indicate that the relationship is significant at the 5 % or 1 % level.</p></caption><oasis:table frame="topbot"><?xmltex \begin{scaleboxenv}{.98}[.98]?><oasis:tgroup cols="4">
     <oasis:colspec colnum="1" colname="col1" align="left"/>
     <oasis:colspec colnum="2" colname="col2" align="right"/>
     <oasis:colspec colnum="3" colname="col3" align="right"/>
     <oasis:colspec colnum="4" colname="col4" align="right"/>
     <oasis:thead>
       <oasis:row>
         <oasis:entry colname="col1">Watershed characteristics</oasis:entry>
         <oasis:entry rowsep="1" namest="col2" nameend="col4" align="center">Hydrologic regime </oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">Rainfall</oasis:entry>
         <oasis:entry colname="col3">Transient</oasis:entry>
         <oasis:entry colname="col4">Snowmelt</oasis:entry>
       </oasis:row>
       <oasis:row rowsep="1">
         <oasis:entry colname="col1"/>
         <oasis:entry colname="col2">dominant</oasis:entry>
         <oasis:entry colname="col3"/>
         <oasis:entry colname="col4">dominant</oasis:entry>
       </oasis:row>
     </oasis:thead>
     <oasis:tbody>
       <oasis:row>
         <oasis:entry colname="col1">Slope</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M114" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.42</bold></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M115" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.68</bold></oasis:entry>
         <oasis:entry colname="col4">0.12</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Aspect eastness</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M116" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.02</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">0.12</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M117" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.12</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Drainage area</oasis:entry>
         <oasis:entry colname="col2">0.14</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M118" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.12</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4">0.11</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Basin compactness</oasis:entry>
         <oasis:entry colname="col2">0.09</oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M119" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.12</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M120" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.16</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Stream density</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M121" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.10</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">0.29</oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M122" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.27</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent of sand</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M123" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.59</bold></oasis:entry>
         <oasis:entry colname="col3"><inline-formula><mml:math id="M124" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula><bold>0.46</bold></oasis:entry>
         <oasis:entry colname="col4"><inline-formula><mml:math id="M125" display="inline"><mml:mo>-</mml:mo></mml:math></inline-formula>0.14</oasis:entry>
       </oasis:row>
       <oasis:row>
         <oasis:entry colname="col1">Percent of forested area</oasis:entry>
         <oasis:entry colname="col2"><inline-formula><mml:math id="M126" display="inline"><mml:mrow><mml:mo>-</mml:mo><mml:mn mathvariant="normal">0.11</mml:mn></mml:mrow></mml:math></inline-formula></oasis:entry>
         <oasis:entry colname="col3">0.32</oasis:entry>
         <oasis:entry colname="col4">0.32</oasis:entry>
       </oasis:row>
     </oasis:tbody>
   </oasis:tgroup><?xmltex \end{scaleboxenv}?></oasis:table></table-wrap>

<?xmltex \hack{\newpage}?>
</sec>
<?pagebreak page3012?><sec id="Ch1.S4.SS7">
  <label>4.7</label><title>Limitations and future research</title>
      <p id="d1e3601">There are some notable limitations in our study and RF in general. The classification of watersheds into three flow regimes was based on the timing of the climatological mean of the annual flow volume, which can fluctuate from year to year. This is particularly true for the watersheds in the transient group for which streamflow is contributed by a mixture of runoff from winter rainfall and springtime snowmelt and the interannual variability is tremendous in both magnitude and timing <xref ref-type="bibr" rid="bib1.bibx39" id="paren.82"/>. Therefore, the membership in the classified watersheds from this group can vary. In fact, <xref ref-type="bibr" rid="bib1.bibx40" id="text.83"/> discussed the future shift of transient runoff watersheds towards rainfall-dominated in Washington State. Because we trained RF using the same input variables for all watersheds regardless of flow regimes and calculated performance criteria separately, the classification does not alter the results at individual watersheds.</p>
      <p id="d1e3610">In this study, we used estimated precipitation from PRISM, an interpolation product that combines data from various rain gauges from multiple networks. Despite possible introduced errors and uncertainty, we believe the use of a spatially distributed product better represents the areal estimation of precipitation over the watershed than a single rain gauge measurement. In a real-time forecast, this would not be feasible due to the added time to compile and process such data. Similarly, we provided the RF model with a basin-average SWE from SNOTEL stations as an estimate of snowpack conditions. Using more spatially consistent SWE data such as those from the
Snow Data Assimilation System <xref ref-type="bibr" rid="bib1.bibx51" id="paren.84"/> product would potentially improve model accuracy. As our results indicate that RF can produce reasonable forecasts, potential future research could explore the sensitivity of the model using satellite-derived snow products with station data and even include <inline-formula><mml:math id="M127" display="inline"><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn mathvariant="normal">1</mml:mn></mml:mrow></mml:math></inline-formula> precipitation forecasts as a predictor in the model.</p>
      <p id="d1e3628">An inherent limitation of RF is the lack of direct uncertainty quantification in prediction. In our case, the forecasted streamflow using RF does not yield a standard error comparable to that provided by a traditional regression model, and hence there is no way to provide probabilistic confidence intervals for predictions. Methods to estimate confidence intervals have been proposed by <xref ref-type="bibr" rid="bib1.bibx75" id="text.85"/>, <xref ref-type="bibr" rid="bib1.bibx42" id="text.86"/>, and <xref ref-type="bibr" rid="bib1.bibx13" id="text.87"/>, but they are not widely applied. For future work, the computation of confidence intervals in RF prediction will be useful in addressing and understanding uncertainty.</p>
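      <p>As a rough illustration of the idea, and not an implementation of any of the methods cited above, the spread of individual tree predictions can be summarized with empirical percentiles. The per-tree values below are hypothetical, the function names are illustrative, and the resulting band is a heuristic description of ensemble spread rather than a calibrated confidence interval.</p>
<preformat preformat-type="code" xml:space="preserve"><![CDATA[
```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile of an ascending list, with q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    low = int(pos)
    high = min(low + 1, len(sorted_values) - 1)
    frac = pos - low
    return sorted_values[low] + frac * (sorted_values[high] - sorted_values[low])


def ensemble_band(tree_predictions, lower=0.05, upper=0.95):
    """Return the ensemble mean and the lower/upper percentiles of tree predictions."""
    ordered = sorted(tree_predictions)
    mean = sum(ordered) / len(ordered)
    return mean, percentile(ordered, lower), percentile(ordered, upper)


# Hypothetical next-day streamflow predictions from ten individual trees.
tree_predictions = [100.0, 110.0, 95.0, 105.0, 120.0,
                    98.0, 102.0, 108.0, 115.0, 97.0]

forecast, band_low, band_high = ensemble_band(tree_predictions)

print(f"forecast = {forecast:.1f}, band = [{band_low:.2f}, {band_high:.2f}]")
```
]]></preformat>
      <p>Because the trees in a forest are correlated and each is fitted to a bootstrap sample, such a band should be checked against held-out observations before it is interpreted probabilistically.</p>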
</sec>
</sec>
<sec id="Ch1.S5" sec-type="conclusions">
  <label>5</label><title>Conclusions</title>
      <p id="d1e3650">Accurate streamflow forecasting has extensive applications across disciplines, from water resources and planning to engineering design. In this study, we assessed the ability of RF to make daily streamflow forecasts at 86 watersheds in the Pacific Northwest Hydrologic Region. Key results are summarized below.
<list list-type="bullet"><list-item>
      <p id="d1e3655">Based on the KGE scores (ranging from 0.62 to 0.99), we show that RF is capable of producing skillful forecasts across all watersheds.</p></list-item><list-item>
      <p id="d1e3659">RF performs better in snowmelt-dominated watersheds, which can be attributed to stronger persistence in the streamflow time series. The largest improvements in forecast compared to the naïve model are found among rainfall-dominated watersheds.</p></list-item><list-item>
      <p id="d1e3663">The two approaches for measuring predictor importance yield noticeably different results. We recommend that interpretation of these two measures be coupled with an understanding of the physical processes and how these processes are connected.</p></list-item><list-item>
      <p id="d1e3667">Increases in the steepness of the slope and the amount of sand content are found to deteriorate RF performance in two flow regime groups. This demonstrates that catchment characteristics can cause variability in performance of the model and should be considered in both predictor selection and evaluation of the model.</p></list-item></list>
Considering the current and future vulnerabilities of the Pacific Northwest to flooding caused by extreme precipitation and significant snowmelt events <xref ref-type="bibr" rid="bib1.bibx56" id="paren.88"/>, skillful streamflow forecasts can have important implications. Due to practical applications, RF and RF-based algorithms continue to gain popularity in hydrological studies <xref ref-type="bibr" rid="bib1.bibx71" id="paren.89"/>. Given the promising results from our study, RF can be used as part of an ensemble of models to achieve better generalization ability and accuracy not only in streamflow forecast but also in other water-related applications in this region.</p>
</sec>

      
      </body>
    <back><notes notes-type="codedataavailability"><title>Code and data availability</title>

      <p id="d1e3681">Example code for building a random forest model in R and data are available at <uri>https://github.com/leopham95/RandomForestStreamflowForecast</uri> <xref ref-type="bibr" rid="bib1.bibx54" id="paren.90"/>.</p>
  </notes><app-group>
        <supplementary-material position="anchor"><p id="d1e3690">The supplement related to this article is available online at: <inline-supplementary-material xlink:href="https://doi.org/10.5194/hess-25-2997-2021-supplement" xlink:title="pdf">https://doi.org/10.5194/hess-25-2997-2021-supplement</inline-supplementary-material>.</p></supplementary-material>
        </app-group><notes notes-type="authorcontribution"><title>Author contributions</title>

      <p id="d1e3699">LTP was responsible for conceptualization, data curation, formal analysis, funding acquisition, investigation, methodology, project administration, software, validation, visualization, and writing the original draft.
LL was responsible for conceptualization, investigation, methodology, funding acquisition, supervision, project administration, resources, and writing the original draft.
AOF was responsible for resources, supervision, funding acquisition, and writing the original draft.</p>
  </notes><notes notes-type="competinginterests"><title>Competing interests</title>

      <p id="d1e3705">The authors declare that they have no conflict of interest.</p>
  </notes><ack><title>Acknowledgements</title><p id="d1e3711">We wish to express deep gratitude to the researchers at the National Supercomputing Center in Wuxi, China, and Tyler Willson at Michigan State University for the initial brainstorming and project development.</p></ack><notes notes-type="financialsupport"><title>Financial support</title>

      <p id="d1e3716">Leo Triet Pham was supported by the Algorithms and Software for SUpercomputers with emerging aRchitEctures fellowship funded by the National Science Foundation (grant no. NSF-1827093). Lifeng Luo's effort was partially supported by the National Science Foundation (grant no. NSF-2006633).</p>
  </notes><notes notes-type="reviewstatement"><title>Review statement</title>

      <p id="d1e3722">This paper was edited by Dimitri Solomatine and reviewed by Francesco Avanzi and one anonymous referee.</p>
  </notes><ref-list>
    <title>References</title>

      <ref id="bib1.bibx1"><?xmltex \def\ref@label{{Adamowski(2008)}}?><label>Adamowski(2008)</label><?label adamowski2008development?><mixed-citation>
Adamowski, J. F.: Development of a short-term river flood forecasting method
for snowmelt driven floods based on wavelet and cross-wavelet analysis,
J. Hydrol., 353, 247–266, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx2"><?xmltex \def\ref@label{{Altman and Bland(1999)}}?><label>Altman and Bland(1999)</label><?label altman1999statistics?><mixed-citation>
Altman, D. G. and Bland, J. M.: Statistics notes: Variables and parameters, Brit. Med. J.,
318, 1667, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx3"><?xmltex \def\ref@label{{Aubert et~al.(2003)Aubert, Loumagne, and
Oudin}}?><label>Aubert et al.(2003)Aubert, Loumagne, and
Oudin</label><?label aubert2003sequential?><mixed-citation>
Aubert, D., Loumagne, C., and Oudin, L.: Sequential assimilation of soil
moisture and streamflow data in a conceptual rainfall–runoff model, J.
Hydrol., 280, 145–161, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx4"><?xmltex \def\ref@label{{Bernard et~al.(2009)Bernard, Heutte, and Adam}}?><label>Bernard et al.(2009)Bernard, Heutte, and Adam</label><?label bernard2009influence?><mixed-citation>
Bernard, S., Heutte, L., and Adam, S.: Influence of hyperparameters on random
forest accuracy, in: International Workshop on Multiple Classifier Systems,
Springer, Berlin, Heidelberg, 171–180, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx5"><?xmltex \def\ref@label{{Boyle et~al.(2000)Boyle, Gupta, and Sorooshian}}?><label>Boyle et al.(2000)Boyle, Gupta, and Sorooshian</label><?label boyle2000toward?><mixed-citation>
Boyle, D. P., Gupta, H. V., and Sorooshian, S.: Toward improved calibration of
hydrologic models: Combining the strengths of manual and automatic methods,
Water Resour. Res., 36, 3663–3674, 2000.</mixed-citation></ref>
      <ref id="bib1.bibx6"><?xmltex \def\ref@label{{Breiman(2001)}}?><label>Breiman(2001)</label><?label breiman2001random?><mixed-citation>
Breiman, L.: Random forests, Mach. Learn., 45, 5–32, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx7"><?xmltex \def\ref@label{{Breiman et~al.(1984)Breiman, Friedman, Stone, and
Olshen}}?><label>Breiman et al.(1984)Breiman, Friedman, Stone, and
Olshen</label><?label breiman1984classification?><mixed-citation>
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A.: Classification and
regression trees, CRC Press, Boca Raton, Florida, 1984.</mixed-citation></ref>
      <ref id="bib1.bibx8"><?xmltex \def\ref@label{{Calle and Urrea(2010)}}?><label>Calle and Urrea(2010)</label><?label calle2010letter?><mixed-citation>
Calle, M. L. and Urrea, V.: Letter to the editor: stability of random forest
importance measures, Brief. Bioinform., 12, 86–89, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx9"><?xmltex \def\ref@label{{Carvalho et~al.(2019)Carvalho, Pereira, and
Cardoso}}?><label>Carvalho et al.(2019)Carvalho, Pereira, and
Cardoso</label><?label carvalho2019machine?><mixed-citation>Carvalho, D. V., Pereira, E. M., and Cardoso, J. S.: Machine learning
interpretability: A survey on methods and metrics, Electronics, 8, 832, <ext-link xlink:href="https://doi.org/10.3390/electronics8080832" ext-link-type="DOI">10.3390/electronics8080832</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx10"><?xmltex \def\ref@label{{Cayan et~al.(1999)Cayan, Redmond, and Riddle}}?><label>Cayan et al.(1999)Cayan, Redmond, and Riddle</label><?label cayan1999enso?><mixed-citation>
Cayan, D. R., Redmond, K. T., and Riddle, L. G.: ENSO and hydrologic extremes
in the western United States, J. Climate, 12, 2881–2893, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx11"><?xmltex \def\ref@label{{Chen and Ishwaran(2012)}}?><label>Chen and Ishwaran(2012)</label><?label chen2012random?><mixed-citation>
Chen, X. and Ishwaran, H.: Random forests for genomic data analysis, Genomics,
99, 323–329, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx12"><?xmltex \def\ref@label{{Cho and Jacobs(2020)}}?><label>Cho and Jacobs(2020)</label><?label cho2020extreme?><mixed-citation>Cho, E. and Jacobs, J. M.: Extreme Value Snow Water Equivalent and Snowmelt for
Infrastructure Design over the Contiguous United States, Water Resour.
Res., 56, e2020WR028126, <ext-link xlink:href="https://doi.org/10.1029/2020WR028126" ext-link-type="DOI">10.1029/2020WR028126</ext-link>, 2020.</mixed-citation></ref>
      <ref id="bib1.bibx13"><?xmltex \def\ref@label{{Coulston et~al.(2016)Coulston, Blinn, Thomas, and
Wynne}}?><label>Coulston et al.(2016)Coulston, Blinn, Thomas, and
Wynne</label><?label coulston2016approximating?><mixed-citation>
Coulston, J. W., Blinn, C. E., Thomas, V. A., and Wynne, R. H.: Approximating
prediction uncertainty for random forest regression models, Photogramm.
Eng. Rem. S., 82, 189–197, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx14"><?xmltex \def\ref@label{{Dawson et~al.(2006)Dawson, Abrahart, Shamseldin, and
Wilby}}?><label>Dawson et al.(2006)Dawson, Abrahart, Shamseldin, and
Wilby</label><?label dawson2006flood?><mixed-citation>
Dawson, C. W., Abrahart, R. J., Shamseldin, A. Y., and Wilby, R. L.: Flood
estimation at ungauged sites using artificial neural networks, J.
Hydrol., 319, 391–409, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx15"><?xmltex \def\ref@label{{Di~Luzio et~al.(2008)Di~Luzio, Johnson, Daly, Eischeid, and
Arnold}}?><label>Di Luzio et al.(2008)Di Luzio, Johnson, Daly, Eischeid, and
Arnold</label><?label di2008constructing?><mixed-citation>
Di Luzio, M., Johnson, G. L., Daly, C., Eischeid, J. K., and Arnold, J. G.:
Constructing retrospective gridded daily precipitation and temperature
datasets for the conterminous United States, J. Appl. Meteorol.
Clim., 47, 475–497, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx16"><?xmltex \def\ref@label{{Dibike and Solomatine(2001)}}?><label>Dibike and Solomatine(2001)</label><?label dibike2001river?><mixed-citation>
Dibike, Y. B. and Solomatine, D. P.: River flow forecasting using artificial
neural networks, Phys. Chem. Earth Pt. B, 26, 1–7, 2001.</mixed-citation></ref>
      <ref id="bib1.bibx17"><?xmltex \def\ref@label{{Dingman(2015)}}?><label>Dingman(2015)</label><?label dingman2015physical?><mixed-citation>
Dingman, S. L.: Physical hydrology, Waveland Press, Long Grove, Illinois, 104–106, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx18"><?xmltex \def\ref@label{{Elsner et~al.(2010)Elsner, Cuo, Voisin, Deems, Hamlet, Vano,
Mickelson, Lee, and Lettenmaier}}?><label>Elsner et al.(2010)Elsner, Cuo, Voisin, Deems, Hamlet, Vano,
Mickelson, Lee, and Lettenmaier</label><?label elsner2010implications?><mixed-citation>
Elsner, M. M., Cuo, L., Voisin, N., Deems, J. S., Hamlet, A. F., Vano, J. A.,
Mickelson, K. E., Lee, S.-Y., and Lettenmaier, D. P.: Implications of 21st
century climate change for the hydrology of Washington State, Climatic
Change, 102, 225–260, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx19"><?xmltex \def\ref@label{{Falcone(2011)}}?><label>Falcone(2011)</label><?label falcone2011gages?><mixed-citation>Falcone, J. A.: GAGES-II: Geospatial attributes of gages for evaluating
streamflow, Tech. rep., US Geological Survey, <ext-link xlink:href="https://doi.org/10.3133/70046617" ext-link-type="DOI">10.3133/70046617</ext-link>, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx20"><?xmltex \def\ref@label{{Graham et~al.(2013)Graham, Barnard, Kavanagh, and
McNamara}}?><label>Graham et al.(2013)Graham, Barnard, Kavanagh, and
McNamara</label><?label graham2013catchment?><mixed-citation>
Graham, C. B., Barnard, H. R., Kavanagh, K. L., and McNamara, J. P.: Catchment
scale controls the temporal connection of transpiration and diel fluctuations
in streamflow, Hydrol. Process., 27, 2541–2556, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx21"><?xmltex \def\ref@label{{Gregorutti et~al.(2017)Gregorutti, Michel, and
Saint-Pierre}}?><label>Gregorutti et al.(2017)Gregorutti, Michel, and
Saint-Pierre</label><?label gregorutti2017correlation?><mixed-citation>
Gregorutti, B., Michel, B., and Saint-Pierre, P.: Correlation and variable
importance in random forests, Stat. Comput., 27, 659–678, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx22"><?xmltex \def\ref@label{{Gupta et~al.(1999)Gupta, Sorooshian, and Yapo}}?><label>Gupta et al.(1999)Gupta, Sorooshian, and Yapo</label><?label gupta1999status?><mixed-citation>
Gupta, H. V., Sorooshian, S., and Yapo, P. O.: Status of automatic calibration
for hydrologic models: Comparison with multilevel expert calibration, J.
Hydrol. Eng., 4, 135–143, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx23"><?xmltex \def\ref@label{{Gupta et~al.(2009)Gupta, Kling, Yilmaz, and
Martinez}}?><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and
Martinez</label><?label gupta2009decomposition?><mixed-citation>
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of
the mean squared error and NSE performance criteria: Implications for
improving hydrological modelling, J. Hydrol., 377, 80–91, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx24"><?xmltex \def\ref@label{{Huang and Boutros(2016)}}?><label>Huang and Boutros(2016)</label><?label huang2016parameter?><mixed-citation>
Huang, B. F. and Boutros, P. C.: The parameter sensitivity of random forests,
BMC Bioinformatics, 17,  1–13, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx25"><?xmltex \def\ref@label{{Hwang et~al.(2012)Hwang, Ham, and Kim}}?><label>Hwang et al.(2012)Hwang, Ham, and Kim</label><?label hwang2012new?><mixed-citation>
Hwang, S. H., Ham, D. H., and Kim, J. H.: A new measure for assessing the
efficiency of hydrological data-driven forecasting models, Hydrolog.
Sci. J., 57, 1257–1274, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx26"><?xmltex \def\ref@label{{Ishwaran and Lu(2019)}}?><label>Ishwaran and Lu(2019)</label><?label ishwaran2019standard?><mixed-citation>
Ishwaran, H. and Lu, M.: Standard errors and confidence intervals for variable
importance in random forest regression, classification, and survival,
Stat. Med., 38, 558–582, 2019.</mixed-citation></ref>
      <?pagebreak page3014?><ref id="bib1.bibx27"><?xmltex \def\ref@label{{James et~al.(2013)James, Witten, Hastie, and
Tibshirani}}?><label>James et al.(2013)James, Witten, Hastie, and
Tibshirani</label><?label james2013introduction?><mixed-citation>
James, G., Witten, D., Hastie, T., and Tibshirani, R.: An introduction to statistical learning, Springer, New York, 113, 246–247, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx28"><?xmltex \def\ref@label{{Johnstone(2011)}}?><label>Johnstone(2011)</label><?label johnstone2011quasi?><mixed-citation>
Johnstone, J. A.: A quasi-biennial signal in western US hydroclimate and its
global teleconnections, Clim. Dynam., 36, 663–680, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx29"><?xmltex \def\ref@label{{Karran et~al.(2013)Karran, Morin, and Adamowski}}?><label>Karran et al.(2013)Karran, Morin, and Adamowski</label><?label karran2013multi?><mixed-citation>
Karran, D. J., Morin, E., and Adamowski, J.: Multi-step streamflow forecasting
using data-driven non-linear methods in contrasting climate regimes, J.
Hydroinform., 16, 671–689, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx30"><?xmltex \def\ref@label{{Knoben et~al.(2019)Knoben, Freer, and Woods}}?><label>Knoben et al.(2019)Knoben, Freer, and Woods</label><?label knoben2019inherent?><mixed-citation>Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23, 4323–4331, <ext-link xlink:href="https://doi.org/10.5194/hess-23-4323-2019" ext-link-type="DOI">10.5194/hess-23-4323-2019</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx31"><?xmltex \def\ref@label{{Knowles et~al.(2006)Knowles, Dettinger, and
Cayan}}?><label>Knowles et al.(2006)Knowles, Dettinger, and
Cayan</label><?label knowles2006trends?><mixed-citation>
Knowles, N., Dettinger, M. D., and Cayan, D. R.: Trends in snowfall versus rainfall in the western United States, J. Climate, 19, 4545–4559, 2006.</mixed-citation></ref>
      <ref id="bib1.bibx32"><?xmltex \def\ref@label{{Knowles et~al.(2007)Knowles, Dettinger, and
Cayan}}?><label>Knowles et al.(2007)Knowles, Dettinger, and
Cayan</label><?label knowles2007trends?><mixed-citation>
Knowles, N., Dettinger, M., and Cayan, D.: Trends in snowfall versus rainfall for the western United States, 1949–2001, prepared for California Energy Commission Public Interest Energy Research Program, Sacramento, California, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx33"><?xmltex \def\ref@label{{Kuhn et~al.(2008)}}?><label>Kuhn et al.(2008)</label><?label kuhn2008building?><mixed-citation>
Kuhn, M. et al.: Building predictive models in R using the caret package,
J. Stat. Softw., 28, 1–26, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx34"><?xmltex \def\ref@label{{Legates and McCabe(1999)}}?><label>Legates and McCabe(1999)</label><?label legates1999evaluating?><mixed-citation>
Legates, D. R. and McCabe Jr., G. J.: Evaluating the use of
“goodness-of-fit” measures in hydrologic and hydroclimatic model
validation, Water Resour. Res., 35, 233–241, 1999.</mixed-citation></ref>
      <ref id="bib1.bibx35"><?xmltex \def\ref@label{{Li et~al.(2017)Li, Wrzesien, Durand, Adam, and
Lettenmaier}}?><label>Li et al.(2017)Li, Wrzesien, Durand, Adam, and
Lettenmaier</label><?label li2017much?><mixed-citation>
Li, D., Wrzesien, M. L., Durand, M., Adam, J., and Lettenmaier, D. P.: How much
runoff originates as snow in the western United States, and how will that
change in the future?, Geophys. Res. Lett., 44, 6163–6172, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx36"><?xmltex \def\ref@label{{Li et~al.(2019)Li, Sha, and Wang}}?><label>Li et al.(2019)Li, Sha, and Wang</label><?label li2019comparison?><mixed-citation>
Li, X., Sha, J., and Wang, Z.-L.: Comparison of daily streamflow forecasts
using extreme learning machines and the random forest method, Hydrolog.
Sci. J., 64, 1857–1866, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx37"><?xmltex \def\ref@label{{Liaw and Wiener(2002)Liaw, Wiener et~al.}}?><label>Liaw and Wiener(2002)Liaw, Wiener et al.</label><?label liaw2002classification?><mixed-citation>
Liaw, A. and Wiener, M.: Classification and regression by randomForest, R
News, 2, 18–22, 2002.</mixed-citation></ref>
      <ref id="bib1.bibx38"><?xmltex \def\ref@label{{Louppe et~al.(2013)Louppe, Wehenkel, Sutera, and
Geurts}}?><label>Louppe et al.(2013)Louppe, Wehenkel, Sutera, and
Geurts</label><?label louppe2013understanding?><mixed-citation>
Louppe, G., Wehenkel, L., Sutera, A., and Geurts, P.: Understanding variable
importances in forests of randomized trees, in: Advances in neural
information processing systems,  26, 431–439, 2013.</mixed-citation></ref>
      <ref id="bib1.bibx39"><?xmltex \def\ref@label{{Lundquist et~al.(2009)Lundquist, Dettinger, Stewart, and
Cayan}}?><label>Lundquist et al.(2009)Lundquist, Dettinger, Stewart, and
Cayan</label><?label lundquist2009variability?><mixed-citation>
Lundquist, J. D., Dettinger, M. D., Stewart, I. T., and Cayan, D. R.:
Variability and trends in spring runoff in the western United States, in:
Climate Warming in Western North America: Evidence and Environmental Effects,
University of Utah Press, Salt Lake City, Utah, USA, 63–76, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx40"><?xmltex \def\ref@label{{Mantua et~al.(2009)Mantua, Tohver, and Hamlet}}?><label>Mantua et al.(2009)Mantua, Tohver, and Hamlet</label><?label mantua2009impacts?><mixed-citation>Mantua, N., Tohver, I., and Hamlet, A. F.:  Impacts of Climate Change on Key Aspects of Freshwater Salmon Habitat in Washington State, The Washington Climate Change Impacts Assessment: Evaluating Washington's Future in a Changing Climate, University of Washington Climate Impacts Group, Seattle, WA, <ext-link xlink:href="https://doi.org/10.7915/CIG6QZ23J" ext-link-type="DOI">10.7915/CIG6QZ23J</ext-link>, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx41"><?xmltex \def\ref@label{{Mass(2015)}}?><label>Mass(2015)</label><?label mass2015weather?><mixed-citation>
Mass, C.: The weather of the Pacific Northwest, University of Washington Press,  Seattle, Washington, 34–35,
2015.</mixed-citation></ref>
      <ref id="bib1.bibx42"><?xmltex \def\ref@label{{Mentch and Hooker(2016)}}?><label>Mentch and Hooker(2016)</label><?label mentch2016quantifying?><mixed-citation>
Mentch, L. and Hooker, G.: Quantifying uncertainty in random forests via
confidence intervals and hypothesis tests, J. Mach. Learn.
Res., 17, 841–881, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx43"><?xmltex \def\ref@label{{Mittermaier(2008)}}?><label>Mittermaier(2008)</label><?label mittermaier2008potential?><mixed-citation>
Mittermaier, M. P.: The potential impact of using persistence as a reference
forecast on perceived forecast skill, Weather Forecast., 23,
1022–1031, 2008.</mixed-citation></ref>
      <ref id="bib1.bibx44"><?xmltex \def\ref@label{{Mosavi et~al.(2018)Mosavi, Ozturk, and Chau}}?><label>Mosavi et al.(2018)Mosavi, Ozturk, and Chau</label><?label mosavi2018flood?><mixed-citation>Mosavi, A., Ozturk, P., and Chau, K.-w.: Flood prediction using machine
learning models: Literature review, Water, 10, 1536, <ext-link xlink:href="https://doi.org/10.3390/w10111536" ext-link-type="DOI">10.3390/w10111536</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx45"><?xmltex \def\ref@label{{Mote et~al.(2018)Mote, Li, Lettenmaier, Xiao, and
Engel}}?><label>Mote et al.(2018)Mote, Li, Lettenmaier, Xiao, and
Engel</label><?label mote2018dramatic?><mixed-citation>
Mote, P. W., Li, S., Lettenmaier, D. P., Xiao, M., and Engel, R.: Dramatic
declines in snowpack in the western US, NPJ Climate and Atmospheric Science,
1, 1–6, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx46"><?xmltex \def\ref@label{{Nicodemus(2011)}}?><label>Nicodemus(2011)</label><?label nicodemus2011letter?><mixed-citation>
Nicodemus, K. K.: Letter to the editor: On the stability and ranking of
predictors from random forest variable importance measures, Brief.
Bioinform., 12, 369–373, 2011.</mixed-citation></ref>
      <ref id="bib1.bibx47"><?xmltex \def\ref@label{{Obringer and Nateghi(2018)}}?><label>Obringer and Nateghi(2018)</label><?label obringer2018predicting?><mixed-citation>Obringer, R. and Nateghi, R.: Predicting urban reservoir levels using
statistical learning techniques, Sci. Rep.-UK, 8, 5164, <ext-link xlink:href="https://doi.org/10.1038/s41598-018-23509-w" ext-link-type="DOI">10.1038/s41598-018-23509-w</ext-link>, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx48"><?xmltex \def\ref@label{{Oshiro et~al.(2012)Oshiro, Perez, and Baranauskas}}?><label>Oshiro et al.(2012)Oshiro, Perez, and Baranauskas</label><?label oshiro2012many?><mixed-citation>
Oshiro, T. M., Perez, P. S., and Baranauskas, J. A.: How many trees in a random
forest?, in: International workshop on machine learning and data mining in
pattern recognition, Springer, Berlin, Heidelberg, 154–168, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx49"><?xmltex \def\ref@label{{Pagano et~al.(2009)Pagano, Garen, Perkins, and
Pasteris}}?><label>Pagano et al.(2009)Pagano, Garen, Perkins, and
Pasteris</label><?label pagano2009daily?><mixed-citation>
Pagano, T. C., Garen, D. C., Perkins, T. R., and Pasteris, P. A.: Daily
updating of operational statistical seasonal water supply forecasts for the
western US, J. Am. Water Resour. As., 45,
767–778, 2009.</mixed-citation></ref>
      <ref id="bib1.bibx50"><?xmltex \def\ref@label{{Pal(2005)}}?><label>Pal(2005)</label><?label pal2005random?><mixed-citation>
Pal, M.: Random forest classifier for remote sensing classification,
Int. J. Remote Sens., 26, 217–222, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx51"><?xmltex \def\ref@label{{Pan et~al.(2003)Pan, Sheffield, Wood, Mitchell, Houser, Schaake,
Robock, Lohmann, Cosgrove, Duan et~al.}}?><label>Pan et al.(2003)Pan, Sheffield, Wood, Mitchell, Houser, Schaake,
Robock, Lohmann, Cosgrove, Duan et al.</label><?label pan2003snow?><mixed-citation>Pan, M., Sheffield, J., Wood, E. F., Mitchell, K. E., Houser, P. R., Schaake, J. C., Robock, A., Lohmann, D., Cosgrove, B., Duan, Q., and Luo, L.: Snow process
modeling in the North American Land Data Assimilation System (NLDAS): 2.
Evaluation of model simulated snow water equivalent, J. Geophys.
Res.-Atmos., 108, 8850, <ext-link xlink:href="https://doi.org/10.1029/2003JD003994" ext-link-type="DOI">10.1029/2003JD003994</ext-link>, 2003.</mixed-citation></ref>
      <ref id="bib1.bibx52"><?xmltex \def\ref@label{{Papacharalampous and Tyralis(2018)}}?><label>Papacharalampous and Tyralis(2018)</label><?label papacharalampous2018evaluation?><mixed-citation>
Papacharalampous, G. A. and Tyralis, H.: Evaluation of random forests and
Prophet for daily streamflow forecasting, Adv. Geosci., 45,
201–208, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx53"><?xmltex \def\ref@label{{Payne et~al.(2004)Payne, Wood, Hamlet, Palmer, and
Lettenmaier}}?><label>Payne et al.(2004)Payne, Wood, Hamlet, Palmer, and
Lettenmaier</label><?label payne2004mitigating?><mixed-citation>
Payne, J. T., Wood, A. W., Hamlet, A. F., Palmer, R. N., and Lettenmaier,
D. P.: Mitigating the effects of climate change on the water resources of the
Columbia River basin, Climatic Change, 62, 233–256, 2004.</mixed-citation></ref>
      <ref id="bib1.bibx54"><?xmltex \def\ref@label{Pham(2020)}?><label>Pham(2020)</label><?label Pham2020?><mixed-citation>Pham, L. T.: Random Forest Streamflow Forecast (2020), GitHub, available at: <uri>https://github.com/leopham95/RandomForestStreamflowForecast</uri>, last access: 15 June 2020.</mixed-citation></ref>
      <ref id="bib1.bibx55"><?xmltex \def\ref@label{{Probst et~al.(2019)Probst, Wright, and
Boulesteix}}?><label>Probst et al.(2019)Probst, Wright, and
Boulesteix</label><?label probst2019hyperparameters?><mixed-citation>Probst, P., Wright, M. N., and Boulesteix, A.-L.: Hyperparameters and tuning
strategies for random forest, WIRES Data Min.
Knowl., 9, e1301, <ext-link xlink:href="https://doi.org/10.1002/widm.1301" ext-link-type="DOI">10.1002/widm.1301</ext-link>, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx56"><?xmltex \def\ref@label{{Ralph et~al.(2014)Ralph, Dettinger, White, Reynolds, Cayan,
Schneider, Cifelli, Redmond, Anderson, Gherke et~al.}}?><label>Ralph et al.(2014)Ralph, Dettinger, White, Reynolds, Cayan,
Schneider, Cifelli, Redmond, Anderson, Gherke et al.</label><?label ralph2014vision?><mixed-citation>
Ralph, F., Dettinger, M., White, A., Reynolds, D., Cayan, D., Schneider, T.,
Cifelli, R., Redmond, K., Anderson, M., Gherke, F., and Jones, J.: A vision for
future observations for western US extreme precipitation and flooding,
Journal of Contemporary Water Research &amp; Education, 153, 16–32, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx57"><?xmltex \def\ref@label{{Rasouli et~al.(2012)Rasouli, Hsieh, and Cannon}}?><label>Rasouli et al.(2012)Rasouli, Hsieh, and Cannon</label><?label rasouli2012daily?><mixed-citation>
Rasouli, K., Hsieh, W. W., and Cannon, A. J.: Daily streamflow forecasting by
machine learning methods with weather and climate inputs, J.
Hydrol., 414, 284–293, 2012.</mixed-citation></ref>
      <ref id="bib1.bibx58"><?xmltex \def\ref@label{{Regonda et~al.(2005)Regonda, Rajagopalan, Clark, and
Pitlick}}?><label>Regonda et al.(2005)Regonda, Rajagopalan, Clark, and
Pitlick</label><?label regonda2005seasonal?><mixed-citation>
Regonda, S. K., Rajagopalan, B., Clark, M., and Pitlick, J.: Seasonal cycle
shifts in hydroclimatology over the western United States, J.
Climate, 18, 372–384, 2005.</mixed-citation></ref>
      <ref id="bib1.bibx59"><?xmltex \def\ref@label{{Ribeiro et~al.(2016)Ribeiro, Singh, and Guestrin}}?><label>Ribeiro et al.(2016)Ribeiro, Singh, and Guestrin</label><?label ribeiro2016model?><mixed-citation>Ribeiro, M. T., Singh, S., and Guestrin, C.: Model-agnostic interpretability of
machine learning, arXiv [preprint],
<ext-link xlink:href="https://arxiv.org/abs/1606.05386">arXiv:1606.05386</ext-link>, last access: 16 June 2016.</mixed-citation></ref>
      <ref id="bib1.bibx60"><?xmltex \def\ref@label{{Safeeq et~al.(2014)Safeeq, Mauger, Grant, Arismendi, Hamlet, and
Lee}}?><label>Safeeq et al.(2014)Safeeq, Mauger, Grant, Arismendi, Hamlet, and
Lee</label><?label safeeq2014comparing?><mixed-citation>
Safeeq, M., Mauger, G. S., Grant, G. E., Arismendi, I., Hamlet, A. F., and Lee,
S.-Y.<?pagebreak page3015?>: Comparing large-scale hydrological model predictions with observed
streamflow in the Pacific Northwest: effects of climate and groundwater,
J. Hydrometeorol., 15, 2501–2521, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx61"><?xmltex \def\ref@label{{Salath{\'{e}}~Jr et~al.(2014)Salath{\'{e}}~Jr, Hamlet, Mass, Lee,
Stumbaugh, and Steed}}?><label>Salathé Jr et al.(2014)Salathé Jr, Hamlet, Mass, Lee,
Stumbaugh, and Steed</label><?label salathe2014estimates?><mixed-citation>
Salathé Jr, E. P., Hamlet, A. F., Mass, C. F., Lee, S.-Y., Stumbaugh, M.,
and Steed, R.: Estimates of twenty-first-century flood risk in the Pacific
Northwest based on regional climate model simulations, J.
Hydrometeorol., 15, 1881–1899, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx62"><?xmltex \def\ref@label{{Seibold et~al.(2018)Seibold, Bernau, Boulesteix, and
De~Bin}}?><label>Seibold et al.(2018)Seibold, Bernau, Boulesteix, and
De Bin</label><?label seibold2018choice?><mixed-citation>
Seibold, H., Bernau, C., Boulesteix, A.-L., and De Bin, R.: On the choice and
influence of the number of boosting steps for high-dimensional linear
Cox-models, Comput. Stat., 33, 1195–1215, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx63"><?xmltex \def\ref@label{{Shortridge et~al.(2016)Shortridge, Guikema, and
Zaitchik}}?><label>Shortridge et al.(2016)Shortridge, Guikema, and
Zaitchik</label><?label shortridge2016machine?><mixed-citation>Shortridge, J. E., Guikema, S. D., and Zaitchik, B. F.: Machine learning methods for empirical streamflow simulation: a comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds, Hydrol. Earth Syst. Sci., 20, 2611–2628, <ext-link xlink:href="https://doi.org/10.5194/hess-20-2611-2016" ext-link-type="DOI">10.5194/hess-20-2611-2016</ext-link>, 2016.</mixed-citation></ref>
      <ref id="bib1.bibx64"><?xmltex \def\ref@label{{Shrikumar et~al.(2017)Shrikumar, Greenside, and
Kundaje}}?><label>Shrikumar et al.(2017)Shrikumar, Greenside, and
Kundaje</label><?label shrikumar2017learning?><mixed-citation>Shrikumar, A., Greenside, P., and Kundaje, A.: Learning important features
through propagating activation differences, arXiv [preprint],  <ext-link xlink:href="https://arxiv.org/abs/1704.02685">arXiv:1704.02685</ext-link>, last access:
17 July 2017.</mixed-citation></ref>
      <ref id="bib1.bibx65"><?xmltex \def\ref@label{{Sitterson et~al.(2018)Sitterson, Knightes, Parmar, Wolfe, Avant, and
Muche}}?><label>Sitterson et al.(2018)Sitterson, Knightes, Parmar, Wolfe, Avant, and
Muche</label><?label sitterson2018overview?><mixed-citation>
Sitterson, J., Knightes, C., Parmar, R., Wolfe, K., Avant, B., and Muche, M.:
An overview of rainfall-runoff model types,  EPA Office of Research and Development (8101R) Washington, DC 20460, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx66"><?xmltex \def\ref@label{{Srivastava et~al.(2017)Srivastava, Wu, Elliot, Brooks, and
Flanagan}}?><label>Srivastava et al.(2017)Srivastava, Wu, Elliot, Brooks, and
Flanagan</label><?label srivastava2017modeling?><mixed-citation>
Srivastava, A., Wu, J. Q., Elliot, W. J., Brooks, E. S., and Flanagan, D. C.:
Modeling streamflow in a snow-dominated forest watershed using the Water
Erosion Prediction Project (WEPP) model, T. ASABE, 60,
1171–1187, 2017.</mixed-citation></ref>
      <ref id="bib1.bibx67"><?xmltex \def\ref@label{{Strobl et~al.(2007)Strobl, Boulesteix, Zeileis, and
Hothorn}}?><label>Strobl et al.(2007)Strobl, Boulesteix, Zeileis, and
Hothorn</label><?label strobl2007bias?><mixed-citation>
Strobl, C., Boulesteix, A.-L., Zeileis, A., and Hothorn, T.: Bias in random
forest variable importance measures: Illustrations, sources and a solution,
BMC Bioinformatics, 8,  1–21, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx68"><?xmltex \def\ref@label{{Tohver et~al.(2014)Tohver, Hamlet, and Lee}}?><label>Tohver et al.(2014)Tohver, Hamlet, and Lee</label><?label tohver2014impacts?><mixed-citation>
Tohver, I. M., Hamlet, A. F., and Lee, S.-Y.: Impacts of 21st-century climate
change on hydrologic extremes in the Pacific Northwest region of North
America, J. Am. Water Resour. As., 50,
1461–1476, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx69"><?xmltex \def\ref@label{{Tolson and Shoemaker(2007)}}?><label>Tolson and Shoemaker(2007)</label><?label tolson2007dynamically?><mixed-citation>Tolson, B. A. and Shoemaker, C. A.: Dynamically dimensioned search algorithm
for computationally efficient watershed model calibration, Water Resour.
Res., 43, W01413, <ext-link xlink:href="https://doi.org/10.1029/2005WR004723" ext-link-type="DOI">10.1029/2005WR004723</ext-link>, 2007.</mixed-citation></ref>
      <ref id="bib1.bibx70"><?xmltex \def\ref@label{{Tongal and Booij(2018)}}?><label>Tongal and Booij(2018)</label><?label tongal2018simulation?><mixed-citation>
Tongal, H. and Booij, M. J.: Simulation and forecasting of streamflows using
machine learning models coupled with base flow separation, J.
Hydrol., 564, 266–282, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx71"><?xmltex \def\ref@label{{Tyralis et~al.(2019)Tyralis, Papacharalampous, and
Langousis}}?><label>Tyralis et al.(2019)Tyralis, Papacharalampous, and
Langousis</label><?label tyralis2019brief?><mixed-citation>
Tyralis, H., Papacharalampous, G., and Langousis, A.: A brief review of random
forests for water scientists and practitioners and their recent history in
water resources, Water, 11, 910, 2019.</mixed-citation></ref>
      <ref id="bib1.bibx72"><?xmltex \def\ref@label{{{U.S. Geological Survey}(2020)}}?><label>U.S. Geological Survey(2020)</label><?label usgs-nhd?><mixed-citation>U.S. Geological Survey: U.S. Geological Survey, 2019, National Hydrography
Dataset (ver. USGS National Hydrography Dataset Best Resolution (NHD) for
Hydrologic Unit (HU) 4 – 2001), available at:
<uri>https://www.usgs.gov/core-science-systems/ngp/national-hydrography/access-national-hydrography-products</uri> (last access: 6 June 2020),
2020.
</mixed-citation></ref><?xmltex \hack{\newpage}?>
      <ref id="bib1.bibx73"><?xmltex \def\ref@label{{Van~Rijn and Hutter(2018)}}?><label>Van Rijn and Hutter(2018)</label><?label van2018hyperparameter?><mixed-citation>
Van Rijn, J. N. and Hutter, F.: Hyperparameter importance across datasets, in:
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge
Discovery &amp; Data Mining, 2367–2376, 2018.</mixed-citation></ref>
      <ref id="bib1.bibx74"><?xmltex \def\ref@label{{Vano et~al.(2015)Vano, Nijssen, and Lettenmaier}}?><label>Vano et al.(2015)Vano, Nijssen, and Lettenmaier</label><?label vano2015seasonal?><mixed-citation>
Vano, J. A., Nijssen, B., and Lettenmaier, D. P.: Seasonal hydrologic responses
to climate change in the Pacific Northwest, Water Resour. Res., 51,
1959–1976, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx75"><?xmltex \def\ref@label{{Wager et~al.(2014)Wager, Hastie, and Efron}}?><label>Wager et al.(2014)Wager, Hastie, and Efron</label><?label wager2014confidence?><mixed-citation>
Wager, S., Hastie, T., and Efron, B.: Confidence intervals for random forests:
The jackknife and the infinitesimal jackknife, J. Mach.
Learn. Res., 15, 1625–1651, 2014.</mixed-citation></ref>
      <ref id="bib1.bibx76"><?xmltex \def\ref@label{{Wang et~al.(2015)Wang, Lai, Chen, Yang, Zhao, and
Bai}}?><label>Wang et al.(2015)Wang, Lai, Chen, Yang, Zhao, and
Bai</label><?label wang2015flood?><mixed-citation>
Wang, Z., Lai, C., Chen, X., Yang, B., Zhao, S., and Bai, X.: Flood hazard risk
assessment model based on random forest, J. Hydrol., 527,
1130–1141, 2015.</mixed-citation></ref>
      <ref id="bib1.bibx77"><?xmltex \def\ref@label{{Wenger et~al.(2010)Wenger, Luce, Hamlet, Isaak, and
Neville}}?><label>Wenger et al.(2010)Wenger, Luce, Hamlet, Isaak, and
Neville</label><?label wenger2010macroscale?><mixed-citation>Wenger, S. J., Luce, C. H., Hamlet, A. F., Isaak, D. J., and Neville, H. M.:
Macroscale hydrologic modeling of ecologically relevant flow metrics, Water
Resour. Res., 46, W09513, <ext-link xlink:href="https://doi.org/10.1029/2009WR008839" ext-link-type="DOI">10.1029/2009WR008839</ext-link>, 2010.</mixed-citation></ref>
      <ref id="bib1.bibx78"><?xmltex \def\ref@label{{Wilcoxon et~al.(1970)Wilcoxon, Katti, and
Wilcox}}?><label>Wilcoxon et al.(1970)Wilcoxon, Katti, and
Wilcox</label><?label wilcoxon1970critical?><mixed-citation>
Wilcoxon, F., Katti, S., and Wilcox, R. A.: Critical values and probability
levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test,
Selected tables in mathematical statistics, 1, 171–259, 1970.</mixed-citation></ref>
      <ref id="bib1.bibx79"><?xmltex \def\ref@label{{Zheng et~al.(2018)Zheng, Wang, Zhou, Sun, and
Li}}?><label>Zheng et al.(2018)Zheng, Wang, Zhou, Sun, and
Li</label><?label zheng2018predictive?><mixed-citation>
Zheng, X., Wang, Q., Zhou, L., Sun, Q., and Li, Q.: Predictive Contributions of
Snowmelt and Rainfall to Streamflow Variations in the Western United States,
Adv. Meteorol., 2018, p. 14, 2018.</mixed-citation></ref>

  </ref-list></back>
    <!--<article-title-html>Evaluation of random forests for short-term daily streamflow forecasting in rainfall- and snowmelt-driven watersheds</article-title-html>
<abstract-html><p>In the past decades, data-driven machine-learning (ML) models have emerged as promising tools for short-term streamflow forecasting. Among other qualities, the popularity of ML models for such applications is due to their relative ease in implementation, less strict distributional assumption, and competitive computational and predictive performance. Despite the encouraging results, most applications of ML for streamflow forecasting have been limited to watersheds in which rainfall is the major source of runoff. In this study, we evaluate the potential of random forests (RFs), a popular ML method, to make streamflow forecasts at 1&thinsp;d of lead time at 86 watersheds in the Pacific Northwest. These watersheds cover diverse climatic conditions and physiographic settings and exhibit varied contributions of rainfall and snowmelt to their streamflow. Watersheds are classified into three hydrologic regimes based on the timing of center-of-annual flow volume: rainfall-dominated, transient, and snowmelt-dominated. RF performance is benchmarked against naïve  and multiple linear regression (MLR) models and evaluated using four criteria: coefficient of determination, root mean squared error, mean absolute error, and Kling–Gupta efficiency (KGE). Model evaluation scores suggest that the RF performs better in snowmelt-driven watersheds compared to rainfall-driven watersheds. The largest improvements in forecasts compared to benchmark models are found among rainfall-driven watersheds. RF performance deteriorates with increases in catchment slope and soil sandiness. We note disagreement between two popular measures of RF variable importance and recommend jointly considering these measures with the physical processes under study. These and other results presented provide new insights for effective application of RF-based streamflow forecasting.</p></abstract-html>
<ref-html id="bib1.bib1"><label>Adamowski(2008)</label><mixed-citation>
Adamowski, J. F.: Development of a short-term river flood forecasting method
for snowmelt driven floods based on wavelet and cross-wavelet analysis,
J. Hydrol., 353, 247–266, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib2"><label>Altman and Bland(1999)</label><mixed-citation>
Altman, D. G. and Bland, J. M.: Statistics notes Variables and parameters, Brit. Med. J.,
318, 1667, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib3"><label>Aubert et al.(2003)Aubert, Loumagne, and
Oudin</label><mixed-citation>
Aubert, D., Loumagne, C., and Oudin, L.: Sequential assimilation of soil
moisture and streamflow data in a conceptual rainfall–runoff model, J.
Hydrol., 280, 145–161, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib4"><label>Bernard et al.(2009)Bernard, Heutte, and Adam</label><mixed-citation>
Bernard, S., Heutte, L., and Adam, S.: Influence of hyperparameters on random
forest accuracy, in: International Workshop on Multiple Classifier Systems,
Springer, Berlin, Heidelberg,  171–180, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib5"><label>Boyle et al.(2000)Boyle, Gupta, and Sorooshian</label><mixed-citation>
Boyle, D. P., Gupta, H. V., and Sorooshian, S.: Toward improved calibration of
hydrologic models: Combining the strengths of manual and automatic methods,
Water Resour. Res., 36, 3663–3674, 2000.
</mixed-citation></ref-html>
<ref-html id="bib1.bib6"><label>Breiman(2001)</label><mixed-citation>
Breiman, L.: Random forests, Mach. Learn., 45, 5–32, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib7"><label>Breiman et al.(1984)Breiman, Friedman, Stone, and
Olshen</label><mixed-citation>
Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A.: Classification and
regression trees, CRC Press, Boca Raton, Florida, 1984.
</mixed-citation></ref-html>
<ref-html id="bib1.bib8"><label>Calle and Urrea(2010)</label><mixed-citation>
Calle, M. L. and Urrea, V.: Letter to the editor: stability of random forest
importance measures, Brief. Bioinform., 12, 86–89, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib9"><label>Carvalho et al.(2019)Carvalho, Pereira, and
Cardoso</label><mixed-citation>
Carvalho, D. V., Pereira, E. M., and Cardoso, J. S.: Machine learning
interpretability: A survey on methods and metrics, Electronics, 8, 832, <a href="https://doi.org/10.3390/electronics8080832" target="_blank">https://doi.org/10.3390/electronics8080832</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib10"><label>Cayan et al.(1999)Cayan, Redmond, and Riddle</label><mixed-citation>
Cayan, D. R., Redmond, K. T., and Riddle, L. G.: ENSO and hydrologic extremes
in the western United States, J. Climate, 12, 2881–2893, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib11"><label>Chen and Ishwaran(2012)</label><mixed-citation>
Chen, X. and Ishwaran, H.: Random forests for genomic data analysis, Genomics,
99, 323–329, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib12"><label>Cho and Jacobs(2020)</label><mixed-citation>
Cho, E. and Jacobs, J. M.: Extreme Value Snow Water Equivalent and Snowmelt for
Infrastructure Design over the Contiguous United States, Water Resou.
Res., 56, e2020WR028126, <a href="https://doi.org/10.1029/2020WR028126" target="_blank">https://doi.org/10.1029/2020WR028126</a>, 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib13"><label>Coulston et al.(2016)Coulston, Blinn, Thomas, and
Wynne</label><mixed-citation>
Coulston, J. W., Blinn, C. E., Thomas, V. A., and Wynne, R. H.: Approximating
prediction uncertainty for random forest regression models, Photogramm.
Eng. Rem. S., 82, 189–197, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib14"><label>Dawson et al.(2006)Dawson, Abrahart, Shamseldin, and
Wilby</label><mixed-citation>
Dawson, C. W., Abrahart, R. J., Shamseldin, A. Y., and Wilby, R. L.: Flood
estimation at ungauged sites using artificial neural networks, J.
Hydrol., 319, 391–409, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib15"><label>Di Luzio et al.(2008)Di Luzio, Johnson, Daly, Eischeid, and
Arnold</label><mixed-citation>
Di Luzio, M., Johnson, G. L., Daly, C., Eischeid, J. K., and Arnold, J. G.:
Constructing retrospective gridded daily precipitation and temperature
datasets for the conterminous United States, J. Appl. Meteorol.
Clim., 47, 475–497, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib16"><label>Dibike and Solomatine(2001)</label><mixed-citation>
Dibike, Y. B. and Solomatine, D. P.: River flow forecasting using artificial
neural networks, Phys. Chem. Earth Pt. B, 26, 1–7, 2001.
</mixed-citation></ref-html>
<ref-html id="bib1.bib17"><label>Dingman(2015)</label><mixed-citation>
Dingman, S. L.: Physical hydrology, Waveland Press, Long Grove, Illinois, 104–106, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib18"><label>Elsner et al.(2010)Elsner, Cuo, Voisin, Deems, Hamlet, Vano,
Mickelson, Lee, and Lettenmaier</label><mixed-citation>
Elsner, M. M., Cuo, L., Voisin, N., Deems, J. S., Hamlet, A. F., Vano, J. A.,
Mickelson, K. E., Lee, S.-Y., and Lettenmaier, D. P.: Implications of 21st
century climate change for the hydrology of Washington State, Climatic
Change, 102, 225–260, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib19"><label>Falcone(2011)</label><mixed-citation>
Falcone, J. A.: GAGES-II: Geospatial attributes of gages for evaluating
streamflow, Tech. rep., US Geological Survey, <a href="https://doi.org/10.3133/70046617" target="_blank">https://doi.org/10.3133/70046617</a>, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib20"><label>Graham et al.(2013)Graham, Barnard, Kavanagh, and
McNamara</label><mixed-citation>
Graham, C. B., Barnard, H. R., Kavanagh, K. L., and McNamara, J. P.: Catchment
scale controls the temporal connection of transpiration and diel fluctuations
in streamflow, Hydrol. Process., 27, 2541–2556, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib21"><label>Gregorutti et al.(2017)Gregorutti, Michel, and
Saint-Pierre</label><mixed-citation>
Gregorutti, B., Michel, B., and Saint-Pierre, P.: Correlation and variable
importance in random forests, Stat. Comput., 27, 659–678, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib22"><label>Gupta et al.(1999)Gupta, Sorooshian, and Yapo</label><mixed-citation>
Gupta, H. V., Sorooshian, S., and Yapo, P. O.: Status of automatic calibration
for hydrologic models: Comparison with multilevel expert calibration, J.
Hydrol. Eng., 4, 135–143, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib23"><label>Gupta et al.(2009)Gupta, Kling, Yilmaz, and
Martinez</label><mixed-citation>
Gupta, H. V., Kling, H., Yilmaz, K. K., and Martinez, G. F.: Decomposition of
the mean squared error and NSE performance criteria: Implications for
improving hydrological modelling, J. Hydrol., 377, 80–91, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib24"><label>Huang and Boutros(2016)</label><mixed-citation>
Huang, B. F. and Boutros, P. C.: The parameter sensitivity of random forests,
BMC Bioinformatics, 17, 1–13, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib25"><label>Hwang et al.(2012)Hwang, Ham, and Kim</label><mixed-citation>
Hwang, S. H., Ham, D. H., and Kim, J. H.: A new measure for assessing the
efficiency of hydrological data-driven forecasting models, Hydrolog.
Sci. J., 57, 1257–1274, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib26"><label>Ishwaran and Lu(2019)</label><mixed-citation>
Ishwaran, H. and Lu, M.: Standard errors and confidence intervals for variable
importance in random forest regression, classification, and survival,
Stat. Med., 38, 558–582, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib27"><label>James et al.(2013)James, Witten, Hastie, and
Tibshirani</label><mixed-citation>
James, G., Witten, D., Hastie, T., and Tibshirani, R.: An introduction to statistical learning, Springer, New York, 113, 246–247, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib28"><label>Johnstone(2011)</label><mixed-citation>
Johnstone, J. A.: A quasi-biennial signal in western US hydroclimate and its
global teleconnections, Clim. Dynam., 36, 663–680, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib29"><label>Karran et al.(2013)Karran, Morin, and Adamowski</label><mixed-citation>
Karran, D. J., Morin, E., and Adamowski, J.: Multi-step streamflow forecasting
using data-driven non-linear methods in contrasting climate regimes, J.
Hydroinform., 16, 671–689, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib30"><label>Knoben et al.(2019)Knoben, Freer, and Woods</label><mixed-citation>
Knoben, W. J. M., Freer, J. E., and Woods, R. A.: Technical note: Inherent benchmark or not? Comparing Nash–Sutcliffe and Kling–Gupta efficiency scores, Hydrol. Earth Syst. Sci., 23, 4323–4331, <a href="https://doi.org/10.5194/hess-23-4323-2019" target="_blank">https://doi.org/10.5194/hess-23-4323-2019</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib31"><label>Knowles et al.(2006)Knowles, Dettinger, and
Cayan</label><mixed-citation>
Knowles, N., Dettinger, M. D., and Cayan, D. R.: Trends in snowfall versus rainfall in the western United States, J. Climate, 19, 4545–4559, 2006.
</mixed-citation></ref-html>
<ref-html id="bib1.bib32"><label>Knowles et al.(2007)Knowles, Dettinger, and
Cayan</label><mixed-citation>
Knowles, N., Dettinger, M., and Cayan, D.: Trends in snowfall versus rainfall for the western United States, 1949–2001, prepared for California Energy Commission Public Interest Energy Research Program, Sacramento, California, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib33"><label>Kuhn et al.(2008)</label><mixed-citation>
Kuhn, M. et al.: Building predictive models in R using the caret package,
J. Stat. Softw., 28, 1–26, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib34"><label>Legates and McCabe(1999)</label><mixed-citation>
Legates, D. R. and McCabe Jr., G. J.: Evaluating the use of
“goodness-of-fit” measures in hydrologic and hydroclimatic model
validation, Water Resour. Res., 35, 233–241, 1999.
</mixed-citation></ref-html>
<ref-html id="bib1.bib35"><label>Li et al.(2017)Li, Wrzesien, Durand, Adam, and
Lettenmaier</label><mixed-citation>
Li, D., Wrzesien, M. L., Durand, M., Adam, J., and Lettenmaier, D. P.: How much
runoff originates as snow in the western United States, and how will that
change in the future?, Geophys. Res. Lett., 44, 6163–6172, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib36"><label>Li et al.(2019)Li, Sha, and Wang</label><mixed-citation>
Li, X., Sha, J., and Wang, Z.-L.: Comparison of daily streamflow forecasts
using extreme learning machines and the random forest method, Hydrolog.
Sci. J., 64, 1857–1866, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib37"><label>Liaw and Wiener(2002)Liaw, Wiener et al.</label><mixed-citation>
Liaw, A. and Wiener, M.: Classification and regression by randomForest, R
News, 2, 18–22, 2002.
</mixed-citation></ref-html>
<ref-html id="bib1.bib38"><label>Louppe et al.(2013)Louppe, Wehenkel, Sutera, and
Geurts</label><mixed-citation>
Louppe, G., Wehenkel, L., Sutera, A., and Geurts, P.: Understanding variable
importances in forests of randomized trees, in: Advances in neural
information processing systems, 26, 431–439, 2013.
</mixed-citation></ref-html>
<ref-html id="bib1.bib39"><label>Lundquist et al.(2009)Lundquist, Dettinger, Stewart, and
Cayan</label><mixed-citation>
Lundquist, J. D., Dettinger, M. D., Stewart, I. T., and Cayan, D. R.:
Variability and trends in spring runoff in the western United States, in:
Climate Warming in Western North America: Evidence and Environmental Effects,
University of Utah Press, Salt Lake City, Utah, USA, 63–76, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib40"><label>Mantua et al.(2009)Mantua, Tohver, and Hamlet</label><mixed-citation>
Mantua, N., Tohver, I., and Hamlet, A. F.: Impacts of Climate Change on Key Aspects of Freshwater Salmon Habitat in Washington State, The Washington Climate Change Impacts Assessment: Evaluating Washington's Future in a Changing Climate, University of Washington Climate Impacts Group, Seattle, WA, <a href="https://doi.org/10.7915/CIG6QZ23J" target="_blank">https://doi.org/10.7915/CIG6QZ23J</a>, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib41"><label>Mass(2015)</label><mixed-citation>
Mass, C.: The weather of the Pacific Northwest, University of Washington Press, Seattle, Washington, 34–35,
2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib42"><label>Mentch and Hooker(2016)</label><mixed-citation>
Mentch, L. and Hooker, G.: Quantifying uncertainty in random forests via
confidence intervals and hypothesis tests, J. Mach. Learn.
Res., 17, 841–881, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib43"><label>Mittermaier(2008)</label><mixed-citation>
Mittermaier, M. P.: The potential impact of using persistence as a reference
forecast on perceived forecast skill, Weather Forecast., 23,
1022–1031, 2008.
</mixed-citation></ref-html>
<ref-html id="bib1.bib44"><label>Mosavi et al.(2018)Mosavi, Ozturk, and Chau</label><mixed-citation>
Mosavi, A., Ozturk, P., and Chau, K.-w.: Flood prediction using machine
learning models: Literature review, Water, 10, 1536, <a href="https://doi.org/10.3390/w10111536" target="_blank">https://doi.org/10.3390/w10111536</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib45"><label>Mote et al.(2018)Mote, Li, Lettenmaier, Xiao, and
Engel</label><mixed-citation>
Mote, P. W., Li, S., Lettenmaier, D. P., Xiao, M., and Engel, R.: Dramatic
declines in snowpack in the western US, NPJ Climate and Atmospheric Science,
1, 1–6, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib46"><label>Nicodemus(2011)</label><mixed-citation>
Nicodemus, K. K.: Letter to the editor: On the stability and ranking of
predictors from random forest variable importance measures, Brief.
Bioinform., 12, 369–373, 2011.
</mixed-citation></ref-html>
<ref-html id="bib1.bib47"><label>Obringer and Nateghi(2018)</label><mixed-citation>
Obringer, R. and Nateghi, R.: Predicting urban reservoir levels using
statistical learning techniques, Sci. Rep.-UK, 8, 5164, <a href="https://doi.org/10.1038/s41598-018-23509-w" target="_blank">https://doi.org/10.1038/s41598-018-23509-w</a>, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib48"><label>Oshiro et al.(2012)Oshiro, Perez, and Baranauskas</label><mixed-citation>
Oshiro, T. M., Perez, P. S., and Baranauskas, J. A.: How many trees in a random
forest?, in: International workshop on machine learning and data mining in
pattern recognition, Springer, Berlin, Heidelberg, 154–168, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib49"><label>Pagano et al.(2009)Pagano, Garen, Perkins, and
Pasteris</label><mixed-citation>
Pagano, T. C., Garen, D. C., Perkins, T. R., and Pasteris, P. A.: Daily
updating of operational statistical seasonal water supply forecasts for the
western US, J. Am. Water Resour. As., 45,
767–778, 2009.
</mixed-citation></ref-html>
<ref-html id="bib1.bib50"><label>Pal(2005)</label><mixed-citation>
Pal, M.: Random forest classifier for remote sensing classification,
Int. J. Remote Sens., 26, 217–222, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib51"><label>Pan et al.(2003)Pan, Sheffield, Wood, Mitchell, Houser, Schaake,
Robock, Lohmann, Cosgrove, Duan et al.</label><mixed-citation>
Pan, M., Sheffield, J., Wood, E. F., Mitchell, K. E., Houser, P. R., Schaake, J. C., Robock, A., Lohmann, D., Cosgrove, B., Duan, Q., and Luo, L.: Snow process
modeling in the North American Land Data Assimilation System (NLDAS): 2.
Evaluation of model simulated snow water equivalent, J. Geophys.
Res.-Atmos., 108, 8850, <a href="https://doi.org/10.1029/2003JD003994" target="_blank">https://doi.org/10.1029/2003JD003994</a>, 2003.
</mixed-citation></ref-html>
<ref-html id="bib1.bib52"><label>Papacharalampous and Tyralis(2018)</label><mixed-citation>
Papacharalampous, G. A. and Tyralis, H.: Evaluation of random forests and
Prophet for daily streamflow forecasting, Advances in Geosciences, 45,
201–208, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib53"><label>Payne et al.(2004)Payne, Wood, Hamlet, Palmer, and
Lettenmaier</label><mixed-citation>
Payne, J. T., Wood, A. W., Hamlet, A. F., Palmer, R. N., and Lettenmaier,
D. P.: Mitigating the effects of climate change on the water resources of the
Columbia River basin, Climatic Change, 62, 233–256, 2004.
</mixed-citation></ref-html>
<ref-html id="bib1.bib54"><label>Pham(2020)</label><mixed-citation>
Pham, L. T.: Random Forest Streamflow Forecast (2020), GitHub, available at: <a href="https://github.com/leopham95/RandomForestStreamflowForecast" target="_blank">https://github.com/leopham95/RandomForestStreamflowForecast</a>, last access: 15 June 2020.
</mixed-citation></ref-html>
<ref-html id="bib1.bib55"><label>Probst et al.(2019)Probst, Wright, and
Boulesteix</label><mixed-citation>
Probst, P., Wright, M. N., and Boulesteix, A.-L.: Hyperparameters and tuning
strategies for random forest, WIRES Data Min.
Knowl., 9, e1301, <a href="https://doi.org/10.1002/widm.1301" target="_blank">https://doi.org/10.1002/widm.1301</a>, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib56"><label>Ralph et al.(2014)Ralph, Dettinger, White, Reynolds, Cayan,
Schneider, Cifelli, Redmond, Anderson, Gherke et al.</label><mixed-citation>
Ralph, F., Dettinger, M., White, A., Reynolds, D., Cayan, D., Schneider, T.,
Cifelli, R., Redmond, K., Anderson, M., Gherke, F., and Jones, J.: A vision for
future observations for western US extreme precipitation and flooding,
Journal of Contemporary Water Research &amp; Education, 153, 16–32, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib57"><label>Rasouli et al.(2012)Rasouli, Hsieh, and Cannon</label><mixed-citation>
Rasouli, K., Hsieh, W. W., and Cannon, A. J.: Daily streamflow forecasting by
machine learning methods with weather and climate inputs, J.
Hydrol., 414, 284–293, 2012.
</mixed-citation></ref-html>
<ref-html id="bib1.bib58"><label>Regonda et al.(2005)Regonda, Rajagopalan, Clark, and
Pitlick</label><mixed-citation>
Regonda, S. K., Rajagopalan, B., Clark, M., and Pitlick, J.: Seasonal cycle
shifts in hydroclimatology over the western United States, J.
Climate, 18, 372–384, 2005.
</mixed-citation></ref-html>
<ref-html id="bib1.bib59"><label>Ribeiro et al.(2016)Ribeiro, Singh, and Guestrin</label><mixed-citation>
Ribeiro, M. T., Singh, S., and Guestrin, C.: Model-agnostic interpretability of
machine learning, arXiv [preprint],
<a href="https://arxiv.org/abs/1606.05386" target="_blank">arXiv:1606.05386</a>, last access: 16 June 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib60"><label>Safeeq et al.(2014)Safeeq, Mauger, Grant, Arismendi, Hamlet, and
Lee</label><mixed-citation>
Safeeq, M., Mauger, G. S., Grant, G. E., Arismendi, I., Hamlet, A. F., and Lee,
S.-Y.: Comparing large-scale hydrological model predictions with observed
streamflow in the Pacific Northwest: effects of climate and groundwater,
J. Hydrometeorol., 15, 2501–2521, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib61"><label>Salathé Jr et al.(2014)Salathé Jr, Hamlet, Mass, Lee,
Stumbaugh, and Steed</label><mixed-citation>
Salathé Jr, E. P., Hamlet, A. F., Mass, C. F., Lee, S.-Y., Stumbaugh, M.,
and Steed, R.: Estimates of twenty-first-century flood risk in the Pacific
Northwest based on regional climate model simulations, J.
Hydrometeorol., 15, 1881–1899, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib62"><label>Seibold et al.(2018)Seibold, Bernau, Boulesteix, and
De Bin</label><mixed-citation>
Seibold, H., Bernau, C., Boulesteix, A.-L., and De Bin, R.: On the choice and
influence of the number of boosting steps for high-dimensional linear
Cox-models, Comput. Stat., 33, 1195–1215, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib63"><label>Shortridge et al.(2016)Shortridge, Guikema, and
Zaitchik</label><mixed-citation>
Shortridge, J. E., Guikema, S. D., and Zaitchik, B. F.: Machine learning methods for empirical streamflow simulation: a comparison of model accuracy, interpretability, and uncertainty in seasonal watersheds, Hydrol. Earth Syst. Sci., 20, 2611–2628, <a href="https://doi.org/10.5194/hess-20-2611-2016" target="_blank">https://doi.org/10.5194/hess-20-2611-2016</a>, 2016.
</mixed-citation></ref-html>
<ref-html id="bib1.bib64"><label>Shrikumar et al.(2017)Shrikumar, Greenside, and
Kundaje</label><mixed-citation>
Shrikumar, A., Greenside, P., and Kundaje, A.: Learning important features
through propagating activation differences, arXiv [preprint],  <a href="https://arxiv.org/abs/1704.02685" target="_blank">arXiv:1704.02685</a>, last access:
17 July 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib65"><label>Sitterson et al.(2018)Sitterson, Knightes, Parmar, Wolfe, Avant, and
Muche</label><mixed-citation>
Sitterson, J., Knightes, C., Parmar, R., Wolfe, K., Avant, B., and Muche, M.:
An overview of rainfall-runoff model types, EPA Office of Research and Development (8101R), Washington, DC 20460, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib66"><label>Srivastava et al.(2017)Srivastava, Wu, Elliot, Brooks, and
Flanagan</label><mixed-citation>
Srivastava, A., Wu, J. Q., Elliot, W. J., Brooks, E. S., and Flanagan, D. C.:
Modeling streamflow in a snow-dominated forest watershed using the Water
Erosion Prediction Project (WEPP) model, T. ASABE, 60,
1171–1187, 2017.
</mixed-citation></ref-html>
<ref-html id="bib1.bib67"><label>Strobl et al.(2007)Strobl, Boulesteix, Zeileis, and
Hothorn</label><mixed-citation>
Strobl, C., Boulesteix, A.-L., Zeileis, A., and Hothorn, T.: Bias in random
forest variable importance measures: Illustrations, sources and a solution,
BMC Bioinformatics, 8, 1–21, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib68"><label>Tohver et al.(2014)Tohver, Hamlet, and Lee</label><mixed-citation>
Tohver, I. M., Hamlet, A. F., and Lee, S.-Y.: Impacts of 21st-century climate
change on hydrologic extremes in the Pacific Northwest region of North
America, J. Am. Water Resour. As., 50,
1461–1476, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib69"><label>Tolson and Shoemaker(2007)</label><mixed-citation>
Tolson, B. A. and Shoemaker, C. A.: Dynamically dimensioned search algorithm
for computationally efficient watershed model calibration, Water Resour.
Res., 43, W01413, <a href="https://doi.org/10.1029/2005WR004723" target="_blank">https://doi.org/10.1029/2005WR004723</a>, 2007.
</mixed-citation></ref-html>
<ref-html id="bib1.bib70"><label>Tongal and Booij(2018)</label><mixed-citation>
Tongal, H. and Booij, M. J.: Simulation and forecasting of streamflows using
machine learning models coupled with base flow separation, J.
Hydrol., 564, 266–282, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib71"><label>Tyralis et al.(2019)Tyralis, Papacharalampous, and
Langousis</label><mixed-citation>
Tyralis, H., Papacharalampous, G., and Langousis, A.: A brief review of random
forests for water scientists and practitioners and their recent history in
water resources, Water, 11, 910, 2019.
</mixed-citation></ref-html>
<ref-html id="bib1.bib72"><label>U.S. Geological Survey(2020)</label><mixed-citation>
U.S. Geological Survey: U.S. Geological Survey, 2019, National Hydrography
Dataset (ver. USGS National Hydrography Dataset Best Resolution (NHD) for
Hydrologic Unit (HU) 4 – 2001), available at:
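<a href="https://www.usgs.gov/core-science-systems/ngp/national-hydrography/access-national-hydrography-products" target="_blank">https://www.usgs.gov/core-science-systems/ngp/national-hydrography/access-national-hydrography-products</a> (last access: 6 June 2020),
2020.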
</mixed-citation></ref-html>
<ref-html id="bib1.bib73"><label>Van Rijn and Hutter(2018)</label><mixed-citation>
Van Rijn, J. N. and Hutter, F.: Hyperparameter importance across datasets, in:
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge
Discovery &amp; Data Mining, 2367–2376, 2018.
</mixed-citation></ref-html>
<ref-html id="bib1.bib74"><label>Vano et al.(2015)Vano, Nijssen, and Lettenmaier</label><mixed-citation>
Vano, J. A., Nijssen, B., and Lettenmaier, D. P.: Seasonal hydrologic responses
to climate change in the Pacific Northwest, Water Resour. Res., 51,
1959–1976, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib75"><label>Wager et al.(2014)Wager, Hastie, and Efron</label><mixed-citation>
Wager, S., Hastie, T., and Efron, B.: Confidence intervals for random forests:
The jackknife and the infinitesimal jackknife, J. Mach.
Learn. Res., 15, 1625–1651, 2014.
</mixed-citation></ref-html>
<ref-html id="bib1.bib76"><label>Wang et al.(2015)Wang, Lai, Chen, Yang, Zhao, and
Bai</label><mixed-citation>
Wang, Z., Lai, C., Chen, X., Yang, B., Zhao, S., and Bai, X.: Flood hazard risk
assessment model based on random forest, J. Hydrol., 527,
1130–1141, 2015.
</mixed-citation></ref-html>
<ref-html id="bib1.bib77"><label>Wenger et al.(2010)Wenger, Luce, Hamlet, Isaak, and
Neville</label><mixed-citation>
Wenger, S. J., Luce, C. H., Hamlet, A. F., Isaak, D. J., and Neville, H. M.:
Macroscale hydrologic modeling of ecologically relevant flow metrics, Water
Resour. Res., 46, W09513, <a href="https://doi.org/10.1029/2009WR008839" target="_blank">https://doi.org/10.1029/2009WR008839</a>, 2010.
</mixed-citation></ref-html>
<ref-html id="bib1.bib78"><label>Wilcoxon et al.(1970)Wilcoxon, Katti, and
Wilcox</label><mixed-citation>
Wilcoxon, F., Katti, S., and Wilcox, R. A.: Critical values and probability
levels for the Wilcoxon rank sum test and the Wilcoxon signed rank test,
Selected tables in mathematical statistics, 1, 171–259, 1970.
</mixed-citation></ref-html>
<ref-html id="bib1.bib79"><label>Zheng et al.(2018)Zheng, Wang, Zhou, Sun, and
Li</label><mixed-citation>
Zheng, X., Wang, Q., Zhou, L., Sun, Q., and Li, Q.: Predictive Contributions of
Snowmelt and Rainfall to Streamflow Variations in the Western United States,
Adv. Meteorol., 2018, p. 14, 2018.
</mixed-citation></ref-html>--></article>
