Surface water storage influences streamflow signatures

Vanderhoof, Melanie K.; Nieuwlandt, Peter; Golden, Heather E.; Lane, Charles R.; Christensen, Jay R.; Keenan, Will; Dolan, Wayana

doi:https://doi.org/10.5194/hess-2024-119

Preprints

06 May 2024

Status: this discussion paper is a preprint. It has been under review for the journal Hydrology and Earth System Sciences (HESS). The manuscript was not accepted for further review after discussion.

Abstract. Extreme flow conditions in river discharge have far-reaching environmental and economic consequences. The retention of surface water in lakes, wetlands, and floodplains can potentially moderate these extreme flows by modifying the timing, duration, and magnitude of flow generation. However, efforts to characterize the impact of surface water storage on river discharge have been limited in geographic extent. In this analysis, a suite of hydrologic signatures, quantifying components of watershed flow regimes, was calculated from daily discharge at 72 gaged watersheds across the conterminous United States. Random forest models were developed to explain variability in six hydrologic signatures related to flashiness and high and low flow conditions. In addition to traditionally considered variables such as climate, land cover, topography, and geology, a novel remote sensing (Sentinel-1 & 2) approach was used to study the contribution of surface water storage dynamics to each signature's variability. While climate variables explained much of the variability in the hydrologic signatures, models for five of the six signatures showed an improvement in explanatory power when landscape characteristics were added. Automated variable selection is part of the modeling process and can be indicative of the relative importance of certain variables over others. When all variables were considered, four of the six signature models selected remotely sensed inundation variables. The amount of semi-permanent and permanent floodplain inundation, for example, was both negatively correlated with, and showed the greatest variable importance for wet season flashiness. Further, increases in seasonal floodplain inundation were positively correlated with increases in peak flows. This suggests that the storage of surface water on floodplains is relevant to both flashiness and high flow signatures. 
In addition, spatial variability in the amount of semi-permanent and permanent non-floodplain water helped explain variability in the baseflow index. These findings suggest that watershed surface water storage dynamics explain a portion of streamflow signature variability. The results underscore the need for protection and restoration of surface water storage systems, such as wetlands, across watersheds.

Received: 18 Apr 2024 – Discussion started: 06 May 2024

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1: 'Comment on hess-2024-119', Anonymous Referee #1, 09 Jul 2024

Major comments

The main conclusions are based on increases in the explanatory power (R²) from the model including only climate as independent variables (M_climate) to the model including all variables as independent variables (M_all). These increases (Table 5) range from 0 to 10%, which results in adjusted R² increases up to 0.04. These are indeed very low model improvements. To better assess if these increases are not by chance, the authors should include another method for hypothesis testing, computing the p-value and the uncertainty behind these results.
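The relationship between a raw R² gain and an adjusted R² gain can be illustrated with a minimal sketch. The sample size of 72 comes from the abstract; the R² values and predictor counts below are hypothetical and are not taken from Table 5.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need n_obs > n_predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


N_WATERSHEDS = 72  # number of gaged watersheds stated in the abstract

# Hypothetical R^2 values and predictor counts, for illustration only.
climate_only = adjusted_r2(0.50, N_WATERSHEDS, n_predictors=4)
all_variables = adjusted_r2(0.55, N_WATERSHEDS, n_predictors=8)

# The raw gain of 0.05 shrinks once the extra predictors are penalized.
gain = all_variables - climate_only
print(f"adjusted R^2 gain: {gain:.3f}")
```

Under these assumed inputs the adjusted gain is less than half of the raw gain, which is why a significance or uncertainty estimate is useful when the raw gains are already small.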

The authors model the maximum annual flow, however, none of the climatic variables are related to high precipitation events (e.g. maximum annual precipitation). Given that peak flow is often linked to peak precipitation and snowmelt (Berghuijs et al., 2016), I suggest that such variables are added to the models. Additionally, why is the maximum annual flow computed for the time scale of 30 days? How do the results change if shorter time scales are also analyzed? (e.g. maximum annual daily flow; 7-day flow).
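The sensitivity to the averaging window can be sketched as follows, assuming that MAX30 denotes the annual maximum of the 30-day moving mean of daily discharge (the discussion only says "time scale of 30 days"). The function name and the synthetic flow series are illustrative, not from the manuscript.

```python
def max_n_day_mean(daily_flow: list[float], window: int) -> float:
    """Largest mean flow over any run of `window` consecutive days."""
    if window < 1 or window > len(daily_flow):
        raise ValueError("window must be between 1 and len(daily_flow)")
    running = sum(daily_flow[:window])  # sum of the first window
    best = running
    for i in range(window, len(daily_flow)):
        # Slide the window one day forward.
        running += daily_flow[i] - daily_flow[i - window]
        best = max(best, running)
    return best / window


# Synthetic year: steady flow of 10 with a 5-day event at 100 (hypothetical).
daily_flow = [10.0] * 365
daily_flow[100:105] = [100.0] * 5

max_1 = max_n_day_mean(daily_flow, 1)
max_7 = max_n_day_mean(daily_flow, 7)
max_30 = max_n_day_mean(daily_flow, 30)
print(max_1, round(max_7, 2), max_30)
```

In this synthetic case a short event dominates the 1-day and 7-day maxima but is heavily diluted in the 30-day value, which is the kind of difference the question about shorter time scales is probing.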

The baseflow index is usually highly connected to the geology of a catchment (Aboelnour et al., 2021; Bloomfield et al., 2021; Briggs et al., 2022; Carlier et al., 2018). None of the independent variables to model the baseflow index includes geological characteristics. This could potentially change the results and final conclusions obtained in the manuscript.

The title could be modified as it states knowledge that is already well established, with several references in the introduction showing that surface water storage impacts streamflow. It also could be misleading as it does not focus on the mechanisms behind these impacts.

Some of the flow signatures with the strongest influences of inundation variables are related to the temporal variability at the daily scale (flashiness index) rather than to a central measure of magnitude (e.g. MAX30/area, DryMonth/area). Would other flow variables related to the temporal variability be worth investigating? (e.g. CV of MAX30/area).
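To make the distinction between a variability signature and a magnitude signature concrete, here is a minimal sketch of a daily flashiness calculation. It assumes the Richards-Baker definition (sum of absolute day-to-day changes divided by total flow); the discussion does not confirm which flashiness definition the manuscript uses, and the two flow series are invented.

```python
def flashiness_index(daily_flow: list[float]) -> float:
    """Richards-Baker style index: sum(|q_i - q_(i-1)|) / sum(q_i)."""
    if len(daily_flow) < 2:
        raise ValueError("need at least two daily values")
    total = sum(daily_flow)
    if total <= 0:
        raise ValueError("total flow must be positive")
    # Path length of the hydrograph: absolute change between adjacent days.
    path = sum(abs(b - a) for a, b in zip(daily_flow, daily_flow[1:]))
    return path / total


# Two hypothetical series with different day-to-day behavior.
steady = [10.0] * 10
flashy = [10.0, 50.0] * 5

steady_index = flashiness_index(steady)
flashy_index = flashiness_index(flashy)
print(steady_index, flashy_index)
```

The index responds only to day-to-day changes, so two watersheds with the same mean or peak flow can have very different values.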

Minor comments

L26-29. What is precisely “amount of semi-permanent and permanent floodplain inundation” and “increases in seasonal floodplain inundation”? If it refers to the spatial extent, is it in terms of km² or relative to the catchment’s area?

From the Abstract alone, it is hard to get an idea of how significantly water storage influences streamflow. The Abstract would be clearer if it presented some numbers to show how much some key flow signatures are explained by the inundation variables (e.g. differences between M_climate and M_all R²).

Fig. 1 and flow signatures. “annual actual evapotranspiration divided by annual precipitation,” Shouldn’t it be potential evapotranspiration? The aridity index usually considers potential evapotranspiration rather than actual evapotranspiration (e.g., Berghuijs et al., 2017; Gudmundsson et al., 2016; Sawicz et al., 2011). It also looks like something is off because Fig. 1 shows that mean annual actual evaporation is up to 29.8 times higher than mean annual precipitation. Values in Fig. 1 do not match the values in Table 2 (Aridity index); are they the same variable?
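The definition at issue can be written out as a short sketch. It assumes the common convention that the aridity index is potential evapotranspiration divided by precipitation; the input values are hypothetical and are not taken from Fig. 1 or Table 2.

```python
def aridity_index(pet_mm: float, precip_mm: float) -> float:
    """Aridity index as annual PET divided by annual precipitation."""
    if precip_mm <= 0:
        raise ValueError("precipitation must be positive")
    return pet_mm / precip_mm


# Hypothetical annual totals in millimetres.
humid = aridity_index(pet_mm=800.0, precip_mm=1200.0)  # energy-limited
arid = aridity_index(pet_mm=1800.0, precip_mm=300.0)   # water-limited
print(round(humid, 2), arid)
```

With PET in the numerator, values well above 1 are expected in dry regions, whereas a ratio built from actual evapotranspiration would not normally reach such values.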

Section 2.2. Is the calendar year used for the computation of the hydrologic signatures?

Are the gap-filled inundation time series publicly available? It would be necessary to replicate the study.

What were the precise selection criteria behind the watersheds chosen? Why aren’t more watersheds analyzed, given that there are hundreds (or thousands) of gauges with data available?

L176-177. How precisely is the temperature CV calculated? Is it calculated from monthly temperature values considering the entire time series? The same applies to precipitation CV.

Section 2.3.2 and Table 2. Are the subsurface and topography variables computed using the mean (or median) value of all raster cells within the catchment?

What is the spatial configuration of the inundation variables? It would be great to include maps showing their spatial distribution and a brief discussion. What are the relationships between the inundation variables? E.g. how are they correlated?

Results - Section 3.2 Flashiness signatures. The authors show how the flashiness indices are only weakly correlated with climate, have the models with the lowest R² compared to other flow signatures (Table 4), and have the highest increases in R² when all independent variables are added. I’m guessing – Is one contributor to the weak climate signal the fact that the flashiness index focuses on the daily variability while the climate indices focus on the monthly variability (or the annual mean)? Could this contribution to the lowest M_climate R² indirectly explain the highest potential for R² increases when including all variables?

Table 2; Table 4; Table 6. Is it actual or potential evapotranspiration? Does the coefficient of variation of climate variables refer to the monthly change or annual change? Also, it looks like the authors might have computed the standard deviation instead of the coefficient of variation because of the units of mm, ºC, and the high values. The coefficient of variation is unitless as it is relative to the sample mean.
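The difference between the two statistics can be checked with a minimal sketch. It uses the sample standard deviation, which is an assumption because the manuscript's exact choice is not stated here, and the monthly precipitation values are invented.

```python
import statistics


def coefficient_of_variation(values: list[float]) -> float:
    """Sample standard deviation divided by the mean (unitless)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean


# Hypothetical monthly precipitation totals in millimetres.
monthly_mm = [80.0, 95.0, 110.0, 70.0, 60.0, 45.0,
              30.0, 35.0, 55.0, 75.0, 90.0, 100.0]
monthly_in = [v / 25.4 for v in monthly_mm]  # same data in inches

sd_mm = statistics.stdev(monthly_mm)
sd_in = statistics.stdev(monthly_in)
cv_mm = coefficient_of_variation(monthly_mm)
cv_in = coefficient_of_variation(monthly_in)
print(round(sd_mm, 2), round(sd_in, 2), round(cv_mm, 3), round(cv_in, 3))
```

The standard deviation changes with the unit while the CV does not, so a "CV" reported in mm or ºC is a sign that the division by the mean was skipped.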

I believe that the Results section would read better if the sensitivity analysis of the data (Section 3.1) was not the first result presented and discussed, given that this is not the main topic of research.

L301-303. To me, the DryMonth/area bias of -2.2% seems very low for interpreting it as a drier condition than for the longer period, given that the between-year coefficient of variation of flow indices is usually much higher than 2%. The interpretation is up to the authors, but it looks like a very small bias and perhaps not statistically significant. The bias in the baseflow index (-11.8%) might come from a larger volume of high flows (as shown with MAX30/area) rather than lower flows or a drier period. These interpretations would agree with those using the PDSI.

Parts of the results show the Pearson correlation (R) and its associated p-values without stating if the residuals follow the normal distribution or if the authors checked for outliers. It would be good to either state that the authors checked for the correlation assumptions or use Spearman correlation throughout the manuscript.
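The sensitivity of the two coefficients to outliers can be illustrated with a short, dependency-free sketch. The data are synthetic, with one deliberately extreme value, and the helper names are illustrative.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: list[float]) -> list[float]:
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Synthetic, perfectly monotonic data with one extreme value in y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 3), round(r_spearman, 3))
```

Because Spearman works on ranks, the single extreme value leaves it unchanged while pulling the Pearson coefficient down, which is why stating the method (and checking for outliers) matters when R values are compared across sections.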

L340-341. Here the authors refer to R as the Spearman correlation, but in other parts of the manuscript as the Pearson correlation.

How correlated are the indices Precipitation (P), Aridity index (ET/P) and Water demand (P-ET)? They look very highly correlated given the similar results.

Table 6 and associated discussion. Is it possible to know whether these variables are positively or negatively associated with the flow signatures?

Fig. 4 and Fig. 5. The numbers are hard to read because they are in the middle of the plot, mixed with the observations. It would read better if the numbers were displayed on the border of the plots. Fig. 4 shows R as the Spearman correlation but the Results section uses R as the Pearson correlation.

L434-436. Here I believe it should be clarified how much the models improved with the inundation variables, providing some numbers.

L439-441. It will be easier to interpret if the authors clarify whether these inundation variables are significantly correlated with other independent variables, particularly climate.

L452-454. How do geographically isolated wetlands contribute to baseflow? Given that they are isolated, what physical phenomena could explain this contribution? It is surprising to me that geographically isolated wetlands contribute to increasing baseflow while floodplain wetlands don’t. The reasons for this difference could also be discussed.

L524-528. I believe that it should also be clarified how much the inundation variables increased the model’s explanatory power. It would be interesting if the Conclusion section also presented results for the different inundation variables (e.g. floodplain and non-floodplain).

Technical comments

L49. “endanger poverty”. Do the authors mean increase (exacerbate) poverty?

Table 5. There is a parenthesis missing in the column “RMSE LOOCV)”.

Table 6. Are the values shown unitless?

Table 3, caption. I believe that R and p should be italics.

L419. “difference between baseflow and high flow conditions (i.e., (Q10-Q95)/area).”. It looks like it should be the difference between low and high flow conditions.

L473. There’s a mistake in “including like precipitation”.

L479. “seasonal flooding” refers to the seasonal flooding extent? “reduce or otherwise impact” is confusing to me. Do the authors mean that it could also be the other way around? That is, that larger peak discharges cause larger flooding extents?

References

Aboelnour, M. A., Engel, B. A., Frisbee, M. D., Gitau, M. W., and Flanagan, D. C.: Impacts of Watershed Physical Properties and Land Use on Baseflow at Regional Scales, Journal of Hydrology: Regional Studies, 35, 100810, https://doi.org/10.1016/j.ejrh.2021.100810, 2021.
Berghuijs, W. R., Woods, R. A., Hutton, C. J., and Sivapalan, M.: Dominant flood generating mechanisms across the United States, Geophys. Res. Lett., 43, 4382–4390, https://doi.org/10.1002/2016GL068070, 2016.
Berghuijs, W. R., Larsen, J. R., van Emmerik, T. H. M., and Woods, R. A.: A Global Assessment of Runoff Sensitivity to Changes in Precipitation, Potential Evaporation, and Other Factors, Water Resour. Res., 53, 8475–8486, https://doi.org/10.1002/2017WR021593, 2017.
Bloomfield, J. P., Gong, M., Marchant, B. P., Coxon, G., and Addor, N.: How is Baseflow Index (BFI) impacted by water resource management practices?, Hydrology and Earth System Sciences, 25, 5355–5379, https://doi.org/10.5194/hess-25-5355-2021, 2021.
Briggs, M. A., Goodling, P., Johnson, Z. C., Rogers, K. M., Hitt, N. P., Fair, J. B., and Snyder, C. D.: Bedrock depth influences spatial patterns of summer baseflow, temperature and flow disconnection for mountainous headwater streams, Hydrology and Earth System Sciences, 26, 3989–4011, https://doi.org/10.5194/hess-26-3989-2022, 2022.
Carlier, C., Wirth, S. B., Cochand, F., Hunkeler, D., and Brunner, P.: Geology controls streamflow dynamics, Journal of Hydrology, 566, 756–769, https://doi.org/10.1016/j.jhydrol.2018.08.069, 2018.
Gudmundsson, L., Greve, P., and Seneviratne, S. I.: The sensitivity of water availability to changes in the aridity index and other factors-A probabilistic analysis in the Budyko space, Geophys. Res. Lett., 43, 6985–6994, https://doi.org/10.1002/2016GL069763, 2016.
Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, https://doi.org/10.5194/hess-15-2895-2011, 2011.

Citation: https://doi.org/10.5194/hess-2024-119-RC1
- AC1: 'Reply on RC1', Melanie Vanderhoof, 22 Aug 2024
  
  Manuscript: HESS-2024-119
  
  Summary of Revisions:
  We very much appreciate the thoughtful comments made by both reviewers. In response to their comments we (1) added two new climate variables to better represent high flow conditions, (2) added a new geologic variable, (3) updated aridity to now use PET instead of ET, and (4) recalculated precipitation and maximum temperature coefficient of variation to use monthly inputs instead of daily. Because of these changes in potential variables and variable values, we re-ran the hyper-parameterization and variable selection processes, and re-calculated all original and additional statistical tests. Our results are very similar to the original models, but we wanted to make sure both reviewers were aware that the results had been re-run and all tables and figures were updated, correspondingly.
  
  To better evaluate model performance, we added calculations of MSE and AIC, in addition to R², adjusted R² and RMSE. We also added a significance test, using rfPermute, for the inundation variables selected for inclusion in models. Further, we added a new Table to the manuscript consolidating all of the inundation-related data in a single place, to improve the ease with which reviewers can interpret what role the inundation variables played between signatures, models and the correlation analysis.
  
  We have also modified the title, Abstract, Results and Discussion sections to include more numerical values, and more accurately represent the small to moderate improvements in model performance. Lastly, we added a new paragraph to the Discussion section to more thoroughly discuss sources of uncertainty.
  
  We believe that the manuscript is now much improved and we hope the reviewers find the revised manuscript suitable for publication.
  
  Reviewer #1
  Major comments
  
  Comment: The main conclusions are based on increases in the explanatory power (R²) from the model including only climate as independent variables (M_climate) to the model including all variables as independent variables (M_all). These increases (Table 5) range from 0 to 10%, which results in adjusted R² increases up to 0.04. These are indeed very low model improvements. To better assess if these increases are not by chance, the authors should include another method for hypothesis testing, computing the p-value and the uncertainty behind these results.
  Response: To better compare model performance, in addition to the R², adjusted R² and RMSE values, we added mean square error (MSE) and Akaike information criterion (AIC). We also added a new table (Table 6) to consolidate our characterization of the potential influence of inundation dynamics. This includes testing if the indices are significantly correlated with an inundation variable, if inundation variables were selected for the model using the forward step-wise variable selection process, and if the variables were significant (p<0.01) in the models, as well as 5 statistical measures comparing M_Climate to M_All.
  
  Regarding the request for hypothesis testing, the impact of surface water storage on discharge is known to be subtle and hard to isolate. For example, annual precipitation showed a Spearman correlation value of R=0.75 with fraction of watershed showing seasonal floodplain inundation. This is because precipitation impacts not just discharge, but also surface water dynamics. This means that the climate only model is not an independent model that is able to exclude the influence of surface water. We have added additional text to the Discussion section to further discuss this important challenge. Specifically,
  
  “In cases where variables can be isolated (e.g., basins with tile drainage, compared to basins without tile drainage), significant differences between models can be an appropriate mechanism to help quantify the impact of a variable (Rainio et al. 2024). But both discharge and surface water extent tend to be a function of climate inputs and catchment characteristics (Heimhuber et al., 2016; Vanderhoof et al., 2018). Consequently, our inundation variables were significantly correlated with not only catchment characteristics such as depth to bedrock, slope, and topographic diversity, but also climate variables, including annual precipitation, aridity, and the rainfall and runoff factor (Table A4), with the highest correlation occurring between the amount of seasonal inundation on the floodplain and the watershed rainfall and runoff factor (R=0.80, p<0.01) (Table A4). Because our inundation variables were significantly correlated with select climate variables, the M_Climate could not be considered a null model, relative to M_All, and therefore comparing variables selected, variable significance and importance as well as model improvement using evaluation metrics was seen as most appropriate.”
  
  Comment: The authors model the maximum annual flow, however, none of the climatic variables are related to high precipitation events (e.g. maximum annual precipitation). Given that peak flow is often linked to peak precipitation and snowmelt (Berghuijs et al., 2016), I suggest that such variables are added to the models. Additionally, why is the maximum annual flow computed for the time scale of 30 days? How do the results change if shorter time scales are also analyzed? (e.g. maximum annual daily flow; 7-day flow).
  Response: To better reflect high precipitation conditions we added (1) a Rainfall and Runoff Factor, or rainfall intensity, which reflects the amount of rainfall and peak intensity of each storm, and (2) a maximum monthly precipitation variable. Six of the models, including both flashiness models, selected rainfall intensity for inclusion. Since snowmelt only occurs in about half of the watersheds, we did not include a SWE-specific variable, but the influence of SWE is partially reflected in the DAYMET precipitation CV and seasonality variables. Variable selection and models were re-run with the additional potential variables. In earlier exploratory efforts we included both MAX30/area and MAX7/area, but the results were very similar between the two variables, so we retained only MAX30/area to improve the manuscript’s readability.
  
  Comment: The baseflow index is usually highly connected to the geology of a catchment (Aboelnour et al., 2021; Bloomfield et al., 2021; Briggs et al., 2022; Carlier et al., 2018). None of the independent variables to model the baseflow index includes geological characteristics. This could potentially change the results and final conclusions obtained in the manuscript.
  Response: The soil and geologic variables included in the analysis were derived from GAGES-II (Geospatial Attributes of Gages for Evaluating Streamflow), which does not contain complete information on aquifers. However, we currently have percent clay, percent sand, average soil thickness, which reflects the depth to bedrock, and annual minimum depth to water table. We added silt fraction and geologic permeability as new variables and re-ran the variable selection process. Our variable list includes many of the variables that Bloomfield et al. (2021) reported helped explain the Baseflow Index, including clay fraction, crop cover, topography, and aridity. Briggs et al. (2022) cited bedrock depth as the primary explanatory variable, which we have included as average soil thickness. Further, Aboelnour et al. (2021) found soil groups and precipitation played key roles in explaining baseflow, and our inclusion of climate variables as well as soil variables is consistent with this paper. In addition to adding the two additional soil and geology variables, we have relabeled soil thickness to “depth to bedrock” and also added a comment to the Discussion section to further note the possible uncertainty. Specifically:
  
  “Our findings may also depend on the variables included in the analysis. While we included diverse climate and catchment characteristics, it is possible that additional catchment variables, such as data on aquifers (Bloomfield et al., 2021) or additional geologic characteristics, such as proportion sandstone (Carlier et al. 2018), could improve the explanatory power of certain hydrologic signatures, like baseflow index and reduce our model uncertainty.”
  
  Comment: The title could be modified as it states knowledge that is already well established, with several references in the introduction showing that surface water storage impacts streamflow. It also could be misleading as it does not focus on the mechanisms behind these impacts.
  Response: We have changed the title to, “Integrating remotely sensed surface water dynamics in hydrologic signature modelling”
  
  Comment: Some of the flow signatures with the strongest influences of inundation variables are related to the temporal variability at the daily scale (flashiness index) rather than to a central measure of magnitude (e.g. MAX30/area, DryMonth/area). Would other flow variables related to the temporal variability be worth investigating? (e.g. CV of MAX30/area).
  Response: Yes, we agree that it would be worthwhile to explore how inundation variables might impact a wide number of hydrologic signatures. We were concerned that including too many hydrologic signatures in a single paper could make it more challenging for readers to interpret. We originally tested several CV signatures, including CV of dry season and annual CV. Annual CV showed results similar to the flashiness signatures. We ultimately excluded the CV signatures as we decided that they were more challenging to interpret hydrologically, since the variability could reflect episodic or seasonal variability. In the Discussion section we have added a comment in acknowledgement of this point of concern:
  
  “Our results, for instance, could depend on the hydrologic signatures included in the analysis (McMillan et al., 2021). It is possible that inundation has a greater or lesser influence on different aspects of the flow regime than those explored here. For the hydrologic signatures considered, such signatures can show substantial uncertainty, attributable to error in precipitation and discharge datasets (Westerberg and McMillan, 2015). To account for uncertainty, the hydrologic signatures were calculated annually, and then averaged across multiple years, while independent variables were averaged over multiple years and across each watershed, both steps that have been shown to reduce uncertainty (Westerberg and McMillan, 2015).”
  
  Minor comments
  
  Comment: L26-29. What is precisely “amount of semi-permanent and permanent floodplain inundation” and “increases in seasonal floodplain inundation”? If it refers to the spatial extent, is it in terms of km² or relative to the catchment’s area?
  Response: We revised the abstract to clarify that this refers to the proportion or fraction of a watershed with that inundation type extent.
  
  Comment: From the Abstract alone, it is hard to get an idea of how significantly water storage influences streamflow. The Abstract would be clearer if it presented some numbers to show how much some key flow signatures are explained by the inundation variables (e.g. differences between M_climate and M_all R²).
  Response: We have edited the abstract to include more numerical information on the findings.
  
  Comment: Fig. 1 and flow signatures. “annual actual evapotranspiration divided by annual precipitation,” Shouldn’t it be potential evapotranspiration? The aridity index usually considers potential evapotranspiration rather than actual evapotranspiration (e.g., Berghuijs et al., 2017; Gudmundsson et al., 2016; Sawicz et al., 2011). It also looks like something is off because Fig. 1 shows that mean annual actual evaporation is up to 29.8 times higher than mean annual precipitation. Values in Fig. 1 do not match the values in Table 2 (Aridity index); are they the same variable?
  Response: We updated aridity to integrate TerraClimate’s PET variable, so that aridity is now calculated using PET/PR. We updated the text in section 2.3.1, correlations and models, and Figure 1 correspondingly. Regarding Figure 1, the aridity values in Figure 1 are identical to the aridity values in Table 2. The maximum aridity values in CONUS occur in the Sonoran Desert, well outside of our study basins. Using PET/PR, the values are in the 60s in the Sonoran Desert. The challenge in providing a map of aridity is that the values are highly skewed, so that the minimum and maximum values are not very representative. Therefore, we changed the legend number labels in Figure 1 to reflect the median values of each color instead of the extreme minimum and maximum values and updated the caption to note this change in labeling.
  
  Comment: Section 2.2. Is the calendar year used for the computation of the hydrologic signatures?
  Response: Yes, we now clarify in the text that the calendar year, not the water year, was used to compute the hydrologic signatures.
  
  Comment: Are the gap-filled inundation time series publicly available? It would be necessary to replicate the study.
  Response: We have now released the associated inundation data. The data release includes a raster of each watershed with a spatially explicit categorization of inundation dynamics (2016-2023) corresponding to the inundation categories included in the analysis. This will allow for the replication of the surface water data. The rest of the data used in the analysis is publicly available. We have updated the data availability section.
  
  Vanderhoof, M.K., Nieuwlandt, P., Golden, H., Lane, C., Christensen, J.R. (2024) Data release for surface water storage influences streamflow signatures across the United States (2017-2021). U.S. Geological Survey data release, Sciencebase, https://doi.org/10.5066/P9RLFMEQ.
  
  Comment: What were the precise selection criteria behind the watersheds chosen? Why aren’t more watersheds analyzed, given that there are hundreds (or thousands) of gauges with data available?
  Response: We recognize that relying on the CAMELS dataset (n=671, watershed size ranges from 1 to 25,800 km²) could in theory greatly increase our sample size. However, we are also well aware that watershed size has a strong influence on runoff responses; therefore, one of our primary objectives was to control for watershed size (avoiding very small and very large watersheds) and avoid nested watersheds. The watersheds included in our analysis were predominantly between 1500 and 5000 km². We note that only 74 of the CAMELS watersheds fall within this size range (348 within the 300 km² to 10,000 km² size range).
  
  Additionally, the primary objective of the analysis was to incorporate novel, remotely sensed surface water variables reflecting both location (floodplain, non-floodplain) and hydroperiod (temporary, seasonal, semi-permanent to permanent). The advantage of utilizing the Sentinel-1 and Sentinel-2 water algorithms is that they map vegetated water, which is where most of the temporal dynamics or variability occurs, but which is excluded from most available surface water products, as these typically map only open water. Further, the use of Sentinel-1 allows us to bypass cloud cover and provide more accurate estimates of hydroperiod that do not reflect data gaps due to clouds. However, deriving these variables was extremely time-intensive, which limited the number of basins that we could look at.
  
  We revised section 2.1 to clarify that selecting non-nested watersheds within a bounded size class was the primary selection criterion, followed by other criteria including the accuracy of the surface water algorithms, the location of major dams, and the exclusion of tidal wetlands.
  
  Comment: L176-177. How precisely is the temperature CV calculated? Is it calculated from monthly temperature values considering the entire time series? The same applies to precipitation CV.
  Response: The CV variables were originally calculated from daily data, but upon further reflection, we felt that monthly data would be more appropriate for precipitation; therefore, we re-calculated both the temperature and precipitation CV from monthly data. We have also added this detail to section 2.3.1.
  
  Comment: Section 2.3.2 and Table 2. Are the subsurface and topography variables computed using the mean (or median) value of all raster cells within the catchment?
  Response: The mean value was reported; this detail has been added or clarified in section 2.3.2.
  
  Comment: What is the spatial configuration of the inundation variables? It would be great to include maps showing their spatial distribution and a brief discussion. What are the relationships between the inundation variables? E.g. how are they correlated?
  Response: Although it isn’t feasible to show the spatial distribution for all of the basins, we added a new figure (now Figure 3) showing examples of variability in inundation between watersheds. We also modified what is now Figure 5 to show spatial inundation examples of watersheds representing the low and high ends of each scatter plot. Correlations between inundation variables ranged from 0 to 0.98 with a median correlation value of 0.35. Since we have quite a few Appendix tables already, we decided not to include a new table showing the correlations between the inundation variables, but correlation limits in the variable selection process restrict the inclusion of highly correlated variables.
  
  Results - Section 3.2 Flashiness signatures. The authors show how the flashiness indices are only weakly correlated with climate, have the models with the lowest R² compared to other flow signatures (Table 4), and have the highest increases in R² when all independent variables are added. I’m guessing – Is one contributor to the weak climate signal the fact that the flashiness index focuses on the daily variability while the climate indices focus on the monthly variability (or the annual mean)? Could this contribution to the lowest M_climate R² indirectly explain the highest potential for R² increases when including all variables?
  Response: Rainfall intensity was selected in 3 of the 4 flashiness models. After including the additional climate variables for consideration, the R² for the Flashiness index M_Climate increased from 0.50 to 0.54 but the M_All increased from 0.55 to 0.61 so the addition of improved climate conditions does not seem to have impacted the contribution of the inundation variables.
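
To illustrate why a flashiness index responds to daily variability, the sketch below computes a flashiness index from daily discharge. It assumes the Richards-Baker formulation (sum of absolute day-to-day changes divided by total discharge); this thread does not confirm which formulation the manuscript uses, and the function name is hypothetical.

```python
def flashiness_index(daily_discharge):
    """Richards-Baker style flashiness: sum |q_i - q_(i-1)| / sum q_i."""
    if len(daily_discharge) < 2:
        raise ValueError("need at least two daily values")
    total = sum(daily_discharge)
    if total == 0:
        raise ValueError("flashiness is undefined for zero total discharge")
    # Absolute change between each pair of consecutive days.
    changes = sum(
        abs(today - yesterday)
        for yesterday, today in zip(daily_discharge, daily_discharge[1:])
    )
    return changes / total


# Hypothetical series: a steady stream versus one that oscillates daily.
steady = flashiness_index([5, 5, 5, 5])
flashy = flashiness_index([1, 3, 1, 3])
```

Both hypothetical series could share the same monthly mean, yet only the oscillating one yields a non-zero index, which is consistent with the referee's point that monthly or annual climate summaries may miss this signal.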
  
  Table 2; Table 4; Table 6. Is it actual or potential evapotranspiration? Does the coefficient of variation of climate variables refer to the monthly change or annual change? Also, it looks like the authors might have computed the standard deviation instead of the coefficient of variation because of the units of mm, ºC, and the high values. The coefficient of variation is unitless as it is relative to the sample mean.
  Response: Evapotranspiration uses “eto”, which is actual evapotranspiration using grass as a reference vegetation. This is now clarified in the Methods. Aridity has now been updated to be calculated from PET and P. The units on the coefficients of variation have been corrected to be unitless, and the values were re-calculated using monthly data instead of daily data. The values have been updated in Table 2.
  
  Comment: I believe that the Results section would read better if the sensitivity analysis of the data (Section 3.1) was not the first result presented and discussed, given that this is not the main topic of research.
  Response: We moved section 3.1 to the Methods section, since the sensitivity analysis is helping to justify decision points related to the Methods. This allows the Results to focus on the main topic of research.
  
  L301-303. To me, the DryMonth/area bias of -2.2% seems very low for interpreting it as a drier condition than for the longer period, given that the between-year coefficient of variation of flow indices is usually much higher than 2%. The interpretation is up to the authors, but it looks like a very small bias and perhaps not statistically significant. The bias in the baseflow index (-11.8%) might come from a larger volume of high flows (as shown with MAX30/area) rather than lower flows or a drier period. These interpretations would agree with those using the PDSI.
  Response: We agree, and have revised this sentence accordingly: “Additionally, while the DryMonth/area bias was minimal, the baseflow index showed a relative bias of -11.8%, potentially reflecting a higher volume of water coming from high flows within the 8-year period, relative to the longer period (Table 2).”
  
  Comment: Parts of the results show the Pearson correlation (R) and its associated p-values without stating if the residuals follow the normal distribution or if the authors checked for outliers. It would be good to either state that the authors checked for the correlation assumptions or use Spearman correlation throughout the manuscript.
  AND
  L340-341. Here the authors refer to R as the Spearman correlation, but in other parts of the manuscript as the Pearson correlation.
  Response: Pearson was used only to correlate the hydrologic indices over the two different time periods, since these correlations were linear and the hydrologic indices had already been tested for normality. In this case, all assumptions were met for using a Pearson correlation. When correlating the independent and dependent variables with each other, we used Spearman correlation with a Bonferroni correction to account for non-normal distributions in the independent variables. To reduce confusion, we retained the identification of the specific correlation tests used in the Methods but eliminated mention of the statistical test in the Results and Appendix sections.
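
For readers who want to see the mechanics, the following is a minimal sketch of a Spearman rank correlation together with a Bonferroni-adjusted significance threshold. It is illustrative only and is not the code used in the analysis; all function names are hypothetical, and the p-value calculation is omitted for brevity.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson_r(average_ranks(x), average_ranks(y))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests
```

A monotonic but non-linear pair such as `[1, 2, 3, 4]` and `[1, 4, 9, 16]` gives a Spearman value of 1, which is why the rank-based test is less sensitive to non-normal predictors than Pearson.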
  
  Comment: How correlated are the indices Precipitation (P), Aridity index (ET/P) and Water demand (P-ET)? They look very highly correlated given the similar results.
  Response: Annual precipitation has a Spearman rank correlation value of R = 0.82 with water demand, and a correlation of R = -0.79 with the updated aridity index (PET/P). The updated aridity shows a correlation of R=-0.95 with water demand. Correlation between variables is considered and limited within the model variable selection process.
  
  Comment: Table 6 and associated discussion. Is it possible to know whether these variables are positively or negatively associated with the flow signatures?
  Response: As random forest models do not produce regression coefficients that can inform the directionality of the relationship, partial dependence plots are the primary mechanism through which directionality (i.e., positive, negative) can be displayed. However, as the relationship is not assumed to be linear, the directionality is not necessarily consistent over the full range of the variable. Therefore discerning directionality from Table 4, where each signature is correlated with each variable, is the most straightforward mechanism to interpret patterns of directionality.
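
The idea behind a partial dependence curve can be sketched as follows. This is a generic illustration, not the authors' workflow: `toy_model` is a hypothetical stand-in for a fitted random forest, and the feature values are invented.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction when one feature is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen feature in every row, keep the others as observed.
        modified = [
            row[:feature_index] + [value] + row[feature_index + 1:]
            for row in rows
        ]
        predictions = [predict(row) for row in modified]
        curve.append(sum(predictions) / len(predictions))
    return curve


def toy_model(row):
    """Hypothetical model: the response falls with inundation, then levels off."""
    precip, inundation = row
    return 0.01 * precip - 2.0 * min(inundation, 0.1)


# Hypothetical watersheds: [annual precipitation (mm), inundated fraction].
rows = [[800.0, 0.02], [1200.0, 0.05], [500.0, 0.15]]
curve = partial_dependence(toy_model, rows, feature_index=1,
                           grid=[0.0, 0.05, 0.1, 0.2])
```

In this toy case the curve decreases over the first three grid values and is flat afterwards, which illustrates how directionality need not be consistent over the full range of a variable.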
  
  Fig. 4 and Fig. 5. The numbers are hard to read because they are in the middle of the plot, mixed with the observations. It would read better if the numbers were displayed on the border of the plots. Fig. 4 shows R as the Spearman correlation but the Results section uses R as the Pearson correlation.
  Response: We moved the x-axis and y-axis labels for both Fig. 4 and Fig. 5 to improve readability. Spearman correlations were used for everything, with the exception of correlating the longer-term hydrologic signatures with the shorter-term hydrologic signatures, which is the only place where Pearson correlation was used.
  
  L434-436, Discussion. Here I believe it should be clarified how much the models improved with the inundation variables, providing some numbers.
  Response: We have revised the Discussion section to include more numerical values.
  
  L439-441, Discussion. It will be easier to interpret if the authors clarify whether these inundation variables are significantly correlated with other independent variables, particularly climate.
  Response: We now clarify in the Discussion about the correlations between the inundation and climate variables and how this influenced our interpretation of the analysis.
  
  L452-454, Discussion. How do geographically isolated wetlands contribute to baseflow? Given that they are isolated, what physical phenomena could explain this contribution? It is surprising to me that geographically isolated wetlands contribute to increasing baseflow while floodplain wetlands don’t. The reasons for this difference could also be discussed.
  Response: Geographically isolated wetlands are still often very much connected via wetland-groundwater interactions. The term only refers to a lack of surface water connections. A strong description of the physical processes is provided in McLaughlin et al. (2014), but essentially differences in specific yield between uplands and GIWs lead to frequent reversals in hydraulic gradients that result in GIWs acting as both groundwater sinks and sources. This process has also been demonstrated in multiple independent modeling efforts (Evenson et al., 2015; Ameli and Creed, 2017; Blanchette et al., 2019; Yeo et al., 2019). We expanded our explanation of the physical process in the Discussion section.
  
  L524-528, Conclusion. I believe that it should also be clarified how much the inundation variables increased the model’s explanatory power. It would be interesting if the Conclusion section also presented results for the different inundation variables (e.g. floodplain and non-floodplain).
  Response: We revised the Conclusion section to include some numerical values, as well as re-frame the results to focus on the floodplain inundation.
  
  Technical comments
  
  L49. “endanger poverty”. Do the authors mean increase (exacerbate) poverty?
  Response: The phrase currently used is “endanger property”.
  
  Table 5. There is a parenthesis missing in the column “RMSE LOOCV)”.
  Response: We have revised this column title and the remaining parenthesis has been removed.
  
  Table 6. Are the values shown unitless?
  Response: Yes, the values shown are unitless.
  
  Table 3, caption. I believe that R and p should be italics.
  Response: R and p have been italicized.
  
  L419. “difference between baseflow and high flow conditions (i.e., (Q10-Q95)/area).”. It looks like it should be the difference between low and high flow conditions.
  Response: Changed as recommended.
  
  L473. There’s a mistake in “including like precipitation”.
  Response: Sentence has been corrected.
  
  L479. “seasonal flooding” refers to the seasonal flooding extent? “reduce or otherwise impact” is confusing to me. Do the authors mean that it could also be the other way around? That is, that larger peak discharges cause larger flooding extents?
  Response: Sentence has been revised to clarify “seasonal flooding extent” and revised to improve clarity.
  
  References
  
  Aboelnour, M. A., Engel, B. A., Frisbee, M. D., Gitau, M. W., and Flanagan, D. C.: Impacts of Watershed Physical Properties and Land Use on Baseflow at Regional Scales, Journal of Hydrology: Regional Studies, 35, 100810, https://doi.org/10.1016/j.ejrh.2021.100810, 2021.
  Berghuijs, W. R., Woods, R. A., Hutton, C. J., and Sivapalan, M.: Dominant flood generating mechanisms across the United States, Geophys. Res. Lett., 43, 4382–4390, https://doi.org/10.1002/2016GL068070, 2016.
  Berghuijs, W. R., Larsen, J. R., van Emmerik, T. H. M., and Woods, R. A.: A Global Assessment of Runoff Sensitivity to Changes in Precipitation, Potential Evaporation, and Other Factors, Water Resour. Res., 53, 8475–8486, https://doi.org/10.1002/2017WR021593, 2017.
  Bloomfield, J. P., Gong, M., Marchant, B. P., Coxon, G., and Addor, N.: How is Baseflow Index (BFI) impacted by water resource management practices?, Hydrology and Earth System Sciences, 25, 5355–5379, https://doi.org/10.5194/hess-25-5355-2021, 2021.
  Briggs, M. A., Goodling, P., Johnson, Z. C., Rogers, K. M., Hitt, N. P., Fair, J. B., and Snyder, C. D.: Bedrock depth influences spatial patterns of summer baseflow, temperature and flow disconnection for mountainous headwater streams, Hydrology and Earth System Sciences, 26, 3989–4011, https://doi.org/10.5194/hess-26-3989-2022, 2022.
  Carlier, C., Wirth, S. B., Cochand, F., Hunkeler, D., and Brunner, P.: Geology controls streamflow dynamics, Journal of Hydrology, 566, 756–769, https://doi.org/10.1016/j.jhydrol.2018.08.069, 2018.
  Gudmundsson, L., Greve, P., and Seneviratne, S. I.: The sensitivity of water availability to changes in the aridity index and other factors-A probabilistic analysis in the Budyko space, Geophys. Res. Lett., 43, 6985–6994, https://doi.org/10.1002/2016GL069763, 2016.
  Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, https://doi.org/10.5194/hess-15-2895-2011, 2011.
  
  Citation: https://doi.org/10.5194/hess-2024-119-AC1
RC2:
'Comment on hess-2024-119', Anonymous Referee #2, 12 Jul 2024
This is a review of the manuscript “Surface water storage influences streamflow signatures”, authored by Vanderhoof et al. and submitted to Hydrology and Earth System Sciences (HESS). The paper explores the relationship between streamflow signatures and climate, land cover, geology, topography, and surface water storage variables using random forest (RF) models for 72 catchments in the contiguous United States (CONUS). The study addresses an important subject in an important study area and should interest HESS readers. The study has promising results, however, there are methodological issues that should be carefully addressed before publication. Please, find below some suggestions that I hope will improve the overall reliability of the manuscript.

GENERAL COMMENTS
The manuscript is well-written and sound. The introduction is informative of the study’s scope and context; the RF approach is very interesting.

The general quality of the figures is good, but some improvements can help enhance the interpretability of the results. Please, refer to specific point-by-point comments.

The title and abstract are not representative of the study. The title expresses well-known information about the relationship between streamflow signatures and surface water storage which is mentioned many times in the introduction. It states a strong conclusion that, in my understanding, is not supported by the results. The abstract could be improved if some numerical information about the findings were provided, such as differences in the explanatory power of the model after the inclusion of the wetlands and inundation variables.

If I understand correctly, all of the 72 catchments are pooled together. For instance, the pooling will necessarily pool together information on small and large catchments, and also on catchments with distinct runoff generation mechanisms, and hence, with distinct responses to the explanatory variables. This pooling would result in a large variability in the hydrological signatures, which combined with a quite limited sample size (e.g., 72 samples), could result in biased estimates of the model parameters and large uncertainties. How is this issue addressed in the analysis? Please: (i) provide additional analysis; or (ii) clearly mention this as a potential limitation of the study, which currently is little discussed in the manuscript.

The use of the random forest (RF) model is very interesting to address the research questions. However, the output from the RF model provided in the manuscript is simply the R-squared and the relative importance of each explanatory variable. This is a very limited result which is not a meaningful interpretation of the hydrological process – especially taking into account all the effort employed in the data curation. I think that the RF is a very useful tool for a first-step analysis, but complementary analysis must be done for a more meaningful interpretation of the results.

The overall improvement in the model's performance due to the inclusion of wetlands and inundation variables is very limited since improvements are up to 10% and in most cases are less than 6%. The manuscript’s results do not corroborate the conclusions, and it is unclear if the results are indeed meaningful from a hydrological process point of view. I think that would be very useful if p-values were provided to understand if these results are indeed significant.

There are a lot of sources of uncertainties in the methodology related to both the streamflow signatures (see Westerberg and McMillan, 2015) and also due to the modeling approach which are not accounted for in the manuscript. Those uncertainties may be large and hence the results can be misleading. This issue makes the results even less reliable and is timidly discussed in the manuscript. Please, provide uncertainty estimation or mention this as a critical limitation of the study.

I was wondering about the choice of some explanatory variables. For instance, the 30-day annual maximum streamflow signatures are related to the extreme streamflow regime. In this case, it would be interesting to include explanatory variables related to extreme streamflow dynamics. Snow melting is indirectly accounted for by temperature, however, relevant variables such as extreme precipitation and antecedent wetness – based on soil moisture or antecedent precipitation (see Lun et al., 2021; Merz and Blöschl, 2009) are omitted in the model. Also, are the results sensitive to the choice of the 30-day window?

The discussion provided in the manuscript is very limited and mostly based on a literature review rather than the paper results, which makes the discussion most speculative.

SPECIFIC COMMENTS
Lines 156-158. It is not clear whether the baseflow index is derived from the UFSF, which is based on model simulations, or whether it was just calculated in a similar fashion. Please clarify this information in the revised manuscript.

Lines 377-381. Figure 3. It is difficult to follow the results shown in Fig. 3, especially for (c), (d), and (e) frames due to the units used.

Lines 399-408. Figures 4 and 5. The overall quality of Fig. 4 and 5 should be improved, the overlapping of axis labels with other graphical elements makes the interpretation difficult.

Lines 163-165. It is not clear which correlation method was used. For instance, in the main text, the Pearson correlation is mentioned, however, in tables and figure captions the Spearman is mentioned. Please clarify this information in the revised manuscript. If the Pearson correlation was indeed used, please check the assumptions for its use. Also, which test is used for the correlation significance? The Pearson/Spearman correlation itself is not a statistical test.

Lines 308-310. Indeed, the relative variations in hydrologic signature values between the long-term flow records (24 years) compared to the study period (8 years) are quite similar. This would be a pragmatic decision based on Sentinel data availability rather than a “solid justification”.

TECHNICAL COMMENTS

Lines 387-390. Table 5. There is a missing parenthesis in the RMSE of leave-one-out cross-validation.

REFERENCES

Lun, D., Viglione, A., Bertola, M., Komma, J., Parajka, J., Valent, P., and Blöschl, G.: Characteristics and process controls of statistical flood moments in Europe – a data-based analysis, Hydrology and Earth System Sciences, 25, 5535–5560, https://doi.org/10.5194/hess-25-5535-2021, 2021.

Merz, R. and Blöschl, G.: Process controls on the statistical flood moments - a data based analysis, Hydrological Processes, 23, 675–696, https://doi.org/10.1002/hyp.7168, 2009.

Westerberg, I. K. and McMillan, H. K.: Uncertainty in hydrological signatures, Hydrology and Earth System Sciences, 19, 3951–3968, https://doi.org/10.5194/hess-19-3951-2015, 2015.
Citation: https://doi.org/10.5194/hess-2024-119-RC2
- AC2: 'Reply on RC2', Melanie Vanderhoof, 22 Aug 2024
  
  Manuscript: HESS-2024-119
  
  Summary of Revisions:
  We very much appreciate the thoughtful comments made by both reviewers. In response to their comments we (1) added two new climate variables to better represent high flow conditions, (2) added a new geologic variable, (3) updated aridity to now use PET instead of ET, and (4) recalculated precipitation and maximum temperature coefficient of variation to use monthly inputs instead of daily. Because of these changes in potential variables and variable values, we re-ran the hyper-parameterization and variable selection processes, and re-calculated all original and additional statistical tests. Our results are very similar to the original models, but we wanted to make sure both reviewers were aware that the results had been re-run and all tables and figures were updated, correspondingly.
  
  To better evaluate model performance, we added calculations of MSE and AIC, in addition to R², adjusted R² and RMSE. We also added a significance test, using rfPermute, for the inundation variables selected for inclusion in models. Further, we added a new Table to the manuscript consolidating all of the inundation-related data in a single place, to improve the ease with which reviewers can interpret what role the inundation variables played between signatures, models and the correlation analysis.
  
  We have also modified the title, Abstract, Results and Discussion sections to include more numerical values, and more accurately represent the small to moderate improvements in model performance. Lastly, we added a new paragraph to the Discussion section to more thoroughly discuss sources of uncertainty.
  
  We believe that the manuscript is now much improved and we hope the reviewers find the revised manuscript suitable for publication.
  
  Reviewer #2
  Summary Comment: This is a review of the manuscript “Surface water storage influences streamflow signatures”, authored by Vanderhoof et al. and submitted to Hydrology and Earth System Sciences (HESS). The paper explores the relationship between streamflow signatures and climate, land cover, geology, topography, and surface water storage variables using random forest (RF) models for 72 catchments in the contiguous United States (CONUS). The study addresses an important subject in an important study area and should interest HESS readers. The study has promising results, however, there are methodological issues that should be carefully addressed before publication. Please, find below some suggestions that I hope will improve the overall reliability of the manuscript.
  Response: We really appreciate your thoughtful and thorough review and comments. Addressing them has improved the quality of the manuscript. Please see detailed responses to all specific comments below.
  
  General Comments
  
  Comment: The manuscript is well-written and sound. The introduction is informative of the study’s scope and context; the RF approach is very interesting.
  Response: Thank you!
  
  Comment: The general quality of the figures is good, but some improvements can help enhance the interpretability of the results. Please, refer to specific point-by-point comments.
  Response: We have addressed all specific comments below.
  
  Comment: The title and abstract are not representative of the study. The title expresses well-known information about the relationship between streamflow signatures and surface water storage which is mentioned many times in the introduction. It states a strong conclusion that, in my understanding, is not supported by the results.
  Response: We have changed the title to, “Integrating remotely sensed surface water dynamics in hydrologic signature modelling”
  
  Comment: The abstract could be improved if some numerical information about the findings were provided, such as differences in the explanatory power of the model after the inclusion of the wetlands and inundation variables.
  Response: We have edited the abstract to include more numerical information on the findings.
  
  Comment: If I understand correctly, all of the 72 catchments are pooled together. For instance, the pooling will necessarily pool together information on small and large catchments, and also on catchments with distinct runoff generation mechanisms, and hence, with distinct responses to the explanatory variables. This pooling would result in a large variability in the hydrological signatures, which combined with a quite limited sample size (e.g., 72 samples), could result in biased estimates of the model parameters and large uncertainties. How is this issue addressed in the analysis? Please: (i) provide additional analysis; or (ii) clearly mention this as a potential limitation of the study, which currently is little discussed in the manuscript.
  Response: Yes, we are modeling spatial variability in the hydrologic signatures and therefore the 72 catchments are pooled together. We are aware that watershed size influences runoff responses, therefore we (1) divided signatures by area where necessary to help control for variability in watershed size, and (2) limited the range of watershed sizes included (80% were between 1500 and 5000 km², and nested watersheds were avoided).
  
  Our sample size was primarily limited by the time intensiveness of generating the remotely sensed water variables. The primary objective of the analysis was to incorporate novel, remotely sensed surface water variables reflecting both location (floodplain, non-floodplain) and hydroperiod (temporary, seasonal, semi-permanent to permanent). The advantage of utilizing the Sentinel-1 and Sentinel-2 water algorithms is that they map vegetated water, which is where most of the temporal dynamics or variability occurs but which is excluded from most available surface water products, which typically map only open water. Further, the use of Sentinel-1 allows us to bypass cloud cover and provide more accurate estimates of hydroperiod that do not reflect data gaps due to clouds. However, deriving these variables was extremely time intensive, which limited the number of basins that we could look at. We have added text to the Discussion section to more clearly articulate uncertainty that could be attributable to sample size.
  
  Comment: The use of the random forest (RF) model is very interesting to address the research questions. However, the output from the RF model provided in the manuscript is simply the R-squared and the relative importance of each explanatory variable. This is a very limited result which is not a meaningful interpretation of the hydrological process – especially taking into account all the effort employed in the data curation. I think that the RF is a very useful tool for a first-step analysis, but complementary analysis must be done for a more meaningful interpretation of the results.
  Response: To better compare model performance, in addition to the adjusted R² values and RMSE, we added mean square error (MSE) and Akaike information criterion (AIC). We also now use rfPermute to calculate the significance of the inundation variables, as well as the estimated increase in MSE with their exclusion. We also added a new table (Table 6) to consolidate the statistics related to the inundation variables. This new table includes (1) a summary of the percent change of all 5 model performance comparisons, (2) which inundation variables were selected in the models after using the forward step-wise variable selection process, (3) the statistical significance and MSE for the inundation variables, and (4) which inundation variables were significantly correlated with each of the signatures.
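
The evaluation metrics named above can be sketched as follows. This is an illustrative calculation under stated assumptions, not the authors' code: the AIC uses one common least-squares form, n·ln(MSE) + 2k, and treating the number of selected predictors as k is an assumption, since the response does not state how AIC was computed for the random forest models.

```python
import math


def regression_metrics(observed, predicted, n_predictors):
    """Return MSE, RMSE, R², adjusted R² and a least-squares AIC."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    if mse == 0:
        raise ValueError("AIC is undefined for a perfect fit (MSE of zero)")
    r2 = 1 - ss_res / ss_tot
    # Adjusted R² penalizes each additional predictor.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    # Assumed AIC form; additive constants are dropped.
    aic = n * math.log(mse) + 2 * n_predictors
    return {"mse": mse, "rmse": math.sqrt(mse), "r2": r2,
            "adj_r2": adj_r2, "aic": aic}


# Hypothetical observed and predicted signature values for five watersheds.
metrics = regression_metrics([1, 2, 3, 4, 5], [1.1, 1.9, 3.2, 3.8, 5.0],
                             n_predictors=1)
```

Lower MSE, RMSE and AIC indicate a better fit, while higher R² and adjusted R² do; reporting both families makes it easier to judge whether an added variable is worth its extra complexity.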
  
  The impact of surface water storage on discharge is known to be subtle and hard to isolate. For example, annual precipitation showed a Spearman correlation value of R=0.75 with fraction of watershed showing seasonal floodplain inundation. This is because precipitation impacts not just discharge, but also surface water dynamics. This means that the climate only model is not an independent model that is able to exclude the influence of surface water. We have added additional text to the Discussion section to further discuss this important challenge. Specifically,
  
  “In cases where variables can be isolated (e.g., basins with tile drainage, compared to basins without tile drainage), significant differences between models can be an appropriate mechanism to help quantify the impact of a variable (Rainio et al. 2024). But both discharge and surface water extent tend to be a function of climate inputs and catchment characteristics (Heimhuber et al., 2016; Vanderhoof et al., 2018). Consequently, our inundation variables were significantly correlated with not only catchment characteristics such as depth to bedrock, slope, and topographic diversity, but also climate variables, including annual precipitation, aridity, and the rainfall and runoff factor (Table A4), with the highest correlation occurring between the amount of seasonal inundation on the floodplain and the watershed rainfall and runoff factor (R=0.80, p<0.01) (Table A4). Because our inundation variables were significantly correlated with select climate variables, the M_Climate could not be considered a null model, relative to M_All, and therefore comparing variables selected, variable significance and importance as well as model improvement using evaluation metrics was seen as most appropriate.”
  
  Comment: The overall improvement in the model's performance due to the inclusion of wetlands and inundation variables is very limited since improvements are up to 10% and in most cases are less than 6%. The manuscript’s results do not corroborate the conclusions, and it is unclear if the results are indeed meaningful from a hydrological process point of view. I think that would be very useful if p-values were provided to understand if these results are indeed significant.
  Response: We agree that improvement in the models was moderate at best. Recognizing the challenge of interpreting whether results are meaningful from a hydrological processes perspective, we have revised the title of the paper to, “Integrating remotely sensed surface water dynamics into hydrologic signature models”. We have also revised the language in the Abstract, Results and Discussion to better represent the findings. We now report multiple model evaluation metrics, and better articulate the weight (or lack) of evidence (e.g., new Table 6).
  
  In the Discussion, we have re-focused part of the text from contextualizing with the literature in general to instead examining the inundation variables selected in the models and exploring how hydrologically reasonable those specific variables are. We have also revised the Discussion section to interpret the results more conservatively, and we added a new paragraph to discuss sources of uncertainty.
  
  Regarding the request for p-values, we are assuming that the reviewers are looking for a statistical test comparing whether differences between models are significant. The impact of surface water storage on discharge is known to be subtle and hard to isolate. For example, annual precipitation showed a Spearman correlation value of R=0.75 with fraction of watershed showing seasonal floodplain inundation. This is because precipitation impacts not just discharge, but also surface water dynamics. This means that the climate only model is not an independent model that is able to exclude the influence of surface water and would complicate the interpretation of a significance test. However, we have added the p-values for the individual models, as well as p-values for the inundation variables selected for model inclusion. We have also added additional text to the Discussion section to further discuss this important challenge.
  
  Comment: There are a lot of sources of uncertainties in the methodology related to both the streamflow signatures (see Westerberg and McMillan, 2015) and also due to the modeling approach which are not accounted for in the manuscript. Those uncertainties may be large and hence the results can be misleading. This issue makes the results even less reliable and is timidly discussed in the manuscript. Please, provide uncertainty estimation or mention this as a critical limitation of the study.
  Response: We have added the paragraph below to the Discussion section to better discuss sources of uncertainty, and now reference Westerberg and McMillan (2015):
  
  “4.2 Sources of Uncertainty
  Modelling hydrologic signatures to evaluate the relative influence of drivers on hydrologic responses has many potential sources of uncertainty. Our results, for instance, could depend on the hydrologic signatures included in the analysis (McMillan et al., 2021). It is possible that inundation has a greater or lesser influence on different aspects of the flow regime than those explored here. For the hydrologic signatures considered, such signatures can show substantial uncertainty, attributable to error in precipitation and discharge datasets (Westerberg and McMillan, 2015). To account for uncertainty, the hydrologic signatures were calculated annually, and then averaged across multiple years, while independent variables were averaged over multiple years and across each watershed, both steps that have been shown to reduce uncertainty (Westerberg and McMillan, 2015). Our findings may also depend on the variables included in the analysis. While we included diverse climate and catchment characteristics, it is possible that additional catchment variables, such as data on aquifers (Bloomfield et al., 2021) or additional geologic characteristics, such as proportion sandstone (Carlier et al., 2018), could improve the explanatory power of certain hydrologic signatures, such as the baseflow index, and reduce our model uncertainty. Uncertainty can also be attributable to the watersheds selected (McMillan et al., 2021). While we limited the range of watershed sizes and sampled across diverse regions, we under-sampled certain regions including the northeastern U.S. and mountainous regions, where a high proportion of forest cover and steep slopes, respectively, tend to increase our uncertainty in mapping surface water. Generating the surface water variables was also computationally intensive and limited our feasible sample size, which likely contributed uncertainty to the modelling effort. Further, while surface water extent was used to represent surface water storage, the two are distinct measurements, and in the future, conversion of surface water (2D) to storage (3D) will facilitate improved modelling of total water distribution. Lastly, uncertainty can be introduced by the statistical modelling approach itself. To minimize modelling-related uncertainty we applied hyper-parameter optimization and variable selection procedures. Random forest models have previously been found to be an effective mechanism to model hydrologic signatures (Trancoso et al., 2016; Addor et al., 2018; Oppel and Schumann, 2020). Further exploration of how inundation impacts diverse components of flow regimes will be an important next step to reduce the uncertainty associated with this effort.”
  
  Comment: I was wondering about the choice of some explanatory variables. For instance, the 30-day annual maximum streamflow signatures are related to the extreme streamflow regime. In this case, it would be interesting to include explanatory variables related to extreme streamflow dynamics. Snow melting is indirectly accounted for by temperature, however, relevant variables such as extreme precipitation and antecedent wetness – based on soil moisture or antecedent precipitation (see Lun et al., 2021; Merz and Blöschl, 2009) are omitted in the model. Also, are the results sensitive to the choice of the 30-day window?
  Response: To better reflect high precipitation we added (1) a Rainfall and Runoff Factor, or rainfall intensity, which reflects the amount of rainfall and peak intensity of each storm, and (2) a maximum monthly precipitation variable. Six of the models, including both flashiness models, selected rainfall intensity for inclusion, and one of the models selected maximum monthly precipitation for inclusion. Since snowmelt only occurs in about half of the watersheds, we did not include a SWE-specific variable, but the influence of SWE is partially reflected in the DAYMET precipitation CV and seasonality variables. Variable selection and models were re-run with the additional potential variables. In earlier exploratory efforts we included both MAX30/area and MAX7/area, but our results were very similar between the two variables, so we only retained MAX30/area to improve the manuscript’s readability.
  
  Comment: The discussion provided in the manuscript is very limited and mostly based on a literature review rather than the paper results, which makes the discussion most speculative.
  Response: We have substantially revised the Discussion section to focus more on the paper results as well as sources of uncertainty, and now use literature references to inform/support the analysis, rather than discussing the topic more generally.
  
  Specific Comments
  
  Lines 156-158. It is not clear whether the baseflow index is derived from the UFSF, which is based on model simulations, or whether it was just calculated in a similar fashion. Please clarify this information in the revised manuscript.
  Response: We calculated baseflow index ourselves. We revised the sentence to move the reference to the end of the statement to reduce confusion.
  
  Lines 377-381. Figure 3. It is difficult to follow the results shown in Fig. 3, especially for (c), (d), and (e) frames due to the units used.
  Response: We modified the Figure caption to improve clarity, specifically: “Greater flashiness (a, b), higher peak flows (c, d), and greater flows during low flow periods (e, f) are shown in blue.”
  
  Lines 399-408. Figures 4 and 5. The overall quality of Fig. 4 and 5 should be improved, the overlapping of axis labels with other graphical elements makes the interpretation difficult.
  Response: We moved the axis labels on both Figures 4 and 5 to improve readability.
  
  Lines 163-165. It is not clear which correlation method was used. For instance, in the main text, the Pearson correlation is mentioned, however, in tables and figure captions the Spearman is mentioned. Please clarify this information in the revised manuscript. If the Pearson correlation was indeed used, please check the assumptions for its use. Also, which test is used for the correlation significance? The Pearson/Spearman correlation itself is not a statistical test.
  Response: Pearson was used only to correlate the hydrologic indices over the two different time periods, since these correlations were linear and the hydrologic indices had already been tested for normality. All assumptions were met for using a Pearson correlation. When correlating the independent and dependent variables with each other, we used Spearman correlation with a Bonferroni correction to account for non-normal distributions in the independent variables. Correlation matrices with significance values (p-values) for Pearson correlations and Spearman rank correlations were calculated using the Hmisc package in R. To reduce confusion, we retained the identification of the specific correlation tests used in the Methods but eliminated mention of the statistical test in the Results and Appendix sections. We also now note that the Hmisc R package was used to calculate significance.
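
As a rough illustration of the two steps named in this response (a rank correlation followed by a Bonferroni adjustment), a minimal Python sketch follows. It is not the authors' code, which used the Hmisc package in R; the function names are hypothetical, and the p-values passed to `bonferroni` are assumed to come from whatever significance test accompanies the correlation.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

def bonferroni(p_values):
    """Bonferroni-adjusted p-values: each p multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Because Spearman works on ranks, a monotonic but non-linear relationship still yields a coefficient of 1, which is why it suits non-normally distributed independent variables.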
  
  Lines 308-310. Indeed, the relative variations in hydrologic signature values between the long-term flow records (24 years) compared to the study period (8 years) are quite similar. This would be a pragmatic decision based on Sentinel data availability rather than a “solid justification”.
  Response: We removed this sentence from the text. This section has also been moved from the Results to the Methods, in response to a comment from Reviewer #1.
  
  Technical Comments
  
  Lines 387-390. Table 5. There is a missing parenthesis in the RMSE of leave-one-out cross-validation.
  Response: We have revised the column title to correct this error.
  
  References
  Lun, D., Viglione, A., Bertola, M., Komma, J., Parajka, J., Valent, P., and Blöschl, G.: Characteristics and process controls of statistical flood moments in Europe – a data-based analysis, Hydrology and Earth System Sciences, 25, 5535–5560, https://doi.org/10.5194/hess-25-5535-2021, 2021.
  
  Merz, R. and Blöschl, G.: Process controls on the statistical flood moments - a data based analysis, Hydrological Processes, 23, 675–696, https://doi.org/10.1002/hyp.7168, 2009.
  
  Westerberg, I. K. and McMillan, H. K.: Uncertainty in hydrological signatures, Hydrology and Earth System Sciences, 19, 3951–3968, https://doi.org/10.5194/hess-19-3951-2015, 2015.
  
  Citation: https://doi.org/10.5194/hess-2024-119-AC2

Status: closed

RC1:
'Comment on hess-2024-119', Anonymous Referee #1, 09 Jul 2024

Major comments

The main conclusions are based on increases in the explanatory power (R²) from the model including only climate as independent variables (M_climate) to the model including all variables as independent variables (M_all). These increases (Table 5) range from 0 to 10%, which results in adjusted R² increases up to 0.04. These are indeed very low model improvements. To better assess if these increases are not by chance, the authors should include another method for hypothesis testing, computing the p-value and the uncertainty behind these results.

The authors model the maximum annual flow, however, none of the climatic variables are related to high precipitation events (e.g. maximum annual precipitation). Given that peak flow is often linked to peak precipitation and snowmelt (Berghuijs et al., 2016), I suggest that such variables are added to the models. Additionally, why is the maximum annual flow computed for the time scale of 30 days? How do the results change if shorter time scales are also analyzed? (e.g. maximum annual daily flow; 7-day flow).

The baseflow index is usually highly connected to the geology of a catchment (Aboelnour et al., 2021; Bloomfield et al., 2021; Briggs et al., 2022; Carlier et al., 2018). None of the independent variables to model the baseflow index includes geological characteristics. This could potentially change the results and final conclusions obtained in the manuscript.

The title could be modified as it states knowledge that is already well established, with several references in the introduction showing that surface water storage impacts streamflow. It also could be misleading as it does not focus on the mechanisms behind these impacts.

Some of the flow signatures with the strongest influences of inundation variables are related to the temporal variability at the daily scale (flashiness index) rather than to a central measure of magnitude (e.g. MAX30/area, DryMonth/area). Would other flow variables related to the temporal variability be worth investigating? (e.g. CV of MAX30/area).

Minor comments

L26-29. What is precisely “amount of semi-permanent and permanent floodplain inundation” and “increases in seasonal floodplain inundation”? If it refers to the spatial extent, is it in terms of km² or relative to the catchment’s area?

From the Abstract alone, it is hard to get an idea of how significantly water storage influences streamflow. The Abstract would be clearer if it presented some numbers to show how much some key flow signatures are explained by the inundation variables (e.g. differences between M_climate and M_all R²).

Fig. 1 and flow signatures. “annual actual evapotranspiration divided by annual precipitation,” Shouldn’t it be potential evapotranspiration? The aridity index usually considers potential evapotranspiration rather than actual evapotranspiration (e.g., Berghuijs et al., 2017; Gudmundsson et al., 2016; Sawicz et al., 2011). It also looks like something is off because Fig. 1 shows that mean annual actual evaporation is up to 29.8 times higher than mean annual precipitation. Values in Fig. 1 do not match the values in Table 2 (Aridity index); are they the same variable?

Section 2.2. Is the calendar year used for the computation of the hydrologic signatures?

Are the gap-filled inundation time series publicly available? It would be necessary to replicate the study.

What were the precise selection criteria behind the watersheds chosen? Why aren’t more watersheds analyzed, given that there are hundreds (or thousands) of gauges with data available?

L176-177. How precisely is the temperature CV calculated? Is it calculated from monthly temperature values considering the entire time series? The same applies to precipitation CV.

Section 2.3.2 and Table 2. Are the subsurface and topography variables computed using the mean (or median) value of all raster cells within the catchment?

What is the spatial configuration of the inundation variables? It would be great to include maps showing their spatial distribution and a brief discussion. What are the relationships between the inundation variables? E.g. how are they correlated?

Results - Section 3.2 Flashiness signatures. The authors show how the flashiness indices are only weakly correlated with climate, have the models with the lowest R² compared to other flow signatures (Table 4), and have the highest increases in R² when all independent variables are added. I’m guessing – Is one contributor to the weak climate signal the fact that the flashiness index focuses on the daily variability while the climate indices focus on the monthly variability (or the annual mean)? Could this contribution to the lowest M_climate R² indirectly explain the highest potential for R² increases when including all variables?

Table 2; Table 4; Table 6. Is it actual or potential evapotranspiration? Does the coefficient of variation of climate variables refer to the monthly change or annual change? Also, it looks like the authors might have computed the standard deviation instead of the coefficient of variation because of the units of mm, ºC, and the high values. The coefficient of variation is unitless as it is relative to the sample mean.

I believe that the Results section would read better if the sensitivity analysis of the data (Section 3.1) was not the first result presented and discussed, given that this is not the main topic of research.

L301-303. To me, the DryMonth/area bias of -2.2% seems very low for interpreting it as a drier condition than for the longer period, given that the between-year coefficient of variation of flow indices is usually much higher than 2%. The interpretation is up to the authors, but it looks like a very small bias and perhaps not statistically significant. The bias in the baseflow index (-11.8%) might come from a larger volume of high flows (as shown with MAX30/area) rather than lower flows or a drier period. These interpretations would agree with those using the PDSI.

Parts of the results show the Pearson correlation (R) and its associated p-values without stating if the residuals follow the normal distribution or if the authors checked for outliers. It would be good to either state that the authors checked for the correlation assumptions or use Spearman correlation throughout the manuscript.

L340-341. Here the authors refer to R as the Spearman correlation, but in other parts of the manuscript as the Pearson correlation.

How correlated are the indices Precipitation (P), Aridity index (ET/P) and Water demand (P-ET)? They look very highly correlated given the similar results.

Table 6 and associated discussion. Is it possible to know whether these variables are positively or negatively associated with the flow signatures?

Fig. 4 and Fig. 5. The numbers are hard to read because they are in the middle of the plot, mixed with the observations. It would read better if the numbers were displayed on the border of the plots. Fig. 4 shows R as the Spearman correlation but the Results section uses R as the Pearson correlation.

L434-436. Here I believe it should be clarified how much the models improved with the inundation variables, providing some numbers.

L439-441. It will be easier to interpret if the authors clarify whether these inundation variables are significantly correlated with other independent variables, particularly climate.

L452-454. How do geographically isolated wetlands contribute to baseflow? Given that they are isolated, what physical phenomena could explain this contribution? It is surprising to me that geographically isolated wetlands contribute to increasing baseflow while floodplain wetlands don’t. The reasons for this difference could also be discussed.

L524-528. I believe that it should also be clarified how much the inundation variables increased the model’s explanatory power. It would be interesting if the Conclusion section also presented results for the different inundation variables (e.g. floodplain and non-floodplain).

Technical comments

L49. “endanger poverty”. Do the authors mean increase (exacerbate) poverty?

Table 5. There is a parenthesis missing in the column “RMSE LOOCV)”.

Table 6. Are the values shown unitless?

Table 3, caption. I believe that R and p should be italics.

L419. “difference between baseflow and high flow conditions (i.e., (Q10-Q95)/area).”. It looks like it should be the difference between low and high flow conditions.

L473. There’s a mistake in “including like precipitation”.

L479. “seasonal flooding” refers to the seasonal flooding extent? “reduce or otherwise impact” is confusing to me. Do the authors mean that it could also be the other way around? That is, that larger peak discharges cause larger flooding extents?

References

Aboelnour, M. A., Engel, B. A., Frisbee, M. D., Gitau, M. W., and Flanagan, D. C.: Impacts of Watershed Physical Properties and Land Use on Baseflow at Regional Scales, Journal of Hydrology: Regional Studies, 35, 100810, https://doi.org/10.1016/j.ejrh.2021.100810, 2021.
Berghuijs, W. R., Woods, R. A., Hutton, C. J., and Sivapalan, M.: Dominant flood generating mechanisms across the United States, Geophys. Res. Lett., 43, 4382–4390, https://doi.org/10.1002/2016GL068070, 2016.
Berghuijs, W. R., Larsen, J. R., van Emmerik, T. H. M., and Woods, R. A.: A Global Assessment of Runoff Sensitivity to Changes in Precipitation, Potential Evaporation, and Other Factors, Water Resour. Res., 53, 8475–8486, https://doi.org/10.1002/2017WR021593, 2017.
Bloomfield, J. P., Gong, M., Marchant, B. P., Coxon, G., and Addor, N.: How is Baseflow Index (BFI) impacted by water resource management practices?, Hydrology and Earth System Sciences, 25, 5355–5379, https://doi.org/10.5194/hess-25-5355-2021, 2021.
Briggs, M. A., Goodling, P., Johnson, Z. C., Rogers, K. M., Hitt, N. P., Fair, J. B., and Snyder, C. D.: Bedrock depth influences spatial patterns of summer baseflow, temperature and flow disconnection for mountainous headwater streams, Hydrology and Earth System Sciences, 26, 3989–4011, https://doi.org/10.5194/hess-26-3989-2022, 2022.
Carlier, C., Wirth, S. B., Cochand, F., Hunkeler, D., and Brunner, P.: Geology controls streamflow dynamics, Journal of Hydrology, 566, 756–769, https://doi.org/10.1016/j.jhydrol.2018.08.069, 2018.
Gudmundsson, L., Greve, P., and Seneviratne, S. I.: The sensitivity of water availability to changes in the aridity index and other factors-A probabilistic analysis in the Budyko space, Geophys. Res. Lett., 43, 6985–6994, https://doi.org/10.1002/2016GL069763, 2016.
Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, https://doi.org/10.5194/hess-15-2895-2011, 2011.

Citation: https://doi.org/10.5194/hess-2024-119-RC1
- AC1: 'Reply on RC1', Melanie Vanderhoof, 22 Aug 2024
  
  Manuscript: HESS-2024-119
  
  Summary of Revisions:
  We very much appreciate the thoughtful comments made by both reviewers. In response to their comments we (1) added two new climate variables to better represent high flow conditions, (2) added a new geologic variable, (3) updated aridity to now use PET instead of ET, and (4) recalculated precipitation and maximum temperature coefficient of variation to use monthly inputs instead of daily. Because of these changes in potential variables and variable values, we re-ran the hyper-parameterization and variable selection processes, and re-calculated all original and additional statistical tests. Our results are very similar to the original models, but we wanted to make sure both reviewers were aware that the results had been re-run and all tables and figures were updated, correspondingly.
  
  To better evaluate model performance, we added calculations of MSE and AIC, in addition to R², adjusted R² and RMSE. We also added a significance test, using rfPermute, for the inundation variables selected for inclusion in models. Further, we added a new Table to the manuscript consolidating all of the inundation-related data in a single place, to improve the ease with which reviewers can interpret what role the inundation variables played between signatures, models and the correlation analysis.
  
  We have also modified the title, Abstract, Results and Discussion sections to include more numerical values, and more accurately represent the small to moderate improvements in model performance. Lastly, we added a new paragraph to the Discussion section to more thoroughly discuss sources of uncertainty.
  
  We believe that the manuscript is now much improved and we hope the reviewers find the revised manuscript suitable for publication.
  
  Reviewer #1
  Major comments
  
  Comment: The main conclusions are based on increases in the explanatory power (R²) from the model including only climate as independent variables (M_climate) to the model including all variables as independent variables (M_all). These increases (Table 5) range from 0 to 10%, which results in adjusted R² increases up to 0.04. These are indeed very low model improvements. To better assess if these increases are not by chance, the authors should include another method for hypothesis testing, computing the p-value and the uncertainty behind these results.
  Response: To better compare model performance, in addition to the R², adjusted R² and RMSE values, we added mean square error (MSE) and Akaike information criterion (AIC). We also added a new table (Table 6) to consolidate our characterization of the potential influence of inundation dynamics. This includes testing if the indices are significantly correlated with an inundation variable, if inundation variables were selected for the model using the forward step-wise variable selection process, and if the variables were significant (p<0.01) in the models, as well as five statistical measures comparing M_Climate to M_All.
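
For readers less familiar with the five comparison metrics named above, the sketch below shows one common way to compute them in Python. It is an illustration only: the response does not state which AIC formulation was used, so the Gaussian least-squares form here is an assumption, and all function names are hypothetical.

```python
import math

def mse(observed, predicted):
    """Mean square error between observed and predicted values."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(mse(observed, predicted))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """R² penalized for the number of predictors in the model."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

def aic_gaussian(observed, predicted, n_params):
    """AIC under an assumed Gaussian error model: n * ln(MSE) + 2k (constants dropped)."""
    n = len(observed)
    return n * math.log(mse(observed, predicted)) + 2 * n_params
```

With the 72 watersheds reported in the abstract, a hypothetical rise in R² from 0.70 with 5 predictors to 0.74 with 9 predictors gives adjusted R² values of roughly 0.68 and 0.70, which illustrates how the adjustment shrinks an apparent gain when variables are added.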
  
  Regarding the request for hypothesis testing, the impact of surface water storage on discharge is known to be subtle and hard to isolate. For example, annual precipitation showed a Spearman correlation value of R=0.75 with fraction of watershed showing seasonal floodplain inundation. This is because precipitation impacts not just discharge, but also surface water dynamics. This means that the climate only model is not an independent model that is able to exclude the influence of surface water. We have added additional text to the Discussion section to further discuss this important challenge. Specifically,
  
  “In cases where variables can be isolated (e.g., basins with tile drainage, compared to basins without tile drainage), significant differences between models can be an appropriate mechanism to help quantify the impact of a variable (Rainio et al., 2024). But both discharge and surface water extent tend to be a function of climate inputs and catchment characteristics (Heimhuber et al., 2016; Vanderhoof et al., 2018). Consequently, our inundation variables were significantly correlated with not only catchment characteristics such as depth to bedrock, slope, and topographic diversity, but also climate variables, including annual precipitation, aridity, and the rainfall and runoff factor (Table A4), with the highest correlation occurring between the amount of seasonal inundation on the floodplain and the watershed rainfall and runoff factor (R=0.80, p<0.01) (Table A4). Because our inundation variables were significantly correlated with select climate variables, the M_Climate could not be considered a null model, relative to M_All, and therefore comparing variables selected, variable significance and importance as well as model improvement using evaluation metrics was seen as most appropriate.”
  
  Comment: The authors model the maximum annual flow, however, none of the climatic variables are related to high precipitation events (e.g. maximum annual precipitation). Given that peak flow is often linked to peak precipitation and snowmelt (Berghuijs et al., 2016), I suggest that such variables are added to the models. Additionally, why is the maximum annual flow computed for the time scale of 30 days? How do the results change if shorter time scales are also analyzed? (e.g. maximum annual daily flow; 7-day flow).
  Response: To better reflect high precipitation conditions we added (1) a Rainfall and Runoff Factor, or rainfall intensity, which reflects the amount of rainfall and peak intensity of each storm, and (2) a maximum monthly precipitation variable. Six of the models, including both flashiness models, selected rainfall intensity for inclusion. Since snowmelt only occurs in about half of the watersheds, we did not include a SWE-specific variable, but the influence of SWE is partially reflected in the DAYMET precipitation CV and seasonality variables. Variable selection and models were re-run with the additional potential variables. In earlier exploratory efforts we included both MAX30/area and MAX7/area, but our results were very similar between the two variables, so we only retained MAX30/area to improve the manuscript’s readability.
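
The sensitivity question about the 30-day window can be made concrete with a short sketch. The response does not define MAX30 or MAX7, so this example assumes they are the largest mean discharge over any 30 or 7 consecutive days, divided by watershed area; the function names and units are hypothetical.

```python
def max_n_day_mean(daily_flow, window):
    """Largest mean discharge over any `window` consecutive days."""
    if len(daily_flow) < window:
        raise ValueError("flow record is shorter than the averaging window")
    running = sum(daily_flow[:window])  # sum over the first window
    best = running
    for i in range(window, len(daily_flow)):
        # Slide the window forward one day: add the new day, drop the oldest.
        running += daily_flow[i] - daily_flow[i - window]
        best = max(best, running)
    return best / window

def area_normalized_max(daily_flow, window, area_km2):
    """n-day maximum mean flow divided by watershed area (assumed normalization)."""
    return max_n_day_mean(daily_flow, window) / area_km2
```

Calling `area_normalized_max(flow, 30, area)` and `area_normalized_max(flow, 7, area)` on the same record gives the two candidate signatures; a short, sharp peak raises the 7-day value more than the 30-day value, while sustained high flow raises both.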
  
  Comment: The baseflow index is usually highly connected to the geology of a catchment (Aboelnour et al., 2021; Bloomfield et al., 2021; Briggs et al., 2022; Carlier et al., 2018). None of the independent variables to model the baseflow index includes geological characteristics. This could potentially change the results and final conclusions obtained in the manuscript.
  Response: The soil and geologic variables included in the analysis were derived from the GAGES-II dataset (Geospatial Attributes of Gages for Evaluating Streamflow, available from the ScienceBase Catalog), which does not contain complete information on aquifers. However, we currently have percent clay, percent sand, average soil thickness, which reflects the depth to bedrock, and annual minimum depth to water table. We added silt fraction and geologic permeability as new variables and re-ran the variable selection process. Our variable list includes many of the variables that Bloomfield et al. (2021) reported helped explain the Baseflow Index, including clay fraction, crop cover, topography, and aridity. Briggs et al. (2022) cited bedrock depth as the primary explanatory variable, which we have included as average soil thickness. Further, Aboelnour et al. (2021) found soil groups and precipitation played key roles in explaining baseflow, and our inclusion of climate variables as well as soil variables is consistent with this paper. In addition to adding the two additional soil and geology variables, we have relabeled soil thickness to “depth to bedrock” and also added a comment to the Discussion section to further note the possible uncertainty. Specifically:
  
  “Our findings may also depend on the variables included in the analysis. While we included diverse climate and catchment characteristics, it is possible that additional catchment variables, such as data on aquifers (Bloomfield et al., 2021) or additional geologic characteristics, such as proportion sandstone (Carlier et al., 2018), could improve the explanatory power of certain hydrologic signatures, such as the baseflow index, and reduce our model uncertainty.”
  
  Comment: The title could be modified as it states knowledge that is already well established, with several references in the introduction showing that surface water storage impacts streamflow. It also could be misleading as it does not focus on the mechanisms behind these impacts.
  Response: We have changed the title to, “Integrating remotely sensed surface water dynamics in hydrologic signature modelling”
  
  Comment: Some of the flow signatures with the strongest influences of inundation variables are related to the temporal variability at the daily scale (flashiness index) rather than to a central measure of magnitude (e.g. MAX30/area, DryMonth/area). Would other flow variables related to the temporal variability be worth investigating? (e.g. CV of MAX30/area).
  Response: Yes, we agree that it would be worthwhile to explore how inundation variables might impact a wide number of hydrologic signatures. We were concerned that including too many hydrologic signatures in a single paper could make it more challenging for readers to interpret. We originally tested several CV signatures, including CV of dry season and annual CV. Annual CV showed results similar to the flashiness signatures. We ultimately excluded the CV signatures as we decided that they were more challenging to interpret hydrologically, since the variability could reflect episodic or seasonal variability. In the Discussion section we have added a comment in acknowledgement of this point of concern:
  
  “Our results, for instance, could depend on the hydrologic signatures included in the analysis (McMillan et al., 2021). It is possible that inundation has a greater or lesser influence on different aspects of the flow regime than those explored here. For the hydrologic signatures considered, such signatures can show substantial uncertainty, attributable to error in precipitation and discharge datasets (Westerberg and McMillan, 2015). To account for uncertainty, the hydrologic signatures were calculated annually, and then averaged across multiple years, while independent variables were averaged over multiple years and across each watershed, both steps that have been shown to reduce uncertainty (Westerberg and McMillan, 2015).”
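
The "calculate annually, then average across years" step described in the quoted passage can be sketched as follows. The grouping by calendar year follows the authors' reply on Section 2.2; the flashiness formula shown is the Richards-Baker index, which is an assumption here because this thread does not name the specific flashiness index used. All names and the example values are hypothetical.

```python
from collections import defaultdict
from datetime import date

def flashiness(daily_flow):
    """Richards-Baker flashiness: sum of day-to-day absolute changes over total flow."""
    path_length = sum(abs(b - a) for a, b in zip(daily_flow, daily_flow[1:]))
    return path_length / sum(daily_flow)

def mean_annual_signature(dates, daily_flow, signature):
    """Compute `signature` separately per calendar year, then average the annual values."""
    by_year = defaultdict(list)
    for day, q in zip(dates, daily_flow):
        by_year[day.year].append(q)
    annual = {year: signature(qs) for year, qs in by_year.items()}
    return sum(annual.values()) / len(annual), annual

# Tiny two-year example with made-up flows (four days per year for brevity).
example_dates = [date(2017, 1, d) for d in range(1, 5)] + [date(2018, 1, d) for d in range(1, 5)]
example_flow = [1.0, 3.0, 1.0, 3.0, 2.0, 2.0, 2.0, 2.0]
mean_value, per_year = mean_annual_signature(example_dates, example_flow, flashiness)
```

Passing the signature as a function keeps the averaging step independent of which signature is computed, so the same helper could be reused for a peak-flow or low-flow signature.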
  
  Minor comments
  
  L26-29. What is precisely “amount of semi-permanent and permanent floodplain inundation” and “increases in seasonal floodplain inundation”? If it refers to the spatial extent, is it in terms of km² or relative to the catchment’s area?
  Response: We revised the abstract to clarify that this refers to the proportion or fraction of a watershed with that inundation type extent.
  
  Comment: From the Abstract alone, it is hard to get an idea of how significantly water storage influences streamflow. The Abstract would be clearer if it presented some numbers to show how much some key flow signatures are explained by the inundation variables (e.g. differences between M_climate and M_all R²).
  Response: We have edited the abstract to include more numerical information on the findings.
  
  Comment: Fig. 1 and flow signatures. “annual actual evapotranspiration divided by annual precipitation,” Shouldn’t it be potential evapotranspiration? The aridity index usually considers potential evapotranspiration rather than actual evapotranspiration (e.g., Berghuijs et al., 2017; Gudmundsson et al., 2016; Sawicz et al., 2011). It also looks like something is off because Fig. 1 shows that mean annual actual evaporation is up to 29.8 times higher than mean annual precipitation. Values in Fig. 1 do not match the values in Table 2 (Aridity index); are they the same variable?
  Response: We updated aridity to integrate TerraClimate’s PET variable, so that aridity is now calculated using PET/PR. We updated the text in section 2.3.1, correlations and models, and Figure 1 correspondingly. Regarding Figure 1, the aridity values in Figure 1 are identical to the aridity values in Table 2. The maximum aridity values in CONUS occur in the Sonoran Desert, well outside of our study basins. Using PET/PR, the values are in the 60s in the Sonoran Desert. The challenge in providing a map of aridity is that the values are highly skewed, so that the minimum and maximum values are not very representative. Therefore, we changed the legend number labels in Figure 1 to reflect the median values of each color instead of the extreme minimum and maximum values and updated the caption to note this change in labeling.
  
  Section 2.2. Is the calendar year used for the computation of the hydrologic signatures?
  Response: Yes, we now clarify in the text that the calendar year, not the water year, was used to compute the hydrologic signatures.
  
  Comment: Are the gap-filled inundation time series publicly available? It would be necessary to replicate the study.
  Response: We have now released the associated inundation data. The data release includes a raster of each watershed with a spatially explicit categorization of inundation dynamics (2016-2023) corresponding to the inundation categories included in the analysis. This will allow for the replication of the surface water data. The rest of the data used in the analysis is publicly available. We have updated the data availability section.
  
  Vanderhoof, M.K., Nieuwlandt, P., Golden, H., Lane, C., Christensen, J.R. (2024) Data release for surface water storage influences streamflow signatures across the United States (2017-2021). U.S. Geological Survey data release, Sciencebase, https://doi.org/10.5066/P9RLFMEQ.
  
  Comment: What were the precise selection criteria behind the watersheds chosen? Why aren’t more watersheds analyzed, given that there are hundreds (or thousands) of gauges with data available?
  Response: We recognize that relying on the CAMELS dataset (n=671, watershed size ranges from 1 to 25,800 km²) could in theory greatly increase our sample size. However, we are also well aware that watershed size has a strong influence on runoff responses; therefore, one of our primary objectives was to control for watershed size (avoiding very small and very large watersheds) and avoid nested watersheds. The watersheds included in our analysis were predominantly between 1500 and 5000 km². We note that only 74 of the CAMELS watersheds fall within this size range (348 within the 300 km² to 10,000 km² size range).
  
  Additionally, the primary objective of the analysis was to incorporate novel, remotely sensed surface water variables reflecting both location (floodplain, non-floodplain) and hydroperiod (temporary, seasonal, semi-permanent to permanent). The advantage of the Sentinel-1 and Sentinel-2 water algorithms is that they map vegetated water, which is where most of the temporal variability occurs but which is excluded from most available surface water products, which typically map only open water. Further, the use of Sentinel-1 allows us to bypass cloud cover and provide more accurate estimates of hydroperiod that do not reflect data gaps due to clouds. However, deriving these variables was extremely time intensive, which limited the number of basins that we could analyze.
  
  We revised section 2.1 to clarify that selecting non-nested watersheds within a bounded size class was the primary selection criterion, followed by other criteria including the accuracy of the surface water algorithms, the location of major dams, and the exclusion of tidal wetlands.
  
  L176-177. How precisely is the temperature CV calculated? Is it calculated from monthly temperature values considering the entire time series? The same applies to precipitation CV.
  Response: The CV variables were originally calculated from daily data, but upon further reflection, we felt that monthly data would be more appropriate for precipitation, therefore we re-calculated both the temperature and precipitation CV from monthly data. We have also added this detail to section 2.3.1.
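
As a minimal sketch of the revised calculation (not the authors' code), the coefficient of variation can be computed as the standard deviation of the monthly series divided by its mean, which makes it unitless. The function name, the sample values, and the use of the population standard deviation are assumptions for illustration.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return the unitless CV (standard deviation / mean) of a series."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    # Population standard deviation is an assumption; a sample
    # standard deviation (statistics.stdev) is an equally valid choice.
    return pstdev(values) / m


# Hypothetical monthly precipitation totals (mm) for one watershed.
monthly_precip_mm = [80, 65, 90, 110, 120, 95, 70, 60, 75, 85, 100, 90]
cv_precip = coefficient_of_variation(monthly_precip_mm)

# Converting mm to inches rescales every value, but the ratio is unchanged.
monthly_precip_in = [v / 25.4 for v in monthly_precip_mm]
cv_precip_in = coefficient_of_variation(monthly_precip_in)
```

Because the ratio cancels the units, the two results agree to within floating-point error. The ratio is undefined when the series mean is zero, which can matter for temperature expressed in ºC.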
  
  Section 2.3.2 and Table 2. Are the subsurface and topography variables computed using the mean (or median) value of all raster cells within the catchment?
  Response: The mean value was reported; this detail has been added or clarified in section 2.3.2.
  
  Comment: What is the spatial configuration of the inundation variables? It would be great to include maps showing their spatial distribution and a brief discussion. What are the relationships between the inundation variables? E.g. how are they correlated?
  Response: Although it isn’t feasible to show the spatial distribution for all of the basins, we added a new figure (now Figure 3) showing examples of variability in inundation between watersheds. We also modified what is now Figure 5 to show spatial inundation examples of watersheds representing the low and high ends of each scatter plot. Correlations between inundation variables ranged from 0 to 0.98, with a median correlation value of 0.35. Since we have quite a few Appendix tables already, we decided not to include a new table showing the correlations between the inundation variables, but correlation limits in the variable selection process limit the inclusion of highly correlated variables.
  
  Results - Section 3.2 Flashiness signatures. The authors show how the flashiness indices are only weakly correlated with climate, have the models with the lowest R² compared to other flow signatures (Table 4), and have the highest increases in R² when all independent variables are added. I’m guessing – Is one contributor to the weak climate signal the fact that the flashiness index focuses on the daily variability while the climate indices focus on the monthly variability (or the annual mean)? Could this contribution to the lowest M_climate R² indirectly explain the highest potential for R² increases when including all variables?
  Response: Rainfall intensity was selected in 3 of the 4 flashiness models. After including the additional climate variables for consideration, the R² for the Flashiness index M_Climate increased from 0.50 to 0.54 but the M_All increased from 0.55 to 0.61 so the addition of improved climate conditions does not seem to have impacted the contribution of the inundation variables.
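
For context on why this signature responds to daily variability, a flashiness index is commonly computed with the Richards-Baker formulation: the sum of absolute day-to-day changes in discharge divided by total discharge. The sketch below assumes that formulation, which this response does not confirm; the function name and flow values are hypothetical.

```python
def richards_baker_flashiness(daily_discharge):
    """Sum of absolute day-to-day changes divided by total discharge."""
    if len(daily_discharge) < 2:
        raise ValueError("need at least two daily values")
    total = sum(daily_discharge)
    if total <= 0:
        raise ValueError("total discharge must be positive")
    path_length = sum(
        abs(today - yesterday)
        for yesterday, today in zip(daily_discharge, daily_discharge[1:])
    )
    return path_length / total


# Hypothetical daily flows with the same total volume (8 units).
steady_flow = [2.0, 2.0, 2.0, 2.0]
flashy_flow = [1.0, 3.0, 1.0, 3.0]

steady_index = richards_baker_flashiness(steady_flow)
flashy_index = richards_baker_flashiness(flashy_flow)
```

Both series have the same mean, so a climate variable summarized at monthly or annual resolution would not separate them, while the index does.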
  
  Table 2; Table 4; Table 6. Is it actual or potential evapotranspiration? Does the coefficient of variation of climate variables refer to the monthly change or annual change? Also, it looks like the authors might have computed the standard deviation instead of the coefficient of variation because of the units of mm, ºC, and the high values. The coefficient of variation is unitless as it is relative to the sample mean.
  Response: Evapotranspiration uses “eto”, which is actual evapotranspiration using grass as a reference vegetation. This is now clarified in the Methods. Aridity has now been updated to be calculated from PET and P. The units on the coefficients of variation have been corrected to be unitless, and the values were re-calculated using monthly data instead of daily data. The values have been updated in Table 2.
  
  Comment: I believe that the Results section would read better if the sensitivity analysis of the data (Section 3.1) was not the first result presented and discussed, given that this is not the main topic of research.
  Response: We moved section 3.1 to the Methods section, since the sensitivity analysis is helping to justify decision points related to the Methods. This allows the Results to focus on the main topic of research.
  
  L301-303. To me, the DryMonth/area bias of -2.2% seems very low for interpreting it as a drier condition than for the longer period, given that the between-year coefficient of variation of flow indices is usually much higher than 2%. The interpretation is up to the authors, but it looks like a very small bias and perhaps not statistically significant. The bias in the baseflow index (-11.8%) might come from a larger volume of high flows (as shown with MAX30/area) rather than lower flows or a drier period. These interpretations would agree with those using the PDSI.
  Response: We agree, and have revised this sentence accordingly: “Additionally, while the DryMonth/area bias was minimal, the baseflow index showed a relative bias of -11.8%, potentially reflecting a higher volume of water coming from high flows within the 8-year period, relative to the longer period (Table 2).”
  
  Comment: Parts of the results show the Pearson correlation (R) and its associated p-values without stating if the residuals follow the normal distribution or if the authors checked for outliers. It would be good to either state that the authors checked for the correlation assumptions or use Spearman correlation throughout the manuscript.
  AND
  L340-341. Here the authors refer to R as the Spearman correlation, but in other parts of the manuscript as the Pearson correlation.
  Response: Pearson was used only to correlate the hydrologic indices over the two different time periods, since these correlations were linear and the hydrologic indices had already been tested for normality. In this case, all assumptions were met for using a Pearson correlation. When correlating the independent and dependent variables with each other, we used Spearman correlation with a Bonferroni correction to account for non-normal distributions in the independent variables. To reduce confusion, we retained the identification of the specific correlation tests used in the Methods but eliminated mention of the statistical test in the Results and Appendix sections.
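
The distinction between the two tests can be illustrated with a short sketch (an illustration only, not the authors' workflow). Spearman correlation is Pearson correlation applied to ranks, so it captures monotonic but non-linear relationships, and a Bonferroni correction divides the significance threshold by the number of tests. All function names and data values here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after a Bonferroni correction."""
    return alpha / n_tests


# Hypothetical monotonic but strongly non-linear relationship.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
threshold = bonferroni_alpha(0.05, 10)
```

In this example the rank correlation is perfect while the Pearson value is lower, because the outlying last value distorts the linear fit.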
  
  Comment: How correlated are the indices Precipitation (P), Aridity index (ET/P) and Water demand (P-ET)? They look very highly correlated given the similar results.
  Response: Annual precipitation has a Spearman rank correlation value of R = 0.82 with water demand, and a correlation of R = -0.79 with the updated aridity index (PET/P). The updated aridity shows a correlation of R=-0.95 with water demand. Correlation between variables is considered and limited within the model variable selection process.
  
  Comment: Table 6 and associated discussion. Is it possible to know whether these variables are positively or negatively associated with the flow signatures?
  Response: As random forest models do not produce regression coefficients that can inform the directionality of the relationship, partial dependence plots are the primary mechanism through which directionality (i.e., positive, negative) can be displayed. However, as the relationship is not assumed to be linear, the directionality is not necessarily consistent over the full range of the variable. Therefore discerning directionality from Table 4, where each signature is correlated with each variable, is the most straightforward mechanism to interpret patterns of directionality.
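
A partial dependence curve can be sketched in a few lines: for each grid value of one variable, that variable is overwritten in every observation and the model predictions are averaged. The code below is a generic illustration under stated assumptions; `toy_model` is a hypothetical stand-in for a fitted random forest, and the variable names and values are invented.

```python
def partial_dependence(predict, rows, feature_index, grid):
    """Mean prediction over all rows with one feature set to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)  # copy so the input rows are not altered
            modified[feature_index] = value
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(row):
    """Hypothetical model: no response to inundation below a threshold."""
    precipitation, inundation_fraction = row
    return 2.0 * precipitation - 5.0 * max(0.0, inundation_fraction - 0.1)


# Hypothetical observations: [precipitation, inundation fraction].
observations = [[1.0, 0.05], [2.0, 0.2], [3.0, 0.3]]
grid = [0.0, 0.1, 0.2, 0.3]
curve = partial_dependence(toy_model, observations, 1, grid)
```

The resulting curve is flat over the first two grid values and decreases afterward, which shows how the direction of a relationship need not be consistent over the full range of a variable.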
  
  Fig. 4 and Fig. 5. The numbers are hard to read because they are in the middle of the plot, mixed with the observations. It would read better if the numbers were displayed on the border of the plots. Fig. 4 shows R as the Spearman correlation but the Results section uses R as the Pearson correlation.
  Response: We moved the x-axis and y-axis labels for both Fig 4 and Fig 5 to improve readability. Spearman correlations were used for everything, with the exception of correlating the longer-term hydrologic signatures with the shorter-term hydrologic signatures, which is the only place where Pearson correlation was used.
  
  L434-436, Discussion. Here I believe it should be clarified how much the models improved with the inundation variables, providing some numbers.
  Response: We have revised the Discussion section to include more numerical values.
  
  L439-441, Discussion. It will be easier to interpret if the authors clarify whether these inundation variables are significantly correlated with other independent variables, particularly climate.
  Response: We now clarify in the Discussion about the correlations between the inundation and climate variables and how this influenced our interpretation of the analysis.
  
  L452-454, Discussion. How do geographically isolated wetlands contribute to baseflow? Given that they are isolated, what physical phenomena could explain this contribution? It is surprising to me that geographically isolated wetlands contribute to increasing baseflow while floodplain wetlands don’t. The reasons for this difference could also be discussed.
  Response: Geographically isolated wetlands are still often very much connected via wetland-groundwater interactions. The term only refers to a lack of surface water connections. A strong description of the physical processes is provided in McLaughlin et al. (2014), but essentially differences in specific yield between uplands and GIWs lead to frequent reversals in hydraulic gradients that result in GIWs acting as both groundwater sinks and sources. This process has also been demonstrated in multiple independent modeling efforts (Evenson et al., 2015; Ameli and Creed, 2017; Blanchette et al., 2019; Yeo et al., 2019). We expanded our explanation of the physical process in the Discussion section.
  
  L524-528, Conclusion. I believe that it should also be clarified how much the inundation variables increased the model’s explanatory power. It would be interesting if the Conclusion section also presented results for the different inundation variables (e.g. floodplain and non-floodplain).
  Response: We revised the Conclusion section to include some numerical values, as well as re-frame the results to focus on the floodplain inundation.
  
  Technical comments
  
  L49. “endanger poverty”. Do the authors mean increase (exacerbate) poverty?
  Response: The phrase currently used is “endanger property”.
  
  Table 5. There is a parenthesis missing in the column “RMSE LOOCV)”.
  Response: We have revised this column title and the remaining parenthesis has been removed.
  
  Table 6. Are the values shown unitless?
  Response: Yes, the values shown are unitless.
  
  Table 3, caption. I believe that R and p should be italics.
  Response: R and p have been italicized.
  
  L419. “difference between baseflow and high flow conditions (i.e., (Q10-Q95)/area).”. It looks like it should be the difference between low and high flow conditions.
  Response: Changed as recommended.
  
  L473. There’s a mistake in “including like precipitation”.
  Response: Sentence has been corrected.
  
  L479. “seasonal flooding” refers to the seasonal flooding extent? “reduce or otherwise impact” is confusing to me. Do the authors mean that it could also be the other way around? That is, that larger peak discharges cause larger flooding extents?
  Response: Sentence has been revised to clarify “seasonal flooding extent” and revised to improve clarity.
  
  References
  
  Aboelnour, M. A., Engel, B. A., Frisbee, M. D., Gitau, M. W., and Flanagan, D. C.: Impacts of Watershed Physical Properties and Land Use on Baseflow at Regional Scales, Journal of Hydrology: Regional Studies, 35, 100810, https://doi.org/10.1016/j.ejrh.2021.100810, 2021.
  Berghuijs, W. R., Woods, R. A., Hutton, C. J., and Sivapalan, M.: Dominant flood generating mechanisms across the United States, Geophys. Res. Lett., 43, 4382–4390, https://doi.org/10.1002/2016GL068070, 2016.
  Berghuijs, W. R., Larsen, J. R., van Emmerik, T. H. M., and Woods, R. A.: A Global Assessment of Runoff Sensitivity to Changes in Precipitation, Potential Evaporation, and Other Factors, Water Resour. Res., 53, 8475–8486, https://doi.org/10.1002/2017WR021593, 2017.
  Bloomfield, J. P., Gong, M., Marchant, B. P., Coxon, G., and Addor, N.: How is Baseflow Index (BFI) impacted by water resource management practices?, Hydrology and Earth System Sciences, 25, 5355–5379, https://doi.org/10.5194/hess-25-5355-2021, 2021.
  Briggs, M. A., Goodling, P., Johnson, Z. C., Rogers, K. M., Hitt, N. P., Fair, J. B., and Snyder, C. D.: Bedrock depth influences spatial patterns of summer baseflow, temperature and flow disconnection for mountainous headwater streams, Hydrology and Earth System Sciences, 26, 3989–4011, https://doi.org/10.5194/hess-26-3989-2022, 2022.
  Carlier, C., Wirth, S. B., Cochand, F., Hunkeler, D., and Brunner, P.: Geology controls streamflow dynamics, Journal of Hydrology, 566, 756–769, https://doi.org/10.1016/j.jhydrol.2018.08.069, 2018.
  Gudmundsson, L., Greve, P., and Seneviratne, S. I.: The sensitivity of water availability to changes in the aridity index and other factors-A probabilistic analysis in the Budyko space, Geophys. Res. Lett., 43, 6985–6994, https://doi.org/10.1002/2016GL069763, 2016.
  Sawicz, K., Wagener, T., Sivapalan, M., Troch, P. A., and Carrillo, G.: Catchment classification: empirical analysis of hydrologic similarity based on catchment function in the eastern USA, Hydrol. Earth Syst. Sci., 15, 2895–2911, https://doi.org/10.5194/hess-15-2895-2011, 2011.
  
  Citation: https://doi.org/10.5194/hess-2024-119-AC1
RC2:
'Comment on hess-2024-119', Anonymous Referee #2, 12 Jul 2024
This is a review of the manuscript “Surface water storage influences streamflow signatures”, authored by Vanderhoof et al. and submitted to Hydrology and Earth System Sciences (HESS). The paper explores the relationship between streamflow signatures and climate, land cover, geology, topography, and surface water storage variables using random forest (RF) models for 72 catchments in the contiguous United States (CONUS). The study addresses an important subject in an important study area and should interest HESS readers. The study has promising results, however, there are methodological issues that should be carefully addressed before publication. Please, find below some suggestions that I hope will improve the overall reliability of the manuscript.

GENERAL COMMENTS
The manuscript is well-written and sound. The introduction is informative of the study’s scope and context; the RF approach is very interesting.

The general quality of the figures is good, but some improvements can help enhance the interpretability of the results. Please, refer to specific point-by-point comments.

The title and abstract are not representative of the study. The title expresses well-known information about the relationship between streamflow signatures and surface water storage which is mentioned many times in the introduction. It states a strong conclusion that, in my understanding, is not supported by the results. The abstract could be improved if some numerical information about the findings were provided, such as differences in the explanatory power of the model after the inclusion of the wetlands and inundation variables.

If I understand correctly, all of the 72 catchments are pooled together. For instance, the pooling will necessarily pool together information on small and large catchments, and also on catchments with distinct runoff generation mechanisms, and hence, with distinct responses to the explanatory variables. This pooling would result in a large variability in the hydrological signatures, which combined with a quite limited sample size (e.g., 72 samples), could result in biased estimates of the model parameters and large uncertainties. How is this issue addressed in the analysis? Please: (i) provide additional analysis; or (ii) clearly mention this as a potential limitation of the study, which currently is little discussed in the manuscript.

The use of the random forest (RF) model is very interesting to address the research questions. However, the output from the RF model provided in the manuscript is simply the R-squared and the relative importance of each explanatory variable. This is a very limited result which is not a meaningful interpretation of the hydrological process – especially taking into account all the effort employed in the data curation. I think that the RF is a very useful tool for a first-step analysis, but complementary analysis must be done for a more meaningful interpretation of the results.

The overall improvement in the model's performance due to the inclusion of wetlands and inundation variables is very limited since improvements are up to 10% and in most cases are less than 6%. The manuscript’s results do not corroborate the conclusions, and it is unclear if the results are indeed meaningful from a hydrological process point of view. I think that would be very useful if p-values were provided to understand if these results are indeed significant.

There are a lot of sources of uncertainties in the methodology related to both the streamflow signatures (see Westerberg and McMillan, 2015) and also due to the modeling approach which are not accounted for in the manuscript. Those uncertainties may be large and hence the results can be misleading. This issue makes the results even less reliable and is timidly discussed in the manuscript. Please, provide uncertainty estimation or mention this as a critical limitation of the study.

I was wondering about the choice of some explanatory variables. For instance, the 30-day annual maximum streamflow signatures are related to the extreme streamflow regime. In this case, it would be interesting to include explanatory variables related to extreme streamflow dynamics. Snow melting is indirectly accounted for by temperature, however, relevant variables such as extreme precipitation and antecedent wetness – based on soil moisture or antecedent precipitation (see Lun et al., 2021; Merz and Blöschl, 2009) are omitted in the model. Also, are the results sensitive to the choice of the 30-day window?

The discussion provided in the manuscript is very limited and mostly based on a literature review rather than the paper results, which makes the discussion most speculative.

SPECIFIC COMMENTS
Lines 156-158. It is not clear whether the baseflow index is derived from the UFSF, which is based on model simulations, or whether it was just calculated in a similar fashion. Please clarify this information in the revised manuscript.

Lines 377-381. Figure 3. It is difficult to follow the results shown in Fig. 3, especially for (c), (d), and (e) frames due to the units used.

Lines 399-408. Figures 4 and 5. The overall quality of Fig. 4 and 5 should be improved, the overlapping of axis labels with other graphical elements makes the interpretation difficult.

Lines 163-165. It is not clear which correlation method was used. For instance, in the main text, the Pearson correlation is mentioned, however, in tables and figure captions the Spearman is mentioned. Please clarify this information in the revised manuscript. If the Pearson correlation was indeed used, please check the assumptions for its use. Also, which test is used for the correlation significance? The Pearson/Spearman correlation itself is not a statistical test.

Lines 308-310. Indeed, the relative variations in hydrologic signature values between the long-term flow records (24 years) compared to the study period (8 years) are quite similar. This would be a pragmatic decision based on Sentinel data availability rather than a “solid justification”.

TECHNICAL COMMENTS

Lines 387-390. Table 5. There is a missing parenthesis in the RMSE of leave-one-out cross-validation.

REFERENCES

Lun, D., Viglione, A., Bertola, M., Komma, J., Parajka, J., Valent, P., and Blöschl, G.: Characteristics and process controls of statistical flood moments in Europe – a data-based analysis, Hydrology and Earth System Sciences, 25, 5535–5560, https://doi.org/10.5194/hess-25-5535-2021, 2021.

Merz, R. and Blöschl, G.: Process controls on the statistical flood moments - a data based analysis, Hydrological Processes, 23, 675–696, https://doi.org/10.1002/hyp.7168, 2009.

Westerberg, I. K. and McMillan, H. K.: Uncertainty in hydrological signatures, Hydrology and Earth System Sciences, 19, 3951–3968, https://doi.org/10.5194/hess-19-3951-2015, 2015.
Citation: https://doi.org/10.5194/hess-2024-119-RC2
- AC2: 'Reply on RC2', Melanie Vanderhoof, 22 Aug 2024
  
  Manuscript: HESS-2024-119
  
  Summary of Revisions:
  We very much appreciate the thoughtful comments made by both reviewers. In response to their comments we (1) added two new climate variables to better represent high flow conditions, (2) added a new geologic variable, (3) updated aridity to now use PET instead of ET, and (4) recalculated precipitation and maximum temperature coefficient of variation to use monthly inputs instead of daily. Because of these changes in potential variables and variable values, we re-ran the hyper-parameterization and variable selection processes, and re-calculated all original and additional statistical tests. Our results are very similar to the original models, but we wanted to make sure both reviewers were aware that the results had been re-run and all tables and figures were updated, correspondingly.
  
  To better evaluate model performance, we added calculations of MSE and AIC, in addition to R², adjusted R² and RMSE. We also added a significance test, using rfPermute, for the inundation variables selected for inclusion in models. Further, we added a new Table to the manuscript consolidating all of the inundation-related data in a single place, to improve the ease with which reviewers can interpret what role the inundation variables played between signatures, models and the correlation analysis.
  
  We have also modified the title, Abstract, Results and Discussion sections to include more numerical values, and more accurately represent the small to moderate improvements in model performance. Lastly, we added a new paragraph to the Discussion section to more thoroughly discuss sources of uncertainty.
  
  We believe that the manuscript is now much improved and we hope the reviewers find the revised manuscript suitable for publication.
  
  Reviewer #2
  Summary Comment: This is a review of the manuscript “Surface water storage influences streamflow signatures”, authored by Vanderhoof et al. and submitted to Hydrology and Earth System Sciences (HESS). The paper explores the relationship between streamflow signatures and climate, land cover, geology, topography, and surface water storage variables using random forest (RF) models for 72 catchments in the contiguous United States (CONUS). The study addresses an important subject in an important study area and should interest HESS readers. The study has promising results, however, there are methodological issues that should be carefully addressed before publication. Please, find below some suggestions that I hope will improve the overall reliability of the manuscript.
  Response: We really appreciate your thoughtful and thorough review and comments. Addressing them has improved the quality of the manuscript. Please see detailed responses to all specific comments below.
  
  General Comments
  
  Comment: The manuscript is well-written and sound. The introduction is informative of the study’s scope and context; the RF approach is very interesting.
  Response: Thank you!
  
  Comment: The general quality of the figures is good, but some improvements can help enhance the interpretability of the results. Please, refer to specific point-by-point comments.
  Response: We have addressed all specific comments below.
  
  Comment: The title and abstract are not representative of the study. The title expresses well-known information about the relationship between streamflow signatures and surface water storage which is mentioned many times in the introduction. It states a strong conclusion that, in my understanding, is not supported by the results.
  Response: We have changed the title to, “Integrating remotely sensed surface water dynamics in hydrologic signature modelling”
  
  Comment: The abstract could be improved if some numerical information about the findings were provided, such as differences in the explanatory power of the model after the inclusion of the wetlands and inundation variables.
  Response: We have edited the abstract to include more numerical information on the findings.
  
  Comment: If I understand correctly, all of the 72 catchments are pooled together. For instance, the pooling will necessarily pool together information on small and large catchments, and also on catchments with distinct runoff generation mechanisms, and hence, with distinct responses to the explanatory variables. This pooling would result in a large variability in the hydrological signatures, which combined with a quite limited sample size (e.g., 72 samples), could result in biased estimates of the model parameters and large uncertainties. How is this issue addressed in the analysis? Please: (i) provide additional analysis; or (ii) clearly mention this as a potential limitation of the study, which currently is little discussed in the manuscript.
  Response: Yes, we are modeling spatial variability in the hydrologic signatures and therefore the 72 catchments are pooled together. We are aware that watershed size influences runoff responses, therefore we (1) divided signatures by area where necessary to help control for variability in watershed size, and (2) limited the range of watershed sizes included (80% were between 1500 and 5000 km², and nested watersheds were avoided).
  
  Our sample size was primarily limited by the time intensiveness of generating the remotely sensed water variables. The primary objective of the analysis was to incorporate novel, remotely sensed surface water variables reflecting both location (floodplain, non-floodplain) and hydroperiod (temporary, seasonal, semi-permanent to permanent). The advantage of the Sentinel-1 and Sentinel-2 water algorithms is that they map vegetated water, which is where most of the temporal variability occurs but which is excluded from most available surface water products, which typically map only open water. Further, the use of Sentinel-1 allows us to bypass cloud cover and provide more accurate estimates of hydroperiod that do not reflect data gaps due to clouds. However, deriving these variables was extremely time intensive, which limited the number of basins that we could analyze. We have added text to the Discussion section to more clearly articulate uncertainty that could be attributable to sample size.
  
  Comment: The use of the random forest (RF) model is very interesting to address the research questions. However, the output from the RF model provided in the manuscript is simply the R-squared and the relative importance of each explanatory variable. This is a very limited result which is not a meaningful interpretation of the hydrological process – especially taking into account all the effort employed in the data curation. I think that the RF is a very useful tool for a first-step analysis, but complementary analysis must be done for a more meaningful interpretation of the results.
  Response: To better compare model performance, in addition to the adjusted R² values and RMSE, we added mean square error (MSE) and Akaike information criterion (AIC). We also now use rfPermute to calculate the significance of the inundation variables, as well as the estimated increase in MSE with their exclusion. We also added a new table (Table 6) to consolidate the statistics related to the inundation variables. This new table includes (1) a summary of the percent change of all 5 model performance comparisons, (2) which inundation variables were selected in the models after using the forward step-wise variable selection process, (3) the statistical significance and MSE for the inundation variables, and (4) which inundation variables were significantly correlated with each of the signatures.
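
The evaluation metrics named above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the AIC expression is the common least-squares form, n·ln(MSE) + 2k, and the response does not state how AIC was computed for the random forest models. The function name and sample values are hypothetical.

```python
from math import log, sqrt


def evaluation_metrics(observed, predicted, n_predictors):
    """Return MSE, RMSE, R2, adjusted R2, and a least-squares AIC."""
    n = len(observed)
    if n - n_predictors - 1 <= 0:
        raise ValueError("too few observations for adjusted R2")
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    mean_observed = sum(observed) / n
    total_ss = sum((o - mean_observed) ** 2 for o in observed)
    mse = residual_ss / n
    r2 = 1.0 - residual_ss / total_ss
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    # Assumed AIC form for Gaussian errors, up to an additive constant.
    aic = n * log(mse) + 2 * n_predictors
    return {
        "mse": mse,
        "rmse": sqrt(mse),
        "r2": r2,
        "adjusted_r2": adjusted_r2,
        "aic": aic,
    }


# Hypothetical signature values and model predictions.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = evaluation_metrics(observed, predicted, n_predictors=2)
```

Adjusted R² and AIC both penalize additional predictors, so they help judge whether adding inundation variables improves a model beyond what extra parameters alone would give.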
  
  The impact of surface water storage on discharge is known to be subtle and hard to isolate. For example, annual precipitation showed a Spearman correlation value of R=0.75 with fraction of watershed showing seasonal floodplain inundation. This is because precipitation impacts not just discharge, but also surface water dynamics. This means that the climate only model is not an independent model that is able to exclude the influence of surface water. We have added additional text to the Discussion section to further discuss this important challenge. Specifically,
  
  “In cases where variables can be isolated (e.g., basins with tile drainage, compared to basins without tile drainage), significant differences between models can be an appropriate mechanism to help quantify the impact of a variable (Rainio et al. 2024). But both discharge and surface water extent tend to be a function of climate inputs and catchment characteristics (Heimhuber et al., 2016; Vanderhoof et al., 2018). Consequently, our inundation variables were significantly correlated with not only catchment characteristics such as depth to bedrock, slope, and topographic diversity, but also climate variables, including annual precipitation, aridity, and the rainfall and runoff factor (Table A4), with the highest correlation occurring between the amount of seasonal inundation on the floodplain and the watershed rainfall and runoff factor (R=0.80, p<0.01) (Table A4). Because our inundation variables were significantly correlated with select climate variables, the M_Climate could not be considered a null model, relative to M_All, and therefore comparing variables selected, variable significance and importance as well as model improvement using evaluation metrics was seen as most appropriate.”
  
  Comment: The overall improvement in the model's performance due to the inclusion of wetlands and inundation variables is very limited since improvements are up to 10% and in most cases are less than 6%. The manuscript’s results do not corroborate the conclusions, and it is unclear if the results are indeed meaningful from a hydrological process point of view. I think that would be very useful if p-values were provided to understand if these results are indeed significant.
  Response: We agree that improvement in the models was moderate at best. Recognizing the challenge of interpreting whether results are meaningful from a hydrological processes perspective, we have revised the title of the paper to, “Integrating remotely sensed surface water dynamics into hydrologic signature models”. We have also revised the language in the Abstract, Results and Discussion to better represent the findings. We now report multiple model evaluation metrics, and better articulate the weight (or lack) of evidence (e.g., new Table 6).
  
  We have also revised the Discussion section to interpret the results more conservatively. Part of the Discussion has been re-focused from contextualizing the results with the literature in general to examining the inundation variables selected in the models and exploring how hydrologically reasonable those specific variables are. We also added a new paragraph to the Discussion on sources of uncertainty.
  
  Regarding the request for p-values, we are assuming that the reviewers are looking for a statistical test of whether differences between models are significant. The impact of surface water storage on discharge is known to be subtle and hard to isolate. For example, annual precipitation showed a Spearman correlation of R=0.75 with the fraction of the watershed showing seasonal floodplain inundation. This is because precipitation affects not just discharge, but also surface water dynamics. Consequently, the climate-only model is not an independent model that excludes the influence of surface water, which would complicate the interpretation of a significance test. However, we have added the p-values for the individual models, as well as p-values for the inundation variables selected for model inclusion. We have also added text to the Discussion section to further discuss this important challenge.
  
  Comment: There are a lot of sources of uncertainties in the methodology related to both the streamflow signatures (see Westerberg and McMillan, 2015) and also due to the modeling approach which are not accounted for in the manuscript. Those uncertainties may be large and hence the results can be misleading. This issue makes the results even less reliable and is timidly discussed in the manuscript. Please, provide uncertainty estimation or mention this as a critical limitation of the study.
  Response: We have added the paragraph below to the Discussion section to better discuss sources of uncertainty, and now reference Westerberg and McMillan (2015):
  
  “4.2 Sources of Uncertainty
  Modelling hydrologic signatures to evaluate the relative influence of drivers on hydrologic responses has many potential sources of uncertainty. Our results, for instance, could depend on the hydrologic signatures included in the analysis (McMillan et al., 2021). It is possible that inundation has a greater or lesser influence on different aspects of the flow regime than those explored here. For the hydrologic signatures considered, such signatures can show substantial uncertainty, attributable to error in precipitation and discharge datasets (Westerberg and McMillan, 2015). To account for uncertainty, the hydrologic signatures were calculated annually, and then averaged across multiple years, while independent variables were averaged over multiple years and across each watershed, both steps that have been shown to reduce uncertainty (Westerberg and McMillan, 2015). Our findings may also depend on the variables included in the analysis. While we included diverse climate and catchment characteristics, it is possible that additional catchment variables, such as data on aquifers (Bloomfield et al., 2021) or additional geologic characteristics, such as proportion sandstone (Carlier et al. 2018), could improve the explanatory power of certain hydrologic signatures, like baseflow index and reduce our model uncertainty. Uncertainty can also be attributable to the watersheds selected (McMillan et al., 2021). While we limited the range of watershed sizes and sampled across diverse regions, we under-sampled certain regions including the northeastern U.S. and mountainous regions, where a high proportion of forest cover and steep slopes, respectively, tend to increase our uncertainty in mapping surface water. Generating the surface water variables was also computationally intensive and limited our feasible sample size, which also likely contributed uncertainty to the modelling effort. 
Further, while surface water extent was used to represent surface water storage, the two are distinct measurements, and in the future, conversion of surface water (2D) to storage (3D) will facilitate improved modelling of total water distribution. Lastly, uncertainty can be introduced by the statistical modelling approach itself. To minimize modelling-related uncertainty we applied hyper-parameter optimization and variable selection procedures. Random forest models have previously been found to be an effective mechanism to model hydrologic signatures (Trancoso et al., 2016; Addor et al., 2018; Oppel and Schumann, 2020). Further exploration of how inundation impacts diverse components of flow regimes will be an important next step to reduce the uncertainty associated with this effort.”
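  The averaging step described in the quoted paragraph (signatures calculated annually, then averaged across years) can be illustrated with a minimal sketch. The flashiness metric used here is the Richards-Baker index, which is an assumption for illustration only; this response does not state which flashiness formulation was used, and the function names and discharge values are hypothetical.

```python
def flashiness(daily_q):
    """Richards-Baker style index: summed absolute day-to-day change
    in discharge divided by summed discharge (assumed metric)."""
    changes = sum(abs(b - a) for a, b in zip(daily_q, daily_q[1:]))
    return changes / sum(daily_q)


def multi_year_mean(signature, q_by_year):
    """Compute a signature separately for each year, then average.

    q_by_year maps a year to that year's list of daily discharge values.
    Returns (mean across years, dict of per-year values).
    """
    annual = {year: signature(q) for year, q in q_by_year.items()}
    return sum(annual.values()) / len(annual), annual


# Hypothetical toy record: four "days" per year, for illustration only.
q_by_year = {2017: [1.0, 3.0, 2.0, 4.0], 2018: [2.0, 2.0, 2.0, 2.0]}
mean_flashiness, per_year = multi_year_mean(flashiness, q_by_year)
```

  Keeping the per-year values alongside the mean makes it possible to inspect how much a signature varies between years before it is averaged.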
  
  Comment: I was wondering about the choice of some explanatory variables. For instance, the 30-day annual maximum streamflow signatures are related to the extreme streamflow regime. In this case, it would be interesting to include explanatory variables related to extreme streamflow dynamics. Snow melting is indirectly accounted for by temperature, however, relevant variables such as extreme precipitation and antecedent wetness – based on soil moisture or antecedent precipitation (see Lun et al., 2021; Merz and Blöschl, 2009) are omitted in the model. Also, are the results sensitive to the choice of the 30-day window?
  Response: To better reflect high precipitation we added (1) a Rainfall and Runoff Factor, or rainfall intensity, which reflects the amount of rainfall and peak intensity of each storm, and (2) a maximum monthly precipitation variable. Six of the models, including both flashiness models, selected rainfall intensity for inclusion, and one of the models selected maximum monthly precipitation for inclusion. Since snowmelt only occurs in about half of the watersheds, we did not include a variable specific to snow water equivalent (SWE), but the influence of SWE is partially reflected in the DAYMET precipitation CV and seasonality variables. Variable selection and models were re-run with the additional potential variables. In earlier exploratory efforts we included both MAX30/area and MAX7/area, but our results were very similar between the two variables, so we retained only MAX30/area to improve the manuscript’s readability.
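  For readers who wish to check the sensitivity to window length themselves, the sketch below computes an area-normalized n-day maximum flow. It assumes that MAX30 and MAX7 denote the annual maximum of the 30-day and 7-day moving-average discharge, which is the conventional reading but is not defined in this response; the function names and units are hypothetical.

```python
def max_n_day_mean(daily_q, window):
    """Largest mean discharge over any `window` consecutive days."""
    if window < 1 or len(daily_q) < window:
        raise ValueError("record is shorter than the averaging window")
    running = sum(daily_q[:window])  # sum over the first window
    best = running
    for i in range(window, len(daily_q)):
        # Slide the window forward by one day.
        running += daily_q[i] - daily_q[i - window]
        best = max(best, running)
    return best / window


def area_normalized(value, area_km2):
    """Divide a flow statistic by drainage area (assumed normalization)."""
    return value / area_km2


# Hypothetical five-day record, for illustration only.
q = [1.0, 2.0, 3.0, 4.0, 5.0]
max2 = max_n_day_mean(q, 2)
max3 = max_n_day_mean(q, 3)
max3_per_area = area_normalized(max3, 2.0)
```

  Calling `max_n_day_mean` with `window=7` and `window=30` on the same annual record gives the two statistics whose similarity is described above.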
  
  Comment: The discussion provided in the manuscript is very limited and mostly based on a literature review rather than the paper results, which makes the discussion most speculative.
  Response: We have substantially revised the Discussion section to focus more on the paper results as well as sources of uncertainty, and now use literature references to inform and support the analysis, rather than discussing the topic more generally.
  
  Specific Comments
  
  Lines 156-158. It is not clear whether the baseflow index is derived from the UFSF, which is based on model simulations, or whether it was just calculated in a similar fashion. Please clarify this information in the revised manuscript.
  Response: We calculated baseflow index ourselves. We revised the sentence to move the reference to the end of the statement to reduce confusion.
  
  Lines 377-381. Figure 3. It is difficult to follow the results shown in Fig. 3, especially for (c), (d), and (e) frames due to the units used.
  Response: We modified the Figure caption to improve clarity, specifically: “Greater flashiness (a, b), higher peak flows (c, d), and greater flows during low flow periods (e, f) are shown in blue.”
  
  Lines 399-408. Figures 4 and 5. The overall quality of Fig. 4 and 5 should be improved, the overlapping of axis labels with other graphical elements makes the interpretation difficult.
  Response: We moved the axis labels on both Figures 4 and 5 to improve readability.
  
  Lines 163-165. It is not clear which correlation method was used. For instance, in the main text, the Pearson correlation is mentioned, however, in tables and figure captions the Spearman is mentioned. Please clarify this information in the revised manuscript. If the Pearson correlation was indeed used, please check the assumptions for its use. Also, which test is used for the correlation significance? The Pearson/Spearman correlation itself is not a statistical test.
  Response: Pearson was used only to correlate the hydrologic indices over the two different time periods, since these correlations were linear and the hydrologic indices had already been tested for normality. All assumptions were met for using a Pearson correlation. When correlating the independent and dependent variables with each other, we used Spearman correlation, to account for non-normal distributions in the independent variables, together with a Bonferroni correction. Correlation matrices with significance values (p-values) for Pearson correlations and Spearman rank correlations were calculated using the Hmisc package in R. To reduce confusion, we retained the identification of the specific correlation tests used in the Methods but eliminated mention of the statistical test in the Results and Appendix sections. We also now note that the Hmisc R package was used to calculate significance.
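  The two steps named above, a rank-based correlation and a Bonferroni adjustment, can be sketched as follows. This is a standard-library Python illustration, not the Hmisc workflow used in the study: it computes Spearman's coefficient as a Pearson correlation on average ranks and applies the Bonferroni adjustment to p-values that are supplied by the caller. All function names and example numbers are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical values: a monotonic but non-linear relationship.
rho = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
adjusted = bonferroni([0.01, 0.04, 0.5])
```

  Because Spearman's coefficient depends only on ranks, it does not require normally distributed variables, while the Bonferroni step addresses the separate issue of testing many variable pairs at once.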
  
  Lines 308-310. Indeed, the relative variations in hydrologic signature values between the long-term flow records (24 years) compared to the study period (8 years) are quite similar. This would be a pragmatic decision based on Sentinel data availability rather than a “solid justification”.
  Response: We removed this sentence from the text. This section has also been moved from the Results to the Methods, in response to a comment from Reviewer #1.
  
  Technical Comments
  
  Lines 387-390. Table 5. There is a missing parenthesis in the RMSE of leave-one-out cross-validation.
  Response: We have revised the column title to correct this error.
  
  References
  Lun, D., Viglione, A., Bertola, M., Komma, J., Parajka, J., Valent, P., and Blöschl, G.: Characteristics and process controls of statistical flood moments in Europe – a data-based analysis, Hydrology and Earth System Sciences, 25, 5535–5560, https://doi.org/10.5194/hess-25-5535-2021, 2021.
  
  Merz, R. and Blöschl, G.: Process controls on the statistical flood moments – a data based analysis, Hydrological Processes, 23, 675–696, https://doi.org/10.1002/hyp.7168, 2009.
  
  Westerberg, I. K. and McMillan, H. K.: Uncertainty in hydrological signatures, Hydrology and Earth System Sciences, 19, 3951–3968, https://doi.org/10.5194/hess-19-3951-2015, 2015.
  
  Citation: https://doi.org/10.5194/hess-2024-119-AC2

Viewed

Total article views: 1,150 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
888	217	45	1,150	39	51

Views and downloads (calculated since 06 May 2024)

Month	HTML	PDF	XML	Total
May 2024	187	64	9	260
Jun 2024	31	10	2	43
Jul 2024	77	17	9	103
Aug 2024	40	16	6	62
Sep 2024	17	5	0	22
Oct 2024	9	2	0	11
Nov 2024	14	4	0	18
Dec 2024	14	7	2	23
Jan 2025	8	7	2	17
Feb 2025	27	4	1	32
Mar 2025	11	8	2	21
Apr 2025	6	12	1	19
May 2025	16	3	3	22
Jun 2025	19	12	1	32
Jul 2025	31	6	2	39
Aug 2025	70	7	2	79
Sep 2025	295	10	2	307
Oct 2025	16	23	1	40

Viewed (geographical distribution)

Total article views: 1,139 (including HTML, PDF, and XML) Thereof 1,139 with geography defined and 0 with unknown origin.

Cited

Latest update: 23 Oct 2025

Short summary

Streamflow signatures can help characterize a watershed’s response to rainfall and snowmelt events. We explored if surface water storage-related variables, which are typically excluded from streamflow signature analyses, may help explain the variability in streamflow signatures. We found that remotely sensed surface water storage watershed location and hydroperiod were correlated with or explained a portion of the variability in hydrologic signatures across 72 streamflow gages.


Cited

1 citation as recorded by Crossref.