Reply on RC2

"This study focuses on the factors that explain whether a precipitation event results in a streamflow response in intermittent streams or not. It uses an amazing dataset of flow observations for many different sites with intermittent flow in a catchment in Luxembourg. Because streamflow intermittency depends on geology, the analyses are done for sites and catchments underlain by three different geologies. The authors do not only compare the mean values of the event characteristics for the events that led to streamflow and those that didn't, but also analyse the relative importance of these factors in a random forest model to predict the occurrence of flow or no flow. The work is very interesting, novel, and based on an incredibly large dataset. I find the methods robust and the analyses solid (though I am not an expert on random forest modeling). However, the introduction and particularly the discussion section of the manuscript are not strong and the writing could be improved. The work is not put in a larger perspective and instead the specific results for the specific sites or catchments are discussed. More discussion on how this work fits into the limited but rapidly expanding literature on dynamic stream networks and streamflow in intermittent streams would significantly strengthen the manuscript."
We appreciate your helpful and detailed review of our manuscript and are happy to see your positive evaluation of our study design, data and results. We fully agree that the manuscript would be significantly improved by discussing our findings in the broader context of the literature on intermittent streamflow. We reply to your questions below. Your questions and comments are marked as e.g. << R2C1.1 question/comment >>, followed by our answer marked as e.g. R2A1.1.
We will improve the manuscript in the sections where your questions were raised, and also on the basis of the very detailed and helpful manuscript annotations provided by you and the second reviewer.
"My main comments are provided below. Additional comments and suggestions are provided in the annotated pdf."
<< R2C1.1 The streamflow and soil moisture data were collected at an impressive number of sites but it is unclear how they were selected and how they are related. >> R2A1.1: The study for this manuscript was embedded in a larger research project, "Catchments As Organized Systems" (German Research Foundation (DFG), Research Unit FOR 1598). All research was carried out in the Attert catchment. The sites were chosen to represent the combinations of land use and geology as well as possible across a variety of slope gradients, aspects (north, south) and slope positions (top, mid, valley). We will include more information on this in the revised manuscript. The soil moisture calculation is described on page 6, line 17 of the manuscript, but it needs the clarification that soil moisture values were averaged per geology separately for each sensor depth (10 cm, 30 cm, 50 cm). The soil moisture values were first normalized for each sensor. After normalization, all data from sensors at a given depth were averaged for each geology. Thus, the soil moisture dynamics are represented by a geological average per depth and not by a catchment average.
<< R2C2.1 Some sites are referred to as logging tracks. This was confusing to me. What are they? Ditches along roads that occasionally flow? Or erosion channels that have occasional flow? Or are these real streams? >> R2A2.1: The sites referred to as logging tracks are indeed erosion channels (on forest roads) that occasionally flow. We will state this more clearly in the revised manuscript.
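To illustrate the soil moisture averaging described in R2A1.1, the following is a minimal sketch only. The normalization method is not specified above, so min-max scaling per sensor is an assumption, and the function names, the geology labels and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def min_max_normalize(series):
    """Scale one sensor's time series to [0, 1] (assumed normalization)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0.0 for _ in series]
    return [(value - lo) / (hi - lo) for value in series]


def average_by_geology_and_depth(sensors):
    """Normalize each sensor, then average per (geology, depth) group.

    sensors: list of dicts with keys 'geology', 'depth_cm' and 'values',
    where all 'values' lists share the same time steps.
    """
    groups = defaultdict(list)
    for sensor in sensors:
        key = (sensor["geology"], sensor["depth_cm"])
        groups[key].append(min_max_normalize(sensor["values"]))
    # Average across sensors at each time step within a group.
    return {
        key: [mean(step) for step in zip(*series_list)]
        for key, series_list in groups.items()
    }


# Hypothetical readings (volumetric water content in %).
sensors = [
    {"geology": "geology_A", "depth_cm": 10, "values": [20, 30, 40]},
    {"geology": "geology_A", "depth_cm": 10, "values": [10, 30, 20]},
    {"geology": "geology_B", "depth_cm": 10, "values": [35, 45, 40]},
]
averaged = average_by_geology_and_depth(sensors)
```

The result holds one normalized series per geology and depth, not a single catchment-wide series.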
<< R2C2.2 What is your definition of a stream? >> R2A2.2: In our study, a stream is any topographic feature in the landscape where longitudinal runoff occurs in a defined channel. Defined channels include erosion channels, ditches along roads and drainage ditches in agricultural areas or forest, as well as natural streams. We will include this definition in the revised manuscript.

<< R2C2.3 It would be useful if this was clarified in the text a bit more and if it was more clearly indicated which sites are ditches or roads or logging tracks and which are stream channels. >> R2A2.3: Thank you for this suggestion. We will provide this information in the revised manuscript in the form of a table listing geology, stream channel type and the number of streams in the corresponding geology.
<< R2C3.1 Some basic information on the study catchment is missing. What is the drainage density? >> R2A3.1: As intermittency research has shown, drainage density is not constant but highly dynamic over time. Kaplan et al. (2020a) demonstrated that the drainage network in this area also expands and contracts significantly over time. That study also showed the large impact of geology on the drainage network length in the Attert catchment. Thus, describing drainage density with a single number for one catchment is 1) not straightforward and 2) possibly not an adequate measure of the hydrological situation in the catchment. The hydrological community may need to rethink the use of drainage density in general and move towards reporting the maximum, minimum and average drainage density for a hydrological functional unit, which is not necessarily bounded by the catchment boundaries but can also be defined by e.g. geological units. Nevertheless, we will provide the drainage density of the streams mapped in the topographic map.
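The idea of reporting minimum, maximum and average drainage density for a dynamic network can be sketched as follows. This is an illustration only: it assumes drainage density is computed as active channel length divided by the area of the chosen unit, and the function name and all numbers are hypothetical, not values from the Attert catchment.

```python
def drainage_density_summary(active_lengths_km, area_km2):
    """Summarize drainage density (km per km^2) over a series of mapping dates.

    active_lengths_km: active (flowing) network length on each date.
    area_km2: area of the unit (catchment or geological unit).
    """
    densities = [length / area_km2 for length in active_lengths_km]
    return {
        "min": min(densities),
        "max": max(densities),
        "mean": sum(densities) / len(densities),
    }


# Hypothetical active network lengths for dry, intermediate and wet conditions.
summary = drainage_density_summary([2.0, 5.0, 8.0], area_km2=4.0)
```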

<< R2C3.2 What is the average soil depth? >> R2A3.2: We will try to include information on soil depth in the revised manuscript. As with drainage density, average soil depth varies between the geologies, and it varies even more strongly among slope positions and slope forms (convex, concave). Thus, in the revised manuscript we can provide a rough estimate of the soil depths close to the sites included in this study.

<< R2C3.3 Also what data was used to determine the catchment boundaries and the stream network that is shown in the maps? >> R2A3.3: The catchment boundaries were determined from a 15 m digital elevation model using the SAGA GIS tool "flow accumulation recursive", as described in Kaplan et al. (2020a). We will include this information in the revised manuscript. The limitations of the coarse DEM resolution for delineating the catchment boundaries are discussed in Kaplan et al. (2020a). However, as the spatial data play a minor role in this paper, no further discussion of this is planned for the revised manuscript.

<< R2C4.1 The different measures that represent the antecedent wetness conditions (soil moisture at 10 and 50 cm and API7 and API14) are probably highly correlated. I understand that the random forest approach is not very sensitive to correlation between predictor variables but still it would be good for the reader to know how well these different predictors are correlated. Can you include a plot or table that shows the correlation between all of the predictors? >> R2A4.1: Yes, we will include a plot of the predictor correlation in the supplementary material and add a sentence in the main manuscript.
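A pairwise correlation table of the kind requested in R2C4.1 could be computed along the following lines. This is a sketch under assumptions: it uses the Pearson coefficient, which is not stated above as the measure to be used, and the predictor names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(predictors):
    """Return {(name_a, name_b): r} for every pair of predictors."""
    names = list(predictors)
    return {
        (a, b): pearson(predictors[a], predictors[b])
        for a in names
        for b in names
    }


# Hypothetical per-event predictor values.
predictors = {
    "api7": [1.0, 2.0, 3.0, 4.0],
    "api14": [2.0, 4.0, 6.0, 8.0],
    "soil_moisture_10cm": [1.0, 3.0, 2.0, 4.0],
}
matrix = correlation_matrix(predictors)
```

For skewed predictors such as precipitation sums, a rank-based coefficient may be the more suitable choice.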
<< R2C5.1 P12L10: Did you do the t-tests for each site individually or as Figure 4 shows grouped per geology? This is unclear from the text. Also were they one-sided or two-sided? >> R2A5.1: The t-tests were two-sided and were carried out for each predictor variable with the sites grouped per geology. We will include this information in the revised manuscript.
<< R2C6.1 P17L3: The models for sites with an unbalanced number of flow and no-flow events are not acceptable. What does this tell us? Does it just mean that we can't use the random forest approach in these cases or that we need more data for these cases, or does it mean that other predictor variables need to be used? A bit more information and discussion on this would be very helpful. >> R2A6.1: Class imbalance is generally a major problem for all statistical models as soon as the sample size is too small to adequately represent all classes. The resampling methods applied in this study help to overcome the problem of class imbalance. However, in cases where e.g. the flow class has only one or two data points in the sample, these flow events are not described in a statistically adequate manner. This cannot be overcome by resampling, only by a longer time series with a larger sample of events. Thus, the problem of class imbalance and misrepresentation of a class cannot be resolved by including other predictors, but only by gathering more data. We will add the above information in a short paragraph in the discussion of the revised manuscript.
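The limitation described in R2A6.1 can be illustrated with a small sketch. It uses random oversampling of the minority class as a stand-in; this is an assumption for illustration and not necessarily the resampling method applied in the study. The function name, event fields and values are hypothetical.

```python
import random


def oversample_minority(events, label_key="flow", seed=0):
    """Balance classes by drawing minority-class events with replacement."""
    rng = random.Random(seed)
    by_class = {}
    for event in events:
        by_class.setdefault(event[label_key], []).append(event)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        # Duplicate existing events until the class reaches the target size.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical site with a single flow event and five no-flow events.
events = [{"flow": True, "precip_mm": 30.0}] + [
    {"flow": False, "precip_mm": float(p)} for p in (2, 4, 5, 7, 9)
]
balanced = oversample_minority(events)
flow_events = [e for e in balanced if e["flow"]]
distinct_flow_events = {e["precip_mm"] for e in flow_events}
```

After balancing, the flow class has five entries but still only one distinct event, so resampling adds no new information about the conditions that produce flow.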
<< R2C7.1: The discussion section tries to infer runoff generation mechanisms for the catchments on the three different geologies. However, it doesn't really focus on the intermittent stream dynamics themselves, nor how this work improves our understanding of what drives streamflow responses in intermittent streams (P1L27), which is the main topic of the manuscript and the research questions. There is almost no comparison of the results with other studies on intermittent streamflow responses (e.g., Jenssen et al, Zimmer et al., Durighetto et al., Warix et al., Hale and Godsey, etc.). Instead, the inferences and discussions are all related to other studies in this catchment. As such, it is a narrow discussion that is focused on this catchment and doesn't go "beyond" this study catchment. I suggest that the authors significantly revise the discussion and focus much more on comparisons with other studies on intermittent streams and what these results mean for people who want to understand or predict which reaches are likely to respond to certain events and which don't. For example, after reading the manuscript, it is still not clear to me if other studies found similar predictor variables to be most important or if all of these results are unique. This is a pity because the dataset is very novel and the manuscript could have a much larger impact. >> R2A7.1: We agree with both referees that the lack of discussion in the broader context of other studies on intermittent streams, and the narrow focus on the study catchment, are a major shortcoming that needs to be addressed in the revised manuscript. In this context we will also check the potential of additional statistical approaches to supplement the existing results.
<< R2C8.1: The section on the uncertainties in the classification of events etc is good but there is no discussion on how good the random forest models are. It is clear that the poor models were excluded from the analyses of the most important predictors but are the models that you included good or just barely OK? >> R2A8.1: We will include a more detailed analysis of this in the revised manuscript. Figure 7 already gives a first impression of how well the models perform on the test data sample. A classification into fair, good and very good models will help to give a better feeling for the overall suitability of the random forest approach.
<< R2C8.2: Would this approach allow you to predict the streamflow response at a site? If so, how good or bad would that prediction be? >> R2A8.2: This procedure is in fact carried out in this study by splitting the dataset at each site into model and test data. The evaluation criteria are based on the accuracy (sensitivity/specificity) with which the two classes, flow and no-flow, are predicted.
<< R2C8.3: If you used a model that was created for one site for another site on the same geology, would that lead to a reasonable prediction of the occurrence of a flow response? I realize that this study is not about modeling the streamflow responses per se but some discussion of how good or poor the models are that you use to determine the most important predictors (and thus how good these predictors are) would be very useful. >> R2A8.3: This point will also be discussed in the revised manuscript. Data-driven statistical models are usually not easily transferable to another site, even within the same geology. This could be tested using the sites of the Hei-catchment, which are 500 m apart but show very different response behavior. Thus, the statistical model is a tool to identify the controls at a site and to derive the general patterns of these controls for the geology.
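The sensitivity and specificity criteria mentioned in R2A8.2 can be computed from held-out test events as sketched below. This is a minimal example with hypothetical observed and predicted labels, not output from the study's models; the function name is likewise an assumption.

```python
def sensitivity_specificity(observed, predicted):
    """Return (sensitivity, specificity) for boolean flow / no-flow labels.

    True marks a flow event, False a no-flow event. A value of None means
    the corresponding class is absent from the observations.
    """
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o and p)          # flow predicted as flow
    fn = sum(1 for o, p in pairs if o and not p)      # flow missed
    tn = sum(1 for o, p in pairs if not o and not p)  # no-flow predicted as no-flow
    fp = sum(1 for o, p in pairs if not o and p)      # false alarm
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical test sample: three flow events and five no-flow events.
observed = [True, True, True, False, False, False, False, False]
predicted = [True, True, False, False, False, False, True, False]
sensitivity, specificity = sensitivity_specificity(observed, predicted)
```

Reporting both values separately keeps a model that always predicts the majority class from appearing accurate.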
<< R2C9.1: The writing and organization of the text can be improved. The order of several paragraphs is not very logical as important information comes too late. Also several sentences are either too long or more often are written too cryptic with articles and commas missing. The results should be written in the past tense as these events occurred in the past. Furthermore, it would be good to use more consistent terminology (runoff or streamflow, not both for the same thing, reactions or responses, etc.). I am attaching a marked-up pdf. I am not asking the authors to implement all of these suggestions but hope that they can use this as an indication on how and especially where to improve the writing and structure of the manuscript. The results sections 4.1.1 (page 12) and 4.1.2 (page 14) could probably be shortened as most of these values could be shown in a table as well. >> R2A9.1: We will revise the manuscript throughout and incorporate the marked-up recommendations in your pdf file as fully as possible. We will also shorten the sections on pages 12 and 14 as suggested. Thank you again for providing us with these detailed suggestions.