hess-2021-36

This manuscript presents a method to automatically measure water level in streams using NIR cameras and image processing techniques. The paper is generally well written and the results are promising. However, I think the paper should be improved before it can be considered for publication in HESS. First, the structure of the paper is somewhat unbalanced. The introduction is relatively extensive, which makes the results and discussion section seem rather marginal. I think the latter section would benefit from a stronger and more elaborate evaluation of the presented method, and should include a more detailed outlook on future work and potential applications. Second, the data availability statement is not in line with HESS policy. This should be updated before the manuscript can be considered. Finally, the paper would benefit from additional support for, and clarification of, the setup and choices made, as detailed in the general and specific comments below.


Introduction
Another reason why ephemeral streams are so relevant is perhaps that the onset of flow may result in the mobilization of (anthropogenic) debris and sediment as well?
The link with citizen science makes sense, as this offers an unprecedented opportunity for upscaling of data collection. However, how would this work for the locations of interest in this manuscript, i.e. ungauged headwater catchments? These may not be locations where many citizens are available to contribute to data collection.
The introduction in general is well written. I do think it is a bit long and goes off on a tangent here and there. Perhaps the authors can reduce the length a bit and focus more on the potential of their approach, and on why it is a promising addition to the existing suite of monitoring techniques.

Methods
Perhaps a sketch of the monitoring setup can be included in addition to Fig. 1.
What is the motivation for taking images every 30 minutes? What is the relevant timescale for ephemeral streams? I'd argue that one to a few images per day would suffice, drastically reducing the required storage. With the current setup someone needs to read out the data every two weeks, which I would personally find rather frequent for ungauged headwater catchments.
Fig. 1: I find this figure a bit unclear. Perhaps some additional headings to complete the workflow would make it clearer.
Please include some more details about the setup. How long is the pole? How is the pole robustly placed in what looks to be a rather "wild" environment? What is the distance between the pole and the camera? How is the camera fixed? What is the estimated pixel length (mm, cm)?
Maybe an overview map can be included to show the outdoor testing locations.
For the data validation, was the water level identification done by the same person? Or by a group of people? If the latter, was there any bias between the observers? Also, I was wondering whether there was a reason not to measure the water level with an accurate water level logger.

Results and discussion
Why was Test A done with the same water level for each image? As this method is most valuable for detecting changes in water level, would it not have been valuable to test the method over the full range of values?
The method seems to work quite well for Test C, which includes quite some dynamic behaviour. For Tests D and E, the dynamics seem not to be captured completely. Can the authors elaborate on this, including the implications for long-term monitoring?
The discussion is rather limited. I would encourage the authors to include a critical synthesis and a more elaborate outlook on future work. What are the next steps for this method? How do the authors envision application in the field? Only for measurements of a couple of days, or also for seasonal or even multi-year monitoring efforts?
When reading the paper, on the one hand I get very enthusiastic about this method, because it offers a nice new approach for automatic monitoring. On the other hand, I keep wondering what the added value of this method is over a traditional water level logger with millimetre accuracy, at more or less the same price. Such sensors are very robust, don't need frames or additional constructions, have a very long battery life (weeks, months), and don't need any further processing.
What I also wonder is whether this approach may be expanded with detection and monitoring of (anthropogenic) debris, such as woody debris, plastic pollution, or otherwise (van Lieshout et al., 2020). Then there would be a clear added value over more traditional sensing equipment.

Conclusions
In the conclusions the authors state that their method allows for "supervising the stream area and banks". This is not elaborated on in the paper, so I suggest either omitting this statement or providing some additional analyses to support it.

Data and code availability
The data availability is not in line with HESS policy: https://www.hydrology-and-earthsystem-sciences.net/policies/data_policy.html. I strongly suggest making the data openly available through a repository. Otherwise, follow HESS' policy and include a statement on why they are not available ("if the data are not publicly accessible, a detailed explanation of why this is the case is required").

Specific comments:
Lines 18-21: Maybe omit some references; the current number seems a bit much.
Lines 26-48: Useful summary of other techniques and their drawbacks, but it could be written more concisely.
Line 85: Although not "purely hydrological", van Lieshout et al. (2020) recently demonstrated the potential of using cameras and deep learning for automatic plastic monitoring in rivers. Quite some lessons learned and practical considerations may be relevant for this manuscript as well.
Line 122: How is the ROI automatically trimmed around it?
Line 137: What moving average is used? E.g. how many data points? How does the length of the window influence the accuracy?
Line 138: Is the 90% quantile based on the entire data series? Or a subset (e.g. without outliers)?