Using LSTM to monitor continuous discharge indirectly with electrical conductivity observations

Chang, Yong; Mewes, Benjamin; Hartmann, Andreas

doi:https://doi.org/10.5194/hess-2022-77

Preprints

21 Mar 2022

Status: this preprint was under review for the journal HESS but the revision was not accepted.

Abstract. Because EC (electrical conductivity) is easy to record and is strongly correlated with discharge in certain catchments, EC is a potential predictor of discharge. This potential has not yet been widely addressed. In this paper, we investigate the feasibility of using EC as a proxy for long-term discharge monitoring in a small karst catchment where EC always shows a negative correlation with the spring discharge. Given their complex relationship, a special machine learning architecture, LSTM (Long Short-Term Memory), was used to map EC to discharge. LSTM results indicate that the spring discharge can be predicted well with EC, particularly in storms when dilution dominates the EC dynamics; however, the prediction may have relatively large uncertainties in the small or middle recharge events. A small number of discharge observations is sufficient to obtain a robust LSTM for long-term discharge prediction from EC, indicating the practicality of recording EC in ungauged catchments for indirect discharge monitoring. Our study also highlights that a random or fixed-interval discharge measurement strategy, which covers various climate conditions, is more informative for the LSTM to give robust predictions than other strategies. While our study is implemented in a karst catchment, the method may also be suitable for non-karst catchments where there is a strong correlation between EC and discharge.
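The abstract compares discharge measurement strategies for building the LSTM training set. As a rough illustration only, the sketch below draws measurement times from an hourly record in three ways: random, fixed-interval, and (as a hypothetical contrast, not necessarily one of the strategies in the paper) high-flow only. All function names are hypothetical and NumPy is assumed.

```python
import numpy as np

# Hypothetical helpers: each returns sorted indices of the hourly time steps
# at which a discharge measurement would be taken for LSTM training.

def random_sample(n_hours: int, n_obs: int, seed: int = 0) -> np.ndarray:
    """n_obs distinct time steps drawn uniformly at random."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_hours, size=n_obs, replace=False))


def fixed_interval_sample(n_hours: int, n_obs: int) -> np.ndarray:
    """n_obs time steps spread evenly over the whole record."""
    return np.linspace(0, n_hours - 1, n_obs).round().astype(int)


def high_flow_sample(discharge: np.ndarray, n_obs: int) -> np.ndarray:
    """Hypothetical contrast case: only the n_obs hours with the highest discharge."""
    return np.sort(np.argsort(discharge)[-n_obs:])


if __name__ == "__main__":
    # Synthetic hourly discharge record (30 days), for illustration only.
    rng = np.random.default_rng(42)
    q = rng.gamma(shape=2.0, scale=1.0, size=24 * 30)
    for name, idx in [
        ("random", random_sample(q.size, 48)),
        ("fixed-interval", fixed_interval_sample(q.size, 48)),
        ("high-flow only", high_flow_sample(q, 48)),
    ]:
        # Compare the range of discharge that each strategy samples.
        print(f"{name:15s} min={q[idx].min():.2f} max={q[idx].max():.2f}")
```

The random and fixed-interval subsets span low and high flows, whereas the high-flow subset only sees the upper tail; this is one way to read the abstract's point that measurements covering various conditions are more informative.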

Received: 22 Feb 2022 – Discussion started: 21 Mar 2022

Status: closed

RC1:
'Comment on hess-2022-77', Anonymous Referee #1, 16 Apr 2022

Chang et al. present the application of a statistical approach (LSTM) to determine discharge of rainfall event runoff from instream EC measurements.

MAJOR

Title/Premise: The authors present the application of a statistical approach (LSTM) to calculate discharge during rainfall events from EC observations. The title (Using LSTM to monitor continuous discharge indirectly with electrical conductivity observations) might perhaps mislead the reader, as certain time periods (low flow, initial runoff) are clearly excluded from the analysis. A more fitting title would be: Using LSTM to monitor STORMFLOW DISCHARGE indirectly with EC observations.

The performance of a model using EC only is compared to models using both EC and P and only P. It might be interesting to compare the selected model to a simpler approach, to really highlight the added value of a more complex model.

20: In your abstract in line 20 you write that in your spring EC always has a negative correlation with spring discharge. However, in line 126-130 you mention that there is occasionally a positive correlation (EC peak at the initial runoff).

23-25: “LSTM results indicate that the spring discharge can be predicted well with EC, particularly in storms when the dilution dominates the EC dynamic; however, the prediction may have relatively large uncertainties in the small or middle recharge events.” It seems the findings of your study do not support this conclusion at all. As I understood, spring discharge could ONLY be predicted well for large storm events; there are large uncertainties when it comes to intermediate and small events and it was not possible at all to use EC for the estimation of baseflow/low flow. So, one might conclude that overall spring discharge can actually not be predicted well.

130: It is unclear why there is a need to correct the maximum EC values in 2017 to match them with 2018 and 2019. Please elaborate why the maximum EC should be the same in all years.

130: You corrected for drift of the sensor by subtracting 23 µS/cm. Please elaborate why you chose this specific value. Also, a simple subtraction of measured EC does not adequately account for gradual drift.

424: You elaborate that the EC dynamics of the investigated spring are relatively simple, without temporal EC peaks at the beginning of storms. However, in lines 126-130 you describe that you did find initial EC peaks at the beginning of storm events in your 2018 and 2019 data, and you state that you excluded these observations from your analysis.

426: To my knowledge, the cited paper of Hess and White (1993) does not refer to “piston flow”; it does not mention the words ‘piston flow’ at all.

MINOR

83: geographical coordinates of the spring might be useful

83-91 citation might be useful

Figure 1a: labels in map are too small to read

120-121: “the spring's EC dynamic is MAINLY controlled by the rock dissolution and the dilution from the low-EC event water during storms.” – what other minor influencing factors are there?

133: wrong unit: 23us/cm -> 23 µS/cm

170: “LSTM belongs to a special kind of recurrent neural network” – I suggest different wording

253: “The performances of MP and MECP deteriorate obviously probably due to …” – obviously or probably, which one is it?

Figure 3e: red line in legend is missing

285: wording: middle -> intermediate

Citation: https://doi.org/10.5194/hess-2022-77-RC1
- AC1:
  'Reply on RC1', Yong Chang, 16 Jul 2022
  Thank you very much for the valuable comments.
  
  MAJOR
  Title/Premise: The authors present the application of a statistical approach (LSTM) to calculate discharge during rainfall events from EC observations. The title (Using LSTM to monitor continuous discharge indirectly with electrical conductivity observations) might perhaps mislead the reader, as certain time periods (low flow, initial runoff) are clearly excluded from the analysis. A more fitting title would be: Using LSTM to monitor STORMFLOW DISCHARGE indirectly with EC observations.
  
  Response: Thank you. This is an excellent suggestion. We will change the paper title accordingly.
  
  The performance of a model using EC only is compared to models using both EC and P and only P. It might be interesting to compare the selected model to a simpler approach, to really highlight the added value of a more complex model.
  
  Response: In the manuscript, we have compared the LSTM model to a simple linear regression model (see lines 189-191 and figure 3b). The regression model shows much worse performance than M_EC.
  
  20: In your abstract in line 20 you write that in your spring EC always has a negative correlation with spring discharge. However, in line 126-130 you mention that there is occasionally a positive correlation (EC peak at the initial runoff).
  
  Response: Thanks. We will revise line 20: the spring EC has a negative correlation with spring discharge most of the time.
  
  23-25: “LSTM results indicate that the spring discharge can be predicted well with EC, particularly in storms when the dilution dominates the EC dynamic; however, the prediction may have relatively large uncertainties in the small or middle recharge events.” It seems the findings of your study do not support this conclusion at all. As I understood, spring discharge could ONLY be predicted well for large storm events; there are large uncertainties when it comes to intermediate and small events and it was not possible at all to use EC for the estimation of baseflow/low flow. So, one might conclude that overall spring discharge can actually not be predicted well.
  
  Response: We will further revise these sentences. In fact, most of the spring discharge during large storms, including the baseflow, can be well predicted by EC. This can be seen in Fig. 4, where M_EC has a high NSE value in storm events. In the manuscript, we define the discharge of a storm event as the discharge over the period from the end of the previous recharge event to the beginning of the next recharge event. It is also possible to predict the discharge during intermediate events, but with a larger uncertainty than the predictions during storm events. It is true that EC could not be used to estimate low flow, because of the low correlation between EC and discharge in small recharge events. Despite this drawback, our approach is still promising, because the discharge dynamics during storms or intermediate recharge events are the key information for flood management, and for understanding the behavior of hydrological systems when combined with other hydrochemical indices.
  
  130: It is unclear why there is a need to correct the maximum EC values in 2017 to match them with 2018 and 2019. Please elaborate why the maximum EC should be the same in all years.
  
  Response: The hydrochemistry of the studied spring is mainly controlled by the dissolution of carbonate rocks. The maximum EC of the spring water corresponds to the calcium carbonate equilibrium. Moreover, the spring is located in the phreatic zone, and most of its hydrochemical indices, such as temperature, pH and EC, show no obvious seasonal variation according to previous monitoring. Therefore, the maximum EC of this spring is relatively stable across years. Given that different data loggers were used to monitor EC in 2017 and in the other two years, it is reasonable to assert that the discrepancy in maximum EC between 2017 and the other two years is mainly caused by instrumental drift.
  
  We will add these clarifications in the revised version.
  
  130: You corrected for drift of the sensor by subtracting 23 µS/cm. Please elaborate why you chose this specific value. Also, a simple subtraction of measured EC does not adequately account for gradual drift.
  
  Response: The value of 23 µS/cm is based on the assumption that the maximum EC in 2017 is the same as in the other two years. Although the maximum EC of the spring water is relatively stable, without obvious seasonal variation at the study site, it may still vary slightly. In the revised manuscript, we will add another plot showing how the simulation result of M_EC in test period 1 varies with different EC adjustment values, to further illustrate the uncertainty caused by this drift adjustment.
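The 23 µS/cm offset follows from matching the 2017 maximum EC to that of the other years. A minimal sketch of that idea is given below, together with a linear-drift alternative of the kind the referee alludes to. The function names and EC values are hypothetical; the linear correction additionally assumes that the drift at the start and end of deployment is known (for example from checks against a reference solution), which the discussion does not state.

```python
import numpy as np

def max_matching_offset(ec_series, reference_max):
    """Constant offset (µS/cm) that makes the series maximum equal reference_max."""
    return float(np.max(ec_series) - reference_max)


def apply_constant_offset(ec_series, offset):
    """Shift the whole record by one value: corrects a step bias, not gradual drift."""
    return np.asarray(ec_series, dtype=float) - offset


def remove_linear_drift(ec_series, drift_start, drift_end):
    """Subtract a drift growing linearly from drift_start (first sample) to
    drift_end (last sample); assumes both end values are known."""
    ec = np.asarray(ec_series, dtype=float)
    return ec - np.linspace(drift_start, drift_end, ec.size)


if __name__ == "__main__":
    ec_2017 = np.array([300.0, 343.0, 320.0])  # hypothetical values, µS/cm
    offset = max_matching_offset(ec_2017, reference_max=320.0)
    print(offset)  # 23.0
    # Sensitivity check in the spirit of the planned extra figure: the model
    # (not shown) would be re-run for each trial offset around this value.
    for trial in np.arange(offset - 6, offset + 7, 3):
        corrected = apply_constant_offset(ec_2017, trial)
        print(trial, corrected.max())
```

A constant offset can only remove a step bias between loggers; if the sensor drifted gradually during 2017, a time-varying correction such as the linear one would be needed.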
  
  424: You elaborate that the EC dynamics of the investigated spring are relatively simple, without temporal EC peaks at the beginning of storms. However, in lines 126-130 you describe that you did find initial EC peaks at the beginning of storm events in your 2018 and 2019 data, and you state that you excluded these observations from your analysis.
  
  Response: I will further revise the sentence.
  
  426: To my knowledge, the cited paper of Hess and White (1993) does not refer to “piston flow”; it does not mention the words ‘piston flow’ at all.
  
  Response: Thanks. There is a mistake here. The intended reference is another paper by Hess and White (1988), in which they observed that the spring EC may first rise before beginning to drop during a storm. The term ‘piston effect’ is taken from Goldscheider and Drew (2007), in the book ‘Methods in Karst Hydrogeology’. We will add this reference in the revised version.
  
  MINOR
  
  83: geographical coordinates of the spring might be useful
  Response: We will add the geographical coordinate.
  
  83-91 citation might be useful
  Response: We will add the relevant references.
  
  Figure 1a: labels in map are too small to read
  Response: We will increase the label size accordingly.
  
  120-121: “the spring's EC dynamic is MAINLY controlled by the rock dissolution and the dilution from the low-EC event water during storms.” – what other minor influencing factors are there?
  Response: The spring EC may also be slightly influenced by variations in the concentrations of other ions during storms, such as K⁺, Na⁺, Cl⁻ and SO₄²⁻.
  
  133: wrong unit: 23us/cm -> 23 µS/cm
  Response: Thanks, we will revise the unit.
  
  170: “LSTM belongs to a special kind of recurrent neural network” – I suggest different wording
  Response: Revise to ‘LSTM is a special recurrent neural network’.
  
  253: “The performances of MP and MECP deteriorate obviously probably due to …” – obviously or probably, which one is it?
  Response: Revise the sentence to ‘The performances of MP and MECP show an obvious deterioration which is probably due to…’.
  
  Figure 3e: red line in legend is missing
  Response: Thanks, we will update the legend.
  
  285: wording: middle -> intermediate
  Response: Thank you. We will change ‘middle’ to ‘intermediate’ in the revised version.
  
  Reference:
  Hess, J. W., & White, W. B. (1988). Storm response of the karstic carbonate aquifer of southcentral Kentucky. Journal of Hydrology, 99(3), 235–252. https://doi.org/10.1016/0022-1694(88)90051-0
  Goldscheider, N., & Drew, D. (2007). Methods in karst hydrogeology: IAH: International contributions to hydrogeology (Vol. 26). CRC Press.
  
  Citation: https://doi.org/10.5194/hess-2022-77-AC1
RC2:
'Comment on hess-2022-77', Anonymous Referee #2, 18 Jun 2022

Yong Chang et al. present a study on estimating hourly discharge in a small 1 km² karst catchment from precipitation and EC measurements using an LSTM. They set up three different LSTMs based on EC, precipitation and both signals together. Moreover, they explore the performance of other versions of these models with a reduced amount of provided training data. The topic of the study is an interesting contribution to the field, since the added value of EC measurements with gauge levels is indeed underexplored. Also, the question of gauging strategies to build a rating curve is of interest. However, I see a couple of severe issues with the study which are in conflict with the strong claims raised and which need to be resolved before final publication.

Major Points

If I understand correctly, the only true estimate of discharge from EC is done with the M_EC model. Given the claims of the title, abstract and introduction, I would not expect precipitation as a further variable. In L77ff., again, precipitation is not mentioned, only the use of EC as a proxy. I think the rest of the paper does not really follow this line.

The models including precipitation input are directly predicting discharge. Hence a simple hydrological model, and not a linear regression, should be the benchmark for these models. Given that the models including precipitation input perform worst in the 2nd evaluation period but not in training or the 1st evaluation period, this raises concerns about what the LSTM actually learned during training. Apparently the temporal patterns of discharge in 2017 and 2019 are more similar than in 2018. What would happen if the model was trained in a different period? Why do the authors expect that the LSTM got sufficient data, when it obviously fails for test period 2?

Why do the authors use a mean squared error as objective function (L200) instead of a more specific or several complementary evaluation functions?

Using the NSE for evaluation has known shortcomings, including a tendency towards high values under seasonal climates (Schaefli and Gupta 2007). Given the monsoon climate in the study region, an NSE > 0.5 in the evaluation period should not be at all surprising or convincing. Given the adaptability of an LSTM, an NSE near 1 should be expected during training. An NSE < 0 refers to predictions worse than the mean value. Hence I would expect the authors not to show arbitrary y-axis limits, but to give clear guidance that the performance is not really impressive. Moreover, I would expect further performance measures such as KGE, Spearman rank correlation, etc.
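The referee's point about NSE under a seasonal climate can be made concrete. NSE compares the squared error of a simulation with that of the observed mean, so a model that only reproduces a seasonal shift in level can score highly. The sketch below uses a hypothetical two-season toy record and also computes a benchmark efficiency against a seasonal-mean benchmark, in the spirit of Schaefli and Gupta (2007); the function names are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 equals the observed mean, < 0 is worse."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)


def benchmark_efficiency(obs, sim, bench):
    """Same form as NSE, but with a chosen benchmark series instead of the mean."""
    obs, sim, bench = (np.asarray(a, float) for a in (obs, sim, bench))
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((bench - obs) ** 2)


if __name__ == "__main__":
    # Toy record: a dry season (low flow) followed by a wet season (high flow).
    obs = np.array([1, 2, 1, 2, 9, 10, 9, 10], dtype=float)
    seasonal_only = np.array([1.5] * 4 + [9.5] * 4)  # knows the season, nothing else
    print(round(nse(obs, seasonal_only), 3))  # 0.985: looks excellent
    print(benchmark_efficiency(obs, seasonal_only, seasonal_only))  # 0.0: no skill beyond the season
```

The toy model misses every within-season fluctuation yet reaches an NSE of about 0.98, which is why a seasonal benchmark or complementary metrics give a fairer picture.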

If I understood correctly, the LSTM is allowed to receive forecasted EC values. I wonder if this is a fair comparison if P is only given in hindcast. If P and EC measurements could be used as proxy measurements, why should I bother about not using forecasted P too? How did the authors assess the chosen time window? I was also unable to identify the m-parameter defining this window. Moreover, I did not really understand the selection of a 7 h time delay factor (L192), since the LSTM should be well capable of learning this.

The authors rightfully expose discharge as central hydrological variable (L36f). But if I would replace this measurement with a model, why should I still be at least somewhat confident about my water balance to be met? Why should I use precipitation as a further explanatory variable to predict discharge if I then would use discharge and precipitation to estimate further characteristics? This fundamentally opens the gates for spurious correlation ill-posing the matter of measuring discharge in the first place.

Given these questions, I am under the impression that the second part of the analyses with different subsets of training data is actually highly case specific. This does not only relate to the selected arrangement of training period, objective function and evaluation procedure. It also refers to the system under study: 1) The authors already modified the EC data (L128ff.). 2) A karst system should rather directly relate to fill-and-spill dynamics (McDonnell et al. 2020), which are a perfect learning case for LSTMs rarely met in other hydrological systems. 3) The catchment is very small (1 km²). Hence, I would be very cautious about the capabilities to perform this kind of analysis and about the strong claims interpreted from the results. In the current form, I would not really agree that the findings are sufficiently supported.

Minor Points (only points in addition to the major ones are listed)

Title: I find the title not really in line with the content of the paper.

L21: What complex relationship? What special ML architecture? This is far too fuzzy.

L25: I did not spot any assessment of uncertainties. I guess you refer to the overall model performance evaluation.

L39f: depth? water level!; defined relationship? rating curve! Why omitting the established terminology?

Fig 1: I do not really get anything from the maps a and b. Map c is difficult to interpret.

L106: what is a combination of rectangular weirs? Do you have a rating curve for the weirs or is the discharge merely calculated with an empirical weir function? How is the gauge measured? Which uncertainty would you expect?

L108f: I suspect an Onset U24? Why do you report 15 min resolution if later on hourly data is used?

L124: What is unsaturated fast flow?

Fig 2b: Why are the side panels in reverse order and without annotated marks in the main panel? Why is the linear model used as reference not plotted? Why is (again) a different correlation measure used?

L148f: I guess you refer to discharge events (not rain events)?

L155f: A strong relationship? I would not claim a correlation of -0.51 to be specifically strong. Hence the relationship might be somewhat tangible there and is not found when plotting EC to Q for lower discharge.

Sec 3.1: Why don't you clarify your strategy with the three models M_EC, M_P and M_ECP upfront?

L192: What is really meant with the 7h forward shifting?

L206: Why do you report the NSE equation. Not needed. Better add further evaluation estimates.

L263: The benchmark is the linear regression which is slightly better than a pure mean value…

L265: See major points about the NSE and the expectations for an LSTM. Avoid normative claims. Certainly they do not expose excellent capability…

Fig 3: Caption reports Fig 2 instead of 3.

L276: Again, how do you support the claim? Test period 2 obviously fails, and it is not analysed whether this is due to the lack of precipitation data. Actually, I do not expect this to be the case if the evaluation without OBGD remains that low.

Fig 5: Why do you show Nash values below -1?

[I have not recorded further minor points after L303 since I expect this to require substantial workover anyways.]

Code and data availability: Come on! We are in 2022! I find it absolutely necessary that we do not have to beg to see what is under the hood. The HESS data and code policy is rather clear about this. I consider it an obligation for the authors to provide their data and code, especially for a study like yours, which merely applies a Keras LSTM to a very limited data set.

——

McDonnell, J. J., Spence, C., Karran, D. J., Meerveld, H. J. (Ilja) van, and Harman, C. J.: Fill-and-Spill: A Process Description of Runoff Generation at the Scale of the Beholder, Water Resour Res, 57, https://doi.org/10.1029/2020wr027514, 2021.

Schaefli, B. and Gupta, H. V.: Do Nash values have value?, Hydrol Process, 21, 2075–2080, https://doi.org/10.1002/hyp.6825, 2007.

Citation: https://doi.org/10.5194/hess-2022-77-RC2
- AC2: 'Reply on RC2', Yong Chang, 16 Jul 2022
  
  Thank you very much for the valuable comments.
  
  Yong Chang et al. present a study on estimating hourly discharge in a small 1 km² karst catchment from precipitation and EC measurements using an LSTM. They set up three different LSTMs based on EC, precipitation and both signals together. Moreover, they explore the performance of other versions of these models with a reduced amount of provided training data. The topic of the study is an interesting contribution to the field, since the added value of EC measurements with gauge levels is indeed underexplored. Also, the question of gauging strategies to build a rating curve is of interest. However, I see a couple of severe issues with the study which are in conflict with the strong claims raised and which need to be resolved before final publication.
  
  Major Points
  1) If I understand correctly, the only true estimate of discharge from EC is done with the M_EC model. Given the claims of the title, abstract and introduction, I would not expect precipitation as a further variable. In L77ff., again, precipitation is not mentioned, only the use of EC as a proxy. I think the rest of the paper does not really follow this line.
  Response: Thank you for this remark. The prediction of discharge from precipitation was only used as a comparison with the prediction from EC. To avoid confusion, the simulation result of model M_ECP (using precipitation and EC to predict discharge) will be deleted in the revised manuscript, since it does not provide any useful information in the paper.
  
  2) The models including precipitation input are directly predicting discharge. Hence a simple hydrological model, and not a linear regression, should be the benchmark for these models. Given that the models including precipitation input perform worst in the 2nd evaluation period but not in training or the 1st evaluation period, this raises concerns about what the LSTM actually learned during training. Apparently the temporal patterns of discharge in 2017 and 2019 are more similar than in 2018. What would happen if the model was trained in a different period? Why do the authors expect that the LSTM got sufficient data, when it obviously fails for test period 2?
  Response: We used the linear regression model as a benchmark for M_EC because there is currently no hydrological model that can predict discharge from EC. For model M_p, which uses precipitation to predict discharge, we do not apply any benchmark, because this model is only used as a comparison with M_EC. We will revise the sentence in lines 189-191.
  The models including precipitation, i.e. M_p and M_ECP (the latter will be removed in the revised manuscript), perform worse in the second test period due to a large error in the precipitation data (OBGD). In contrast, the performance of model M_EC is not severely affected by the OBGD (see Fig. 3b), because this model only uses EC to predict discharge. That is, M_EC does not fail to predict discharge in test period 2. This also indicates the advantage of M_EC over M_p for predicting discharge in mountainous catchments where precipitation has a strong spatial variability: a sparse rain gauge network would introduce large precipitation uncertainty and lead to poor discharge predictions by M_p.
  Since the LSTM is a purely data-driven model, its extrapolation ability may be weak. Therefore, when an LSTM is used to predict discharge from EC, EC-discharge data should be collected under a variety of rainfall conditions.
  
  3) Why do the authors use a mean squared error as objective function (L200) instead of a more specific or several complementary evaluation functions?
  Response: We used the mean squared error as it is a widely-used objective function in many machine learning works, see (Campolo et al., 1999; Gao et al., 2020; Kratzert et al., 2018). The aim of this paper is to explore the feasibility of predicting discharge with EC using a standard LSTM including a typical objective function, i.e. the MSE. Whether the selection of different objective functions affects the final simulation result is beyond the scope of this paper.
  
  4) Using the NSE for evaluation has known shortcomings, including a tendency towards high values under seasonal climates (Schaefli and Gupta 2007). Given the monsoon climate in the study region, an NSE > 0.5 in the evaluation period should not be at all surprising or convincing. Given the adaptability of an LSTM, an NSE near 1 should be expected during training. An NSE < 0 refers to predictions worse than the mean value. Hence I would expect the authors not to show arbitrary y-axis limits, but to give clear guidance that the performance is not really impressive. Moreover, I would expect further performance measures such as KGE, Spearman rank correlation, etc.
  Response: We will revise the y-axis limits in Fig. 3. In addition, we will provide the KGE and r values for the calibration and validation periods in the revised manuscript. The mean KGE values of M_EC are 0.86, 0.70 and 0.38 in the calibration period and the two test periods, respectively. The corresponding mean correlation coefficients of M_EC are 0.96, 0.82 and 0.73. The low KGE in test period 2 is due to the poor performance of M_EC for low flows, which account for most of this period.
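For reference, the original form of the Kling-Gupta efficiency combines correlation, variability and bias. The sketch below (hypothetical function names) returns the score together with its three components, which helps to see whether a low value such as the one in test period 2 stems from timing, spread or bias. It also includes a Spearman rank correlation, as the referee suggested, computed from ranks under the assumption that there are no tied values.

```python
import numpy as np

def kge(obs, sim):
    """Kling-Gupta efficiency (original form) and its components r, alpha, beta."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]   # linear correlation (timing / shape)
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    score = 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
    return score, r, alpha, beta


def spearman_no_ties(x, y):
    """Spearman rank correlation: Pearson r of the ranks (valid only without ties)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]
```

For example, a simulation that doubles every observed value keeps r = 1 but has alpha = beta = 2, giving KGE = 1 - sqrt(2), about -0.41, even though its NSE-style timing is perfect.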
  
  5) If I understood correctly, the LSTM is allowed to receive forecasted EC values. I wonder if this is a fair comparison if P is only given in hindcast. If P and EC measurements could be used as proxy measurements, why should I bother about not using forecasted P too? How did the authors assess the chosen time window? I was also unable to identify the m-parameter defining this window. Moreover, I did not really understand the selection of a 7 h time delay factor (L192), since the LSTM should be well capable of learning this.
  Response: We only use the previous and current precipitation to predict the current discharge, because the observed spring discharge is the catchment response to previous precipitation. The performance of M_p would not improve even if precipitation data after the prediction time were used in the model. For M_EC, in contrast, because the EC dynamics always lag behind discharge, it is necessary to include EC data after the prediction time to estimate discharge. The procedure to determine the input length (m) is shown in the appendix.
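To illustrate the windowing described in this reply, the sketch below pairs each discharge value with an EC window that extends before and after the prediction time. It assumes the input length m is split into m_past steps up to and including time t and m_future steps after t; that split and the function name are hypothetical, since the appendix defining m is not part of this discussion.

```python
import numpy as np

def make_windows(ec, q, m_past, m_future):
    """Pair q[t] with the EC window ec[t - m_past + 1 : t + m_future + 1].

    m_past >= 1 counts time step t itself; m_future >= 0 steps lie after t.
    Returns X with shape (samples, m_past + m_future, 1), i.e. the
    [samples, time steps, features] layout LSTM layers usually expect, and y.
    """
    ec, q = np.asarray(ec, float), np.asarray(q, float)
    targets = range(m_past - 1, len(ec) - m_future)
    X = np.stack([ec[t - m_past + 1 : t + m_future + 1] for t in targets])
    y = q[list(targets)]
    return X[..., np.newaxis], y


if __name__ == "__main__":
    ec = np.arange(10.0)      # toy EC series
    q = 10 * np.arange(10.0)  # toy discharge series
    X, y = make_windows(ec, q, m_past=3, m_future=2)
    print(X.shape)            # (6, 5, 1)
    print(X[0, :, 0], y[0])   # [0. 1. 2. 3. 4.] 20.0
    # A precipitation-only model would use m_future=0 (past and current values only).
```

One practical consequence of m_future > 0 is that the discharge estimate for hour t only becomes available m_future hours later, which is the asymmetry the referee raises when comparing with hindcast-only precipitation.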
  
  The 7-hour delay was only used in the simple regression benchmark model, to account for the delay between discharge and EC; it was not used in the LSTM model.
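The regression benchmark mentioned here relates discharge to EC shifted by 7 hours. One possible reading, sketched below, pairs discharge at hour t with EC at hour t + 7, consistent with the statement that EC lags discharge. The function names are hypothetical, and the exact form of the authors' regression (for example any transformation of EC or discharge) is not given in this discussion.

```python
import numpy as np

def fit_lagged_regression(ec, q, lag=7):
    """Least-squares fit of q[t] = slope * ec[t + lag] + intercept.

    Assumes ec and q are hourly series of equal length and that EC lags
    discharge, so discharge at hour t is paired with EC `lag` hours later.
    """
    ec, q = np.asarray(ec, float), np.asarray(q, float)
    x = ec[lag:]            # EC at t + lag
    y = q[: len(q) - lag]   # discharge at t (written this way so lag=0 also works)
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def predict_lagged(ec, slope, intercept, lag=7):
    """Discharge estimates for hours 0 .. len(ec) - lag - 1; the last `lag`
    hours cannot be estimated until later EC values arrive."""
    ec = np.asarray(ec, float)
    return slope * ec[lag:] + intercept


if __name__ == "__main__":
    q = np.arange(20, dtype=float)     # toy hourly discharge
    ec = np.full(20, 500.0)
    ec[7:] = 500.0 - 2.0 * q[:13]      # EC mirrors discharge 7 h later
    slope, intercept = fit_lagged_regression(ec, q)
    print(round(slope, 3), round(intercept, 3))        # approximately -0.5 and 250
    print(predict_lagged(ec, slope, intercept).size)   # 13 estimates
```

The explicit shift is needed here because a plain linear regression has no memory; an LSTM fed a window of EC values could, as the referee notes, learn such a delay itself.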
  
  6) The authors rightfully expose discharge as central hydrological variable (L36f). But if I would replace this measurement with a model, why should I still be at least somewhat confident about my water balance to be met? Why should I use precipitation as a further explanatory variable to predict discharge if I then would use discharge and precipitation to estimate further characteristics? This fundamentally opens the gates for spurious correlation ill-posing the matter of measuring discharge in the first place.
  Response: The model M_ECP that uses the precipitation and EC to predict discharge will be deleted in the revised manuscript.
  
  7) Given these questions, I am under the impression that the second part of the analyses with different subsets of training data is actually highly case specific. This does not only relate to the selected arrangement of training period, objective function and evaluation procedure. It also refers to the system under study: 1) The authors already modified the EC data (L128ff.). 2) A karst system should rather directly relate to fill-and-spill dynamics (McDonnell et al. 2020), which are a perfect learning case for LSTMs rarely met in other hydrological systems. 3) The catchment is very small (1 km²). Hence, I would be very cautious about the capabilities to perform this kind of analysis and about the strong claims interpreted from the results. In the current form, I would not really agree that the findings are sufficiently supported.
  Response: Firstly, we would like to clarify that the aim of this paper is to explore, for the very first time, the ability to predict discharge from EC using a standard LSTM. Exploring the impact of different objective functions for training the LSTM is therefore beyond the scope of this paper. The longest data series, from March 1 to August 1, 2019, was selected as the training period, since the LSTM is a purely data-driven model and requires abundant data to produce stable simulation results. For the model evaluation, the performance of M_EC is essentially unaffected by the precipitation error in test period 2, since this model only uses EC as input.
  Secondly, the adjustment of the EC values in test period 1 is based on the facts that the maximum EC of this spring has always been relatively stable across years according to previous monitoring, and that different data loggers were used to monitor EC in 2017 and in the other two years. To further assess the possible uncertainty caused by this adjustment, we will add a figure to the revised manuscript showing how model performance in test period 1 varies with different EC adjustment values.
  Finally, this work is the first application of an LSTM model to predict discharge from EC. Although the study catchment is small, the observed spring discharge and EC dynamics are similar to those of many other karst springs (Olarinoye et al., 2020). Therefore, we think the catchment area should not be an obstacle to applying our approach. Whether our approach can also be used in other hydrological systems requires further work, which is our next step.
  
  Minor Points (only points in addition to the major ones are listed)
  Title: I find the title not really in line with the content of the paper.
  Response: The title will be revised to ‘Using LSTM to monitor stormflow discharge indirectly with EC observations’ according to the comment from reviewer 1.
  
  L21: What complex relationship? What special ML architecture? This is far too fuzzy.
  Response: We will further revise the sentence.
  
  L25: I did not spot any assessment of uncertainties. I guess you refer to the overall model performance evaluation.
  Response: We will change the word ‘uncertainties’ to ‘model performance’.
  
  L39f: depth? water level!; defined relationship? rating curve! Why omitting the established terminology?
  Response: Accept. We will change the words.
  
  Fig 1: I do not really get anything from the maps a and b. Map c is difficult to interpret.
  Response: Map a shows the location of the study catchment in China. Map b shows the locations of the two climatic stations whose observations were used to fill two recording gaps. Map c shows the catchment area of the karst spring.
  
  L106: what is a combination of rectangular weirs? Do you have a rating curve for the weirs or is the discharge merely calculated with an empirical weir function? How is the gauge measured? Which uncertainty would you expect?
  Response: The discharge is calculated with the empirical weir function. The water level was measured with a HOBO U20 data logger with a precision of 0.3 cm.
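  As a rough illustration of the uncertainty the reviewer asks about, the 0.3 cm stage precision can be propagated through a weir equation. The sketch below assumes the textbook sharp-crested rectangular weir formula Q = (2/3) Cd b sqrt(2g) h^1.5. The discharge coefficient Cd = 0.62 and the 0.5 m crest width are hypothetical placeholders, not the calibration of the study site's weirs. Because Q scales with h^1.5, the first-order relative error is 1.5 Δh/h.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def rectangular_weir_discharge(head_m, crest_width_m, cd=0.62):
    """Discharge (m^3/s) over a sharp-crested rectangular weir.

    Textbook form Q = (2/3) * Cd * b * sqrt(2g) * h^1.5.
    Cd = 0.62 and the crest width are illustrative assumptions,
    not the calibration of the weirs used in the study.
    """
    if head_m < 0:
        raise ValueError("head must be non-negative")
    return (2.0 / 3.0) * cd * crest_width_m * math.sqrt(2.0 * G) * head_m ** 1.5


def relative_discharge_error(head_m, head_precision_m=0.003):
    """First-order relative error in Q caused by the stage precision.

    Since Q is proportional to h^1.5, dQ/Q = 1.5 * dh/h.
    0.003 m is the 0.3 cm logger precision quoted in the reply.
    """
    return 1.5 * head_precision_m / head_m


if __name__ == "__main__":
    for h in (0.02, 0.05, 0.10, 0.30):
        q = rectangular_weir_discharge(h, crest_width_m=0.5)
        err = relative_discharge_error(h)
        print(f"h = {h:.2f} m  Q = {q * 1000:.2f} L/s  relative error = {err:.1%}")
```

  Under these assumptions, the stage-driven relative error is about 22.5 % at a 2 cm head and 1.5 % at a 30 cm head. Low-flow discharge would therefore carry the largest uncertainty from the level measurement alone.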
  
  L108f: I suspect a Onset U24? Why do you report 15 min resolution if later on hourly data is used?
  Response: Yes, an Onset U24 was used for the EC monitoring. Hourly data were used because the discharge resolution in some periods is one hour.
  
  L124: What is unsaturated fast flow?
  Response: We will change this to ‘low-EC event water’.
  
  Fig 2b: Why are the side panels in reverse order and without annotated marks in the main panel? Why is the linear model used as reference not plotted? Why is (again) a different correlation measure used?
  Response: We will add the annotations to the main panel. Figure 2b only displays the overall relationship between observed discharge and EC. The different correlation coefficients in the right panel of Fig. 2b correspond to the different relationships between discharge and EC under different recharge events. Figure 2 only displays the observations, without any simulation results from the different models.
  
  L148f: I guess you refer to discharge events (not rain events)?
  Response: Thanks, we will change the words.
  
  L155f: A strong relationship? I would not claim a correlation of -0.51 to be specifically strong. Hence the relationship might be somewhat tangible there and is not found when plotting EC to Q for lower discharge.
  Response: We will further revise the sentence.
  
  Sec 3.1: Why don't you clarify your strategy with the three models M_EC, M_P and M_ECP upfront?
  Response: M_ECP will be deleted in the revised manuscript. M_p was used only as a comparison to the prediction from EC. We will add this description to the revised version.
  
  L192: What is really meant with the 7h forward shifting?
  Response: The 7 h forward shift was used only in the simple regression model, because this model cannot learn the delay between EC and discharge by itself.
  
  L206: Why do you report the NSE equation. Not needed. Better add further evaluation estimates.
  Response: We will delete the NSE equation.
  
  L263: The benchmark is the linear regression which is slightly better than a pure mean value…
  Response: The poor performance of the benchmark model is due to the weak linear relationship between discharge and EC. Nevertheless, M_EC still performs better than the linear regression model.
  
  L265: See major points about the NSE and the expectations for an LSTM. Avoid normative claims. Certainly they do not expose excellent capability…
  Response: We will change ‘excellent’ to ‘good’.
  
  Fig 3: Caption reports Fig 2 instead of 3.
  Response: Thanks.
  
  L276: Again, how do you support the claim? Test period 2 obviously fails and it is not analysed if this is due to the lack of precip data. Actually I do not expect that this is the case if the evaluation without OBGD remains that low.
  Response: The poor performance of M_p in test period 2, even without OBGD, is mainly caused by its poor simulation of low flow. Low flow occupies most of this period, which contains only two storm events. We will add this explanation to the revised manuscript.
  
  Fig 5: Why do you show Nash values below -1?
  Response: We will cap the y-axis at -1.
  
  [I have not recorded further minor points after L303 since I expect this to require substantial workover anyways.]
  Response: We will carefully revise the following sections according to the reviewer’s earlier comments.
  
  Code and data availability: Come on! We are in 2022! I find it absolutely necessary that we do not have to beg for seeing what is under the hood. HESS data and code policy is rather clear about this. I find it as an obligation for the authors to provide their data and code - especially for a study like yours which is merely applying a Keras LSTM so a very limited data set.
  Response: The code and data will be uploaded to a public repository.
  
  Reference:
  Campolo, M., Andreussi, P., Soldati, A., 1999. River flood forecasting with a neural network model. Water Resour. Res. 35, 1191–1197.
  Gao, S., Huang, Y., Zhang, S., Han, J., Wang, G., Zhang, M., Lin, Q., 2020. Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. J. Hydrol. 589, 125188. https://doi.org/10.1016/j.jhydrol.2020.125188
  Kratzert, F., Klotz, D., Brenner, C., Schulz, K., Herrnegger, M., 2018. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci. 22, 6005–6022. https://doi.org/10.5194/hess-22-6005-2018
  Olarinoye, T., Gleeson, T., Marx, V., Seeger, S., Adinehvand, R., Allocca, V., Andreo, B., Apaéstegui, J., Apolit, C., Arfib, B., Auler, A., Bailly-Comte, V., Barberá, J. A., Batiot-Guilhe, C., Bechtel, T., Binet, S., Bittner, D., Blatnik, M., Bolger, T., Hartmann, A., 2020. Global karst springs hydrograph dataset for research and management of the world’s fastest-flowing groundwater. Sci. Data 7(1). https://doi.org/10.1038/s41597-019-0346-5
  
  Citation: https://doi.org/10.5194/hess-2022-77-AC2

Status: closed

RC1:
'Comment on hess-2022-77', Anonymous Referee #1, 16 Apr 2022

Chang et al. present the application of a statistical approach (LSTM) to determine discharge of rainfall event runoff from instream EC measurements.

MAJOR

Title/Premise: The authors present the application of a statistical approach (LSTM) to calculate discharge during rainfall events from EC observations. The title (Using LSTM to monitor continuous discharge indirectly with electrical conductivity observations) might perhaps mislead the reader, as certain time periods (low flow, initial runoff) are clearly excluded from the analysis. A more fitting title would be: Using LSTM to monitor STORMFLOW DISCHARGE indirectly with EC observations.

The performance of a model using EC only is compared to models using both EC and P and only P. It might be interesting to compare the selected model to a more simple approach, to really highlight the added value of a more complex model.

20: In your abstract in line 20 you write that in your spring EC always has a negative correlation with spring discharge. However, in line 126-130 you mention that there is occasionally a positive correlation (EC peak at the initial runoff).

23-25: “LSTM results indicate that the spring discharge can be predicted well with EC, particularly in storms when the dilution dominates the EC dynamic; however, the prediction may have relatively large uncertainties in the small or middle recharge events.” It seems the findings of your study do not support this conclusion at all. As I understood, spring discharge could ONLY be predicted well for large storm events; there are large uncertainties when it comes to intermediate and small events and it was not possible at all to use EC for the estimation of baseflow/low flow. So, one might conclude that overall spring discharge can actually not be predicted well.

130: It is unclear why there is a need to correct the maximum EC values in 2017 to match them with 2018 and 2019. Please elaborate why the maximum EC should be the same in all years.

130: You corrected for drift of the sensor by subtracting 23µS/cm. Please elaborate why you choose this specific value. Also: A simple subtraction of measured EC does not adequately account for gradual drift.

424: You elaborate that the EC dynamics of the investigated spring are relatively simple without temporal EC peaks at the beginning of storms. However, in line 126-130 you describe that you found indeed initial EC peaks at the beginning of storm events in your 2018 and 2019 data and you state that you excluded these observations from your analysis.

426: To my knowledge, the cited paper of Hess and White (1993) does not give any reference to “piston flow”, it doesn’t mention the words ‘piston flow’

MINOR

83 –geographical coordinates of the spring might be useful

83-91 citation might be useful

Figure 1a: labels in map are too small to read

120-121: “the spring’s EC dynamic is MAINLY controlled by the rock dissolution and the dilution from the low-EC event water during storms.” – what other minor influencing factors are there?

133: wrong unit: 23us/cm -> 23µS/cm

170: “LSTM belongs to a special kind of recurrent neural network” – I suggest different wording

253: “The performances of MP and MECP deteriorate obviously probably due to …” – obviously or probably, which one is it?

Figure 3e: red line in legend is missing

285: wording: middle -> intermediate

Citation: https://doi.org/10.5194/hess-2022-77-RC1
- AC1:
  'Reply on RC1', Yong Chang, 16 Jul 2022
  Thank you very much for the valuable comments.
  
  MAJOR
  Title/Premise: The authors present the application of a statistical approach (LSTM) to calculate discharge during rainfall events from EC observations. The title (Using LSTM to monitor continuous discharge indirectly with electrical conductivity observations) might perhaps mislead the reader, as certain time periods (low flow, initial runoff) are clearly excluded from the analysis. A more fitting title would be: Using LSTM to monitor STORMFLOW DISCHARGE indirectly with EC observations.
  
  Response: Thank you. This is an excellent suggestion. We will change the paper title accordingly.
  
  The performance of a model using EC only is compared to models using both EC and P and only P. It might be interesting to compare the selected model to a more simple approach, to really highlight the added value of a more complex model.
  
  Response: In the manuscript, we have compared the LSTM model to a simple linear regression model (see lines 189-191 and figure 3b). The regression model shows much worse performance than M_EC.
  
  20: In your abstract in line 20 you write that in your spring EC always has a negative correlation with spring discharge. However, in line 126-130 you mention that there is occasionally a positive correlation (EC peak at the initial runoff).
  
  Response: Thanks. We will revise line 20. The spring EC is negatively correlated with spring discharge most of the time.
  
  23-25: “LSTM results indicate that the spring discharge can be predicted well with EC, particularly in storms when the dilution dominates the EC dynamic; however, the prediction may have relatively large uncertainties in the small or middle recharge events.” It seems the findings of your study do not support this conclusion at all. As I understood, spring discharge could ONLY be predicted well for large storm events; there are large uncertainties when it comes to intermediate and small events and it was not possible at all to use EC for the estimation of baseflow/low flow. So, one might conclude that overall spring discharge can actually not be predicted well.
  
  Response: We will further revise these sentences. In fact, most spring discharge during large storms, including the baseflow, can be predicted well from EC, as Fig. 4 shows through the high NSE values of M_EC for storm events. In the manuscript, we define the discharge of a storm event as the discharge over the period from the end of the previous recharge event to the beginning of the next one. Discharge during intermediate events can also be predicted, but with larger uncertainty than during storm events. It is true that EC could not be used to estimate low flow, because the correlation between EC and discharge is low during small recharge events. Despite this drawback, our approach remains promising. The discharge dynamics during storms and intermediate recharge events are key information for flood management. Combined with other hydrochemical indices, they are also key to understanding the behavior of hydrological systems.
  
  130: It is unclear why there is a need to correct the maximum EC values in 2017 to match them with 2018 and 2019. Please elaborate why the maximum EC should be the same in all years.
  
  Response: The hydrochemistry of the studied spring is mainly controlled by the dissolution of carbonate rocks, and the maximum EC of the spring water corresponds to calcium carbonate equilibrium. Moreover, the spring is located in the phreatic zone. According to previous monitoring, most of its hydrochemical indices, such as temperature, pH and EC, show no obvious seasonal variation. Therefore, the maximum EC of this spring is relatively stable between years. Different data loggers were used to monitor EC in 2017 and in the other two years. It is therefore reasonable to attribute the discrepancy in maximum EC between 2017 and the other two years mainly to instrumental drift.
  
  We will add these clarifications in the revised version.
  
  130: You corrected for drift of the sensor by subtracting 23µS/cm. Please elaborate why you choose this specific value. Also: A simple subtraction of measured EC does not adequately account for gradual drift.
  
  Response: The value of 23 µS/cm was selected on the assumption that the maximum EC in 2017 is the same as in the other two years. The maximum EC of the spring water is relatively stable, with no obvious seasonal variation at the study site, but it may still vary slightly. In the revised manuscript, we will add a plot showing how the simulation result of M_EC varies with different EC adjustment values in test period 1. This plot will illustrate the uncertainty caused by the drift adjustment.
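  The constant-offset correction and the proposed sensitivity analysis can be sketched as follows, assuming aligned NumPy arrays of EC in µS/cm.
  * `constant_offset` encodes the stated assumption that the annual maximum EC is equal between years.
  * `linear_drift_correction` is an alternative that the authors did not use. It responds to the reviewer's point about gradual drift by removing a correction that grows linearly over the record.
  * `sensitivity_sweep` takes a hypothetical `score_fn`. In practice, this function would run the trained M_EC on the adjusted EC and return a metric such as NSE.

  All names and the synthetic values are illustrative.

```python
import numpy as np


def constant_offset(ec_reference, ec_drifted):
    """Offset (uS/cm) that makes max(ec_drifted) equal max(ec_reference).

    Encodes the assumption that the spring's maximum EC is stable between years.
    """
    return float(np.max(ec_drifted) - np.max(ec_reference))


def linear_drift_correction(ec, end_offset):
    """Alternative, not the authors' method: subtract a drift that grows
    linearly from 0 at the first sample to end_offset at the last sample."""
    ec = np.asarray(ec, dtype=float)
    ramp = np.linspace(0.0, end_offset, ec.size)
    return ec - ramp


def sensitivity_sweep(ec, offsets, score_fn):
    """Score the EC series after subtracting each candidate constant offset.

    score_fn is hypothetical: e.g. run a trained model on the adjusted EC
    and return NSE against observed discharge.
    """
    ec = np.asarray(ec, dtype=float)
    return {float(o): float(score_fn(ec - o)) for o in offsets}


# Synthetic example: a 2017-like series whose maximum reads 23 uS/cm higher.
ec_ref = np.array([400.0, 420.0, 450.0])
ec_2017 = np.array([420.0, 445.0, 473.0])
offset = constant_offset(ec_ref, ec_2017)
# np.max stands in for a real score_fn; candidate offsets are 18, 23 and 28.
scores = sensitivity_sweep(ec_2017, offset + np.arange(-5.0, 6.0, 5.0), np.max)
```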
  
  424: You elaborate that the EC dynamics of the investigated spring are relatively simple without temporal EC peaks at the beginning of storms. However, in line 126-130 you describe that you found indeed initial EC peaks at the beginning of storm events in your 2018 and 2019 data and you state that you excluded these observations from your analysis.
  
  Response: We will further revise the sentence.
  
  426: To my knowledge, the cited paper of Hess and White (1993) does not give any reference to “piston flow”, it doesn’t mention the words ‘piston flow’
  
  Response: Thanks, this is a mistake. The intended reference is another paper by Hess and White (1988), in which they observed that spring EC may first rise before beginning to drop during a storm. The term ‘piston effect’ was introduced by Goldscheider and Drew (2007) in the book ‘Methods in Karst Hydrogeology’. We will add this reference in the revised version.
  
  MINOR
  
  83 –geographical coordinates of the spring might be useful
  Response: We will add the geographical coordinates.
  
  83-91 citation might be useful
  Response: We will add the relevant references.
  
  Figure 1a: labels in map are too small to read
  Response: We will increase the label size accordingly.
  
  120-121: “the spring’s EC dynamic is MAINLY controlled by the rock dissolution and the dilution from the low-EC event water during storms.” – what other minor influencing factors are there?
  Response: The spring EC may also be slightly influenced during storms by concentration changes of other ions, such as K⁺, Na⁺, Cl⁻ and SO₄²⁻.
  
  133: wrong unit: 23us/cm -> 23µS/cm
  Response: Thanks, we will revise the unit.
  
  170: “LSTM belongs to a special kind of recurrent neural network” – I suggest different wording
  Response: Revise to ‘LSTM is a special recurrent neural network’.
  
  253: “The performances of MP and MECP deteriorate obviously probably due to …” – obviously or probably, which one is it?
  Response: Revise the sentence to ‘The performances of MP and MECP show an obvious deterioration which is probably due to…’.
  
  Figure 3e: red line in legend is missing
  Response: Thanks, we will update the legend.
  
  285: wording: middle -> intermediate
  Response: Thank you. We will change ‘middle’ to ‘intermediate’ in the revised version.
  
  Reference:
  Hess, J. W., & White, W. B. (1988). Storm response of the karstic carbonate aquifer of southcentral Kentucky. Journal of Hydrology, 99(3), 235–252. https://doi.org/10.1016/0022-1694(88)90051-0
  Goldscheider, N., & Drew, D. (2007). Methods in karst hydrogeology: IAH: International contributions to hydrogeology (Vol. 26). CRC Press.
  
  Citation: https://doi.org/10.5194/hess-2022-77-AC1
RC2:
'Comment on hess-2022-77', Anonymous Referee #2, 18 Jun 2022

Yong Chang et al. present a study on estimating hourly discharge in a small 1 km² karst catchment from precipitation and EC measurements using an LSTM. They set up three different LSTMs based on EC, precipitation and both signals together. Moreover, they explore the performance of other versions of these models with a reduced amount of provided training data. The topic of the study is an interesting contribution to the field, since the added value of EC measurements with gauge levels is indeed underexplored. Also the question about gauging strategies to build a rating curve is of interest. However, I see a couple of severe issues with the study which are in conflict with the strong claims raised and which require to be resolved before final publication.

Major Points

If I understand correctly, the only true estimate of discharge from EC is done with the M_EC model. Given the claims of the title, abstract and introduction, I would not expect precipitation as further variable. L77ff. again precipitation is not mentioned but the use of EC as a proxy. I think the rest of the paper does not really follow this line.

The models including precipitation input are directly predicting discharge. Hence a simple hydrological model and not a linear regression should be the benchmark for these models. Given the situation that the models including precipitation input perform worst in the 2nd evaluation period and otherwise in training and the 1st evaluation period, this raises concerns about what the LSTM actually learned during training. Apparently the temporal patterns of discharge in 2017 and 2019 are more similar than in 2018. What would happen if the model was trained in a different period? Why do the authors expect that the LSTM got sufficient data, when it obviously fails for the test period 2?

Why do the authors use a mean squared error as objective function (L200) instead of a more specific or several complementary evaluation functions?

Using the NSE for evaluation has the known shortcomings and tendency to high values with seasonal climate (Schaefli and Gupta 2007). Given the monsoon climate in the study region, an NSE > 0.5 in the evaluation period should not at all be surprising or convincing. Given the adaptability of an LSTM, an NSE near 1 should be expected during training. An NSE < 0 refers to predictions worse than the mean value. Hence I would expect the authors not to show arbitrary y-axis limits but to give clear guidance that the performance is not really impressive. Moreover, I would expect further performance measures like KGE, Spearman rank correlation etc.

If I understood correctly, the LSTM is allowed to receive forecasted EC values. I wonder if this is a fair comparison if P is only given in hindcast. If P and EC measurements could be used as proxy measurements, why should I bother about not using forecasted P too? How did the authors assess the chosen time window? I was also unable to identify the m-parameter defining this window. Moreover, I did not really understand the selection of a 7 h time delay factor (L192) since the LSTM should well be capable to learn this.

The authors rightfully expose discharge as central hydrological variable (L36f). But if I would replace this measurement with a model, why should I still be at least somewhat confident about my water balance to be met? Why should I use precipitation as a further explanatory variable to predict discharge if I then would use discharge and precipitation to estimate further characteristics? This fundamentally opens the gates for spurious correlation ill-posing the matter of measuring discharge in the first place.

Given these questions, I am under the impression that the second part of the analyses with different subsets of training data is actually highly case specific. This does not only relate to the selected arrangement of training period, objective function and evaluation procedure. It also refers to the system under study: 1) The authors already modified the EC data (L128ff.). 2) A Karst system should rather directly relate to fill-and-spill dynamics (McDonnell et al. 2020), which are a perfect learning case for LSTMs rarely met in other hydrological systems. 3) The catchment is very small (1 km²). Hence, I would be very cautious about the capabilities to perform this kind of analysis and the strong claims interpreted from the results. In the current form, I would not really agree that the findings are sufficiently supported.

Minor Points (only points in addition to the major ones are listed)

Title: I find the title not really in line with the content of the paper.

L21: What complex relationship? What special ML architecture? This is far too fuzzy.

L25: I did not spot any assessment of uncertainties. I guess you refer to the overall model performance evaluation.

L39f: depth? water level!; defined relationship? rating curve! Why omitting the established terminology?

Fig 1: I do not really get anything from the maps a and b. Map c is difficult to interpret.

L106: what is a combination of rectangular weirs? Do you have a rating curve for the weirs or is the discharge merely calculated with an empirical weir function? How is the gauge measured? Which uncertainty would you expect?

L108f: I suspect a Onset U24? Why do you report 15 min resolution if later on hourly data is used?

L124: What is unsaturated fast flow?

Fig 2b: Why are the side panels in reverse order and without annotated marks in the main panel? Why is the linear model used as reference not plotted? Why is (again) a different correlation measure used?

L148f: I guess you refer to discharge events (not rain events)?

L155f: A strong relationship? I would not claim a correlation of -0.51 to be specifically strong. Hence the relationship might be somewhat tangible there and is not found when plotting EC to Q for lower discharge.

Sec 3.1: Why don't you clarify your strategy with the three models M_EC, M_P and M_ECP upfront?

L192: What is really meant with the 7h forward shifting?

L206: Why do you report the NSE equation. Not needed. Better add further evaluation estimates.

L263: The benchmark is the linear regression which is slightly better than a pure mean value…

L265: See major points about the NSE and the expectations for an LSTM. Avoid normative claims. Certainly they do not expose excellent capability…

Fig 3: Caption reports Fig 2 instead of 3.

L276: Again, how do you support the claim? Test period 2 obviously fails and it is not analysed if this is due to the lack of precip data. Actually I do not expect that this is the case if the evaluation without OBGD remains that low.

Fig 5: Why do you show Nash values below -1?

[I have not recorded further minor points after L303 since I expect this to require substantial workover anyways.]

Code and data availability: Come on! We are in 2022! I find it absolutely necessary that we do not have to beg for seeing what is under the hood. HESS data and code policy is rather clear about this. I find it as an obligation for the authors to provide their data and code - especially for a study like yours which is merely applying a Keras LSTM so a very limited data set.

——

McDonnell, J. J., Spence, C., Karran, D. J., Meerveld, H. J. (Ilja) van, and Harman, C. J.: Fill-and-Spill: A Process Description of Runoff Generation at the Scale of the Beholder, Water Resour Res, 57, https://doi.org/10.1029/2020wr027514, 2021.

Schaefli, B. and Gupta, H. V.: Do Nash values have value?, Hydrol Process, 21, 2075–2080, https://doi.org/10.1002/hyp.6825, 2007.

Citation: https://doi.org/10.5194/hess-2022-77-RC2
- AC2: 'Reply on RC2', Yong Chang, 16 Jul 2022
  
  Thank you very much for the valuable comments.
  
  Yong Chang et al. present a study on estimating hourly discharge in a small 1 km² karst catchment from precipitation and EC measurements using an LSTM. They set up three different LSTMs based on EC, precipitation and both signals together. Moreover, they explore the performance of other versions of these models with a reduced amount of provided training data. The topic of the study is an interesting contribution to the field, since the added value of EC measurements with gauge levels is indeed underexplored. Also the question about gauging strategies to build a rating curve is of interest. However, I see a couple of severe issues with the study which are in conflict with the strong claims raised and which require to be resolved before final publication.
  
  Major Points
  1) If I understand correctly, the only true estimate of discharge from EC is done with the M_EC model. Given the claims of the title, abstract and introduction, I would not expect precipitation as further variable. L77ff. again precipitation is not mentioned but the use of EC as a proxy. I think the rest of the paper does not really follow this line.
  Response: Thank you for this remark. The prediction of discharge by precipitation was just used as a comparison to the prediction result by EC. To avoid confusion, the simulation result of model M_ECP (using precipitation and EC to predict discharge) will be deleted in the revised manuscript, since it does not provide any effective information in the paper.
  
  2) The models including precipitation input are directly predicting discharge. Hence a simple hydrological model and not a linear regression should be the benchmark for these models. Given the situation that the models including precipitation input perform worst in the 2nd evaluation period and otherwise in training and the 1st evaluation period, this raises concerns about what the LSTM actually learned during training. Apparently the temporal patterns of discharge in 2017 and 2019 are more similar than in 2018. What would happen if the model was trained in a different period? Why do the authors expect that the LSTM got sufficient data, when it obviously fails for the test period 2?
  Response: We used the linear regression model as a benchmark for M_EC because there is currently no hydrological model that can predict discharge from EC. For the model M_p, which uses precipitation to predict discharge, we did not apply any benchmark because this model is used only as a comparison to M_EC. We will revise the sentence in lines 189-191.
  The models that include precipitation, M_p and M_ECP (the latter will be removed in the revised manuscript), perform worse in the second test period because of a large error in the precipitation data (OBGD). In contrast, the performance of M_EC is not severely affected by OBGD (see Fig. 3b), because this model uses only EC to predict discharge. That is, M_EC does not fail to predict discharge in test period 2. This also indicates the advantage of M_EC over M_p in mountainous catchments where precipitation has strong spatial variability. In such catchments, a sparse rain gage network would introduce large precipitation uncertainty and lead to poor discharge predictions by M_p.
  Since the LSTM is a purely data-driven model, its extrapolation ability may be weak. Therefore, when an LSTM is used to predict discharge from EC, EC-discharge data should be collected under a variety of rainfall conditions.
  
  3) Why do the authors use a mean squared error as objective function (L200) instead of a more specific or several complementary evaluation functions?
  Response: We used the mean squared error because it is a widely used objective function in machine learning studies (see Campolo et al., 1999; Gao et al., 2020; Kratzert et al., 2018). The aim of this paper is to explore the feasibility of predicting discharge from EC using a standard LSTM with a typical objective function, i.e. the MSE. Whether the choice of objective function affects the final simulation result is beyond the scope of this paper.
  
  4) Using the NSE for evaluation has the known shortcomings and tendency to high values with seasonal climate (Schaefli and Gupta 2007). Given the monsoon climate in the study region, an NSE > 0.5 in the evaluation period should not at all be surprising or convincing. Given the adaptability of an LSTM, an NSE near 1 should be expected during training. An NSE < 0 refers to predictions worse than the mean value. Hence I would expect the authors not to show arbitrary y-axis limits but to give clear guidance that the performance is not really impressive. Moreover, I would expect further performance measures like KGE, Spearman rank correlation etc.
  Response: We will revise the y-axis limits in Fig. 3. We will also provide the KGE and r values for the calibration and validation periods in the revised manuscript. The mean KGE values of M_EC are 0.86, 0.70 and 0.38 in the calibration period and the two test periods, respectively. The corresponding mean correlation coefficients are 0.96, 0.82 and 0.73. The low KGE in test period 2 is due to the poor performance of M_EC for low flows, which occupy most of this period.
  
  5) If I understood correctly, the LSTM is allowed to receive forecasted EC values. I wonder if this is a fair comparison if P is only given in hindcast. If P and EC measurements could be used as proxy measurements, why should I bother about not using forecasted P too? How did the authors assess the chosen time window? I was also unable to identify the m-parameter defining this window. Moreover, I did not really understand the selection of a 7 h time delay factor (L192) since the LSTM should well be capable to learn this.
  Response: We use only the previous and current precipitation to predict the current discharge, because the observed spring discharge is the catchment's response to past precipitation. The performance of M_p would not improve even if precipitation data after the prediction time were used in the model. For M_EC, in contrast, the EC dynamic always lags behind discharge, so EC data after the prediction time must be considered to predict discharge. The procedure for determining the input length (m) is shown in the appendix.
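  The input arrangement described here, in which discharge at hour t is paired with EC both before and after t, can be sketched as below. The window lengths `n_past` and `n_future` are hypothetical stand-ins for the input length m determined in the appendix, which is not reproduced here. Setting `n_future=0` gives the hindcast-only arrangement described for precipitation. The output uses the (samples, timesteps, features) layout that Keras recurrent layers accept.

```python
import numpy as np


def make_windows(ec, q, n_past, n_future):
    """Pair discharge at hour t with EC from t - n_past to t + n_future.

    Returns X with shape (samples, n_past + n_future + 1, 1) and
    y with shape (samples,), where y[i] is discharge at t = i + n_past.
    """
    ec = np.asarray(ec, dtype=float)
    q = np.asarray(q, dtype=float)
    if ec.shape != q.shape:
        raise ValueError("ec and q must be aligned on the same hourly index")
    width = n_past + n_future + 1
    n_samples = ec.size - width + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window")
    # Each row is one EC window; the trailing axis is the single input feature.
    X = np.stack([ec[i:i + width] for i in range(n_samples)])[..., np.newaxis]
    y = q[n_past:n_past + n_samples]
    return X, y
```

  The reviewer's fairness question could be explored by training the same network once with `n_future=0` and once with `n_future > 0` and comparing the scores.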
  
  The 7-hour delay was used only in the simple regression benchmark model, to account for the lag between discharge and EC; it was not used in the LSTM model.
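  For comparison, the lagged linear benchmark described in this reply can be sketched with `numpy.polyfit`. The sketch follows the statement that EC lags behind discharge and regresses Q(t) on EC(t + 7 h). The exact alignment and fitting procedure used in the manuscript may differ, and the function names are hypothetical.

```python
import numpy as np


def fit_lagged_linear(ec, q, lag_hours=7):
    """Fit Q(t) = a * EC(t + lag) + b on aligned hourly series.

    Pairing Q at hour t with EC lag_hours later assumes EC lags discharge.
    """
    ec = np.asarray(ec, dtype=float)
    q = np.asarray(q, dtype=float)
    if lag_hours < 0 or lag_hours >= ec.size:
        raise ValueError("lag must be in [0, len(series))")
    x = ec[lag_hours:]            # EC(t + lag)
    y = q[:q.size - lag_hours]    # Q(t); also correct when lag_hours == 0
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope, intercept


def predict_lagged_linear(ec, slope, intercept, lag_hours=7):
    """Predict Q(t) for t = 0 .. len(ec) - lag - 1 from EC(t + lag)."""
    ec = np.asarray(ec, dtype=float)
    return slope * ec[lag_hours:] + intercept
```

  A linear regression has no memory, so the lag must be imposed explicitly. An LSTM, by contrast, receives a window of EC values and can learn the delay itself, as the reply notes.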
  
  6) The authors rightfully expose discharge as central hydrological variable (L36f). But if I would replace this measurement with a model, why should I still be at least somewhat confident about my water balance to be met? Why should I use precipitation as a further explanatory variable to predict discharge if I then would use discharge and precipitation to estimate further characteristics? This fundamentally opens the gates for spurious correlation ill-posing the matter of measuring discharge in the first place.
  Response: The model M_ECP that uses the precipitation and EC to predict discharge will be deleted in the revised manuscript.
  
  7) Given these questions, I am under the impression that the second part of the analyses with different subsets of training data is actually highly case specific. This does not only relate to the selected arrangement of training period, objective function and evaluation procedure. It also refers to the system under study: 1) The authors already modified the EC data (L128ff.). 2) A Karst system should rather directly relate to fill-and-spill dynamics (McDonnell et al. 2020), which are a perfect learning case for LSTMs rarely met in other hydrological systems. 3) The catchment is very small (1 km²). Hence, I would be very cautious about the capabilities to perform this kind of analysis and the strong claims interpreted from the results. In the current form, I would not really agree that the findings are sufficiently supported.
  Response: Firstly, we would like to clarify that the aim of this paper is to explore for the very first time the ability to use EC to predict discharge using a standard LSTM. Exploring the impact of using different objective functions to train the LSTM would therefore not be the scope of this paper. The longest data series from March 1 to August 1 in 2019 was selected as the training period since the LSTM is a pure data-driven model and requires abundant data to get a stable simulation result. For the model evaluation, the performance of M_EC basically is not influenced by the precipitation error in test period 2 since this model just uses EC as the model input.
  Secondly, the adjustment of EC value in test period 1 is based on the fact that the maximum EC of this spring is always relatively stable in different years according to the previous monitoring and different data loggers were used to monitor EC in 2017 and other two years. To further interpret the possible uncertainty caused by this adjustment, we will add another figure to the revised manuscript that shows the variation of model performance with the different EC adjustment values in test period 1.
  Finally, this work in the paper is the first time to apply LSTM model to predict discharge using EC. Although the study catchment is small, the observed spring discharge and EC dynamics are similar to many other karst springs (Olarinoye et al., 2020). Therefore, we think the catchment area should not be a problem to apply our approach. Regarding whether our approach can also be used in other hydrological systems, further work is needed which is our next step.
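The EC adjustment for test period 1 and the announced sensitivity analysis can be sketched as follows. The response does not say whether the adjustment was additive or multiplicative, nor which reference maximum was used; this sketch assumes a constant additive offset (a typical form of logger calibration bias), and the function names are hypothetical. Feeding each variant to the trained M_EC and scoring it is not shown.

```python
from typing import Dict, Iterable, List, Sequence


def offset_to_reference_max(ec: Sequence[float], reference_max: float) -> List[float]:
    """Shift an EC series by a constant so that its maximum equals reference_max.

    Assumes the logger difference is a constant additive bias.
    """
    delta = reference_max - max(ec)
    return [v + delta for v in ec]


def adjustment_variants(
    ec: Sequence[float], offsets: Iterable[float]
) -> Dict[float, List[float]]:
    """One shifted copy of the EC series per candidate offset (sensitivity run)."""
    return {d: [v + d for v in ec] for d in offsets}


# Usage: align a toy record to a hypothetical reference maximum of 400 uS/cm,
# then build variants around it for the sensitivity figure.
raw = [300.0, 350.0, 380.0]
aligned = offset_to_reference_max(raw, 400.0)
variants = adjustment_variants(raw, [-10.0, 0.0, 10.0])
```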
  
  Minor Points (only points in addition to the major ones are listed)
  Title: I find the title not really in line with the content of the paper.
  Response: The title will be revised to ‘Using LSTM to monitor stormflow discharge indirectly with EC observations’ according to the comment from reviewer 1.
  
  L21: What complex relationship? What special ML architecture? This is far too fuzzy.
  Response: We will further revise the sentence.
  
  L25: I did not spot any assessment of uncertainties. I guess you refer to the overall model performance evaluation.
  Response: We will change the word ‘uncertainties’ to ‘model performance’.
  
  L39f: depth? water level!; defined relationship? rating curve! Why omit the established terminology?
  Response: Accepted. We will change the wording.
  
  Fig 1: I do not really get anything from the maps a and b. Map c is difficult to interpret.
  Response: Map a shows the location of the study catchment in China. Map b displays the locations of the two climatic stations whose observations were used to fill two recording gaps. Map c shows the catchment area of the karst spring.
  
  L106: what is a combination of rectangular weirs? Do you have a rating curve for the weirs or is the discharge merely calculated with an empirical weir function? How is the gauge measured? Which uncertainty would you expect?
  Response: The discharge is calculated with an empirical weir function. The water level was measured by a HOBO U20 data logger with a precision of 0.3 cm.
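To illustrate the reviewer's question about expected uncertainty, here is a minimal sketch of a compound weir built from stacked rectangular sections, using the standard sharp-crested rectangular weir equation Q = C_d (2/3) sqrt(2g) b h^(3/2) for each section. The weir geometry, the discharge coefficient 0.62 and the treatment of the 0.3 cm precision as a ± level error are all assumptions; the authors' actual empirical weir function is not given in the response.

```python
import math
from typing import Sequence, Tuple

G = 9.81  # gravitational acceleration (m s^-2)


def compound_weir_discharge(
    h: float, sections: Sequence[Tuple[float, float]], cd: float = 0.62
) -> float:
    """Discharge (m^3 s^-1) over a weir built from stacked rectangular sections.

    h        -- water level above the lowest crest (m)
    sections -- (crest height above the lowest crest in m, added width in m)
    cd       -- discharge coefficient (0.62 is a generic textbook value)

    Each section contributes cd * 2/3 * sqrt(2 g) * b * (h - z)^1.5 once the
    water level exceeds its crest z; interaction between sections is ignored.
    """
    q = 0.0
    for z, b in sections:
        head = h - z
        if head > 0:
            q += cd * (2.0 / 3.0) * math.sqrt(2.0 * G) * b * head ** 1.5
    return q


def level_error_band(
    h: float,
    sections: Sequence[Tuple[float, float]],
    dh: float = 0.003,
    cd: float = 0.62,
) -> Tuple[float, float]:
    """Discharge range implied by a +/- dh error in the measured level (m)."""
    return (
        compound_weir_discharge(max(h - dh, 0.0), sections, cd),
        compound_weir_discharge(h + dh, sections, cd),
    )


# Hypothetical geometry: a 0.5 m wide low notch and a 2 m wide section 0.2 m higher.
weir = [(0.0, 0.5), (0.2, 2.0)]
low, high = level_error_band(0.25, weir)
```

For a single section, first-order error propagation gives dQ/Q ≈ 1.5 dh/h, so a 0.3 cm level error at h = 0.1 m corresponds to roughly 4.5 % in discharge, and the relative error grows at low heads. Uncertainty in the discharge coefficient itself is not included here.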
  
  L108f: I suspect an Onset U24? Why do you report a 15 min resolution if hourly data are used later on?
  Response: Yes, an Onset U24 was used for EC monitoring. Hourly data were used because the discharge resolution in some periods is one hour.
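Bringing the 15 min EC record to the hourly discharge resolution could look like the sketch below. The response does not say whether hourly values are means or instantaneous readings on the hour; this sketch assumes hour-beginning means, and the function name `to_hourly_mean` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def to_hourly_mean(
    records: Iterable[Tuple[datetime, float]]
) -> Dict[datetime, float]:
    """Average sub-hourly readings (e.g. 15 min EC) into hourly values.

    Each reading is assigned to the clock hour containing it (timestamp
    floored to the full hour), so the value labelled 10:00 averages the
    readings from 10:00 to 10:45. Hours without readings are absent.
    """
    sums: Dict[datetime, float] = defaultdict(float)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        sums[hour] += value
        counts[hour] += 1
    return {hour: sums[hour] / counts[hour] for hour in sorted(sums)}
```

If instantaneous sampling was used instead, one would simply keep the readings stamped at minute 0. Either way, the hourly EC timestamps must use the same convention as the discharge series before windows are built.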
  
  L124: What is unsaturated fast flow?
  Response: We will change it to ‘low-EC event water’.
  
  Fig 2b: Why are the side panels in reverse order and without annotated marks in the main panel? Why is the linear model used as reference not plotted? Why is (again) a different correlation measure used?
  Response: We will add annotations to the main panel. Figure 2b only displays the overall relationship between observed discharge and EC. The different correlation coefficients in the right panel of Fig. 2b reflect the different relationships between discharge and EC during different recharge events. Figure 2 shows observed data only, without simulation results from any model, which is why the linear model is not plotted.
  
  L148f: I guess you refer to discharge events (not rain events)?
  Response: Thanks, we will change the words.
  
  L155f: A strong relationship? I would not consider a correlation of -0.51 to be particularly strong. Hence, the relationship might be somewhat tangible there, but it is not found when plotting EC against Q for lower discharge.
  Response: We will further revise the sentence.
  
  Sec 3.1: Why don't you clarify your strategy with the three models M_EC, M_P and M_ECP upfront?
  Response: M_ECP will be deleted in the revised manuscript. M_p was only used as a comparison for the EC-based predictions. We will add this description to the revised version.
  
  L192: What is really meant with the 7h forward shifting?
  Response: The 7 h forward shift was only applied in the simple regression model, because this model cannot learn the delay between EC and discharge on its own.
  
  L206: Why do you report the NSE equation? It is not needed. Better to add further evaluation estimates.
  Response: We will delete the NSE equation.
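As an illustration of the reviewer's suggestion to add further evaluation estimates, the sketch below computes NSE alongside the Kling-Gupta efficiency (KGE), which decomposes performance into correlation, variability ratio and bias ratio. KGE is a swapped-in suggestion, not a metric reported in the response. The sketch also makes the reviewer's remark on the benchmark concrete: a constant prediction equal to the observed mean scores NSE = 0.

```python
import math
from statistics import fmean, pstdev
from typing import Sequence


def nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of obs."""
    if len(obs) != len(sim):
        raise ValueError("obs and sim must have the same length")
    mean_o = fmean(obs)
    denom = sum((o - mean_o) ** 2 for o in obs)
    if denom == 0:
        raise ValueError("observations are constant; NSE is undefined")
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / denom


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency from correlation r, variability ratio alpha, bias ratio beta."""
    if len(obs) != len(sim):
        raise ValueError("obs and sim must have the same length")
    mu_o, mu_s = fmean(obs), fmean(sim)
    sd_o, sd_s = pstdev(obs), pstdev(sim)
    if sd_o == 0 or sd_s == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero observed mean")
    cov = fmean([(o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)])
    r = cov / (sd_o * sd_s)   # Pearson correlation (population moments)
    alpha = sd_s / sd_o       # variability ratio
    beta = mu_s / mu_o        # bias ratio
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Usage on a toy discharge series: the observed mean as a constant "model".
obs = [1.0, 2.0, 3.0, 4.0]
mean_model = [fmean(obs)] * len(obs)
```

`nse(obs, mean_model)` returns 0, which is why an NSE only slightly above 0 indicates a model barely better than the mean. KGE is undefined for such a constant prediction, so the function raises instead of returning a misleading number.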
  
  L263: The benchmark is the linear regression which is slightly better than a pure mean value…
  Response: The poor performance of the benchmark model is due to the weak linear relationship between discharge and EC. Nevertheless, M_EC still performs better than the linear regression model.
  
  L265: See major points about the NSE and the expectations for an LSTM. Avoid normative claims. Certainly they do not expose excellent capability…
  Response: We will change ‘excellent’ to ‘good’.
  
  Fig 3: Caption reports Fig 2 instead of 3.
  Response: Thanks, we will correct the caption.
  
  L276: Again, how do you support the claim? Test period 2 obviously fails, and it is not analysed whether this is due to the lack of precipitation data. Actually, I do not expect this to be the case if the evaluation without OBGD remains that low.
  Response: The poor performance of M_p in test period 2, even without OBGD, is mainly caused by its poor simulation of low flow, because low flow dominates this period, which contains only two storm events. We will add this explanation to the revised manuscript.
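The explanation that low flow dominates test period 2 could be checked by scoring low-flow and high-flow time steps separately, as in this sketch. The threshold separating the regimes is a hypothetical parameter, and note that NSE within a regime is benchmarked against that regime's own mean, so the values are not directly comparable with the whole-period NSE.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def _nse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency (caller guarantees non-constant obs)."""
    mean_o = fmean(obs)
    denom = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - sum((o - s) ** 2 for o, s in zip(obs, sim)) / denom


def nse_by_regime(
    obs: Sequence[float], sim: Sequence[float], threshold: float
) -> Dict[str, float]:
    """NSE for low-flow (obs < threshold) and high-flow (obs >= threshold) steps,
    plus the share of time steps that fall in the low-flow regime."""
    if not obs or len(obs) != len(sim):
        raise ValueError("obs and sim must be non-empty and of equal length")
    groups: Dict[str, List[Tuple[float, float]]] = {"low": [], "high": []}
    for o, s in zip(obs, sim):
        groups["low" if o < threshold else "high"].append((o, s))
    result = {"low_share": len(groups["low"]) / len(obs)}
    for name, pairs in groups.items():
        o_vals = [o for o, _ in pairs]
        if len(set(o_vals)) > 1:  # NSE needs variance in the observations
            result[name] = _nse(o_vals, [s for _, s in pairs])
    return result
```

A large `low_share` combined with a low `low` score and a reasonable `high` score would support the argument that the period-wide NSE is pulled down by the low-flow simulation rather than by the storm events.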
  
  Fig 5: Why do you show Nash values below -1?
  Response: We will cap the y-axis at -1.
  
  [I have not recorded further minor points after L303 since I expect this to require substantial workover anyways.]
  Response: We will carefully revise the following sections according to the reviewer’s preceding comments.
  
  Code and data availability: Come on! We are in 2022! I find it absolutely necessary that we do not have to beg to see what is under the hood. The HESS data and code policy is rather clear about this. I consider it an obligation for the authors to provide their data and code, especially for a study like yours, which merely applies a Keras LSTM to a very limited data set.
  Response: The code and data will be uploaded to a public repository.
  
  Reference:
  Campolo, M., Andreussi, P., Soldati, A., 1999. River flood forecasting with a neural network model. Water Resour. Res. 35, 1191–1197.
  Gao, S., Huang, Y., Zhang, S., Han, J., Wang, G., Zhang, M., Lin, Q., 2020. Short-term runoff prediction with GRU and LSTM networks without requiring time step optimization during sample generation. J. Hydrol. 589, 125188. https://doi.org/10.1016/j.jhydrol.2020.125188
  Kratzert, F., Klotz, D., Brenner, C., Schulz, K., Herrnegger, M., 2018. Rainfall–runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci. 22, 6005–6022.
  Olarinoye, T., Gleeson, T., Marx, V., Seeger, S., Adinehvand, R., Allocca, V., Andreo, B., Apaéstegui, J., Apolit, C., Arfib, B., Auler, A., Bailly-Comte, V., Barberá, J. A., Batiot-Guilhe, C., Bechtel, T., Binet, S., Bittner, D., Blatnik, M., Bolger, T., Hartmann, A., 2020. Global karst springs hydrograph dataset for research and management of the world’s fastest-flowing groundwater. Sci. Data 7(1). https://doi.org/10.1038/s41597-019-0346-5
  
  Citation: https://doi.org/10.5194/hess-2022-77-AC2

Yong Chang, Benjamin Mewes, and Andreas Hartmann

Viewed

Total article views: 2,115 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,727	337	51	2,115	58	84


Views and downloads (calculated since 21 Mar 2022)

Month	HTML	PDF	XML	Total
Mar 2022	251	36	3	290
Apr 2022	75	19	6	100
May 2022	37	13	1	51
Jun 2022	29	9	4	42
Jul 2022	47	10	4	61
Aug 2022	12	10	0	22
Sep 2022	14	14	0	28
Oct 2022	14	5	0	19
Nov 2022	19	8	0	27
Dec 2022	8	7	0	15
Jan 2023	10	3	0	13
Feb 2023	20	13	0	33
Mar 2023	11	4	0	15
Apr 2023	9	9	0	18
May 2023	11	9	1	21
Jun 2023	9	1	0	10
Jul 2023	42	11	1	54
Aug 2023	29	2	0	31
Sep 2023	13	5	2	20
Oct 2023	12	9	1	22
Nov 2023	10	4	1	15
Dec 2023	7	3	1	11
Jan 2024	9	3	0	12
Feb 2024	8	9	0	17
Mar 2024	8	10	2	20
Apr 2024	15	3	7	25
May 2024	28	2	3	33
Jun 2024	53	2	2	57
Jul 2024	34	2	0	36
Aug 2024	38	2	0	40
Sep 2024	26	1	0	27
Oct 2024	30	3	0	33
Nov 2024	31	2	1	34
Dec 2024	29	4	0	33
Jan 2025	35	6	0	41
Feb 2025	26	2	0	28
Mar 2025	37	4	2	43
Apr 2025	27	10	2	39
May 2025	53	9	1	63
Jun 2025	64	11	0	75
Jul 2025	43	27	0	70
Aug 2025	92	7	1	100
Sep 2025	331	9	2	342
Oct 2025	21	7	1	29



Short summary

This study investigates the feasibility of using EC to predict discharge in a typical karst catchment. We found that the spring discharge can be well predicted from EC during storms using an LSTM (Long Short-Term Memory) model, while the prediction has relatively large uncertainties in small recharge events. To establish a robust LSTM model for long-term discharge prediction from EC in ungauged catchments, a random or fixed-interval discharge monitoring strategy is recommended.
