Technical Note: Data assimilation and autoregression for using near-real-time streamflow observations in long short-term memory networks
- 1Google Research, Mountain View, CA, United States
- 2University of California Davis, Department of Land, Air & Water Resources, Davis, CA, United States
- 3LIT AI Lab & Institute for Machine Learning, Johannes Kepler University, Linz, Austria
- 4Upstream Tech, Alameda, CA, USA
- 5Google Research, Vienna, Austria
- 6National Water Center, National Oceanic and Atmospheric Administration, Tuscaloosa, AL, United States
- 7Department of Geological Sciences, University of Alabama, Tuscaloosa, AL, USA
- 8Google Research, Tel Aviv, Israel
Abstract. Ingesting near-real-time observation data is a critical component of many operational hydrological forecasting systems. In this paper, we compare two strategies for ingesting near-real-time streamflow observations into Long Short-Term Memory (LSTM) rainfall-runoff models: autoregression (a forward method) and variational data assimilation. Autoregression is both more accurate and more computationally efficient than data assimilation. Autoregression is sensitive to missing data; however, an appropriate (and simple) training strategy mitigates this problem.
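For readers unfamiliar with the autoregressive setup, the sketch below illustrates the general idea of appending a lagged streamflow observation (plus a missing-data flag) to the meteorological forcings as extra LSTM input features. It is a minimal, hypothetical example: the function name, lag length, fill value, and array layout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def build_ar_inputs(forcings, streamflow_obs, lag=1, fill_value=0.0):
    """Append lag-shifted streamflow observations to the forcing inputs.

    forcings:        array of shape (T, F) with meteorological inputs.
    streamflow_obs:  array of shape (T,) with observed discharge; NaN = missing.
    lag:             number of time steps by which observations are shifted,
                     so the model only sees data available at prediction time.
    fill_value:      value substituted for missing observations; a separate
                     binary flag column tells the model when this happened.
    """
    T = forcings.shape[0]
    lagged = np.full(T, np.nan)
    lagged[lag:] = streamflow_obs[:-lag]          # shift observations back by `lag`

    missing = np.isnan(lagged).astype(float)      # 1.0 where no observation is available
    lagged = np.where(np.isnan(lagged), fill_value, lagged)

    # Final input: [forcings | lagged observation | missing-data flag]
    return np.column_stack([forcings, lagged, missing])

# Example: 365 days of 5 forcing variables and a streamflow record with a gap.
rng = np.random.default_rng(0)
forcings = rng.normal(size=(365, 5))
q_obs = rng.gamma(2.0, 1.0, size=365)
q_obs[100:114] = np.nan                            # a two-week gauge outage
x = build_ar_inputs(forcings, q_obs, lag=1)
print(x.shape)                                     # (365, 7)
```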
Grey S. Nearing et al.
Status: final response (author comments only)
-
RC1: 'Comment on hess-2021-515', Ralf Loritz, 25 Nov 2021
Review of “Data assimilation and autoregression for using near-real-time streamflow observations in long short-term memory networks” by Nearing et al. (in discussion)
Summary and Recommendation
Nearing et al. test how near-real-time streamflow observations can be used effectively in Long Short-Term Memory (LSTM) rainfall-runoff models. They compare an autoregression (AR) approach with a data assimilation (DA) approach and additionally test how sensitive AR is to random gaps in the data. The manuscript (MS) is easy to follow, well structured, and suits the scope of HESS. I particularly liked the comprehensive appendix in combination with a short MS. I think that this MS can be published after some minor revisions and provide only some smaller comments and questions below.
Sincerely
Ralf Loritz
Questions and comments:
- Reading the MS, I would have liked to see detailed results from three or four catchments where AR or DA worked particularly well or poorly, and what the (hydrological and ML) reasons for this might be (to underpin the discussion of Appendices F and G). For instance, what could be the reason that DA and AR reduce the predictive performance of a few of your models (Fig. F2)? You state that: "We are unsure of the reason for this, but it warrants further exploration." (Line 352); maybe zooming into one of these catchments could help to give a better explanation.
- I find the way you added the missing data a bit unrealistic. I would assume that a broken gauging station stops working for several days or even weeks in a row, and I wonder how this would alter your results (e.g., all streamflow data available for training, but then two weeks or more of only simulated data during testing, with a closer focus on that particular period rather than the entire testing period; a minimal sketch of such a contiguous-gap experiment follows this list).
- Showing how the variance or the Shannon entropy of your simulations changes, in addition to the median, would be interesting (Fig. 1 and Fig. 3). If it remains constant, I would mention that the spread of the predictions is not affected by the data availability.
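The following is a minimal sketch of the contiguous-gap experiment suggested above, under the assumption that a consecutive block of test-period observations is masked and the metric is computed only over that window. Function and variable names are hypothetical and not taken from the paper's code.

```python
import numpy as np

def mask_contiguous_gap(q_obs, start, length):
    """Return a copy of the observation series with a contiguous gap (NaNs).

    q_obs:  1-D array of observed streamflow for the test period.
    start:  index of the first missing time step.
    length: number of consecutive missing time steps (e.g. 14 for two weeks).
    """
    q_gap = q_obs.astype(float).copy()
    q_gap[start:start + length] = np.nan
    return q_gap

def nse(sim, obs):
    """Nash-Sutcliffe efficiency over the (possibly short) evaluation window."""
    obs_mean = np.nanmean(obs)
    return 1.0 - np.nansum((sim - obs) ** 2) / np.nansum((obs - obs_mean) ** 2)

# Example: a two-week gauge outage starting on day 100 of the test period.
rng = np.random.default_rng(1)
q_obs = rng.gamma(2.0, 1.0, size=365)

# q_obs_gapped would be fed to the AR model as its lagged-observation input;
# the metric is then computed against the true observations over the outage window.
q_obs_gapped = mask_contiguous_gap(q_obs, start=100, length=14)
q_sim = q_obs + rng.normal(scale=0.3, size=365)   # stand-in for model output
print(nse(q_sim[100:114], q_obs[100:114]))
```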
Personal comment: Three of the seven co-authors have presumably not contributed to this "technical note", as they are not mentioned in the author contribution section.
-
AC1: 'Reply on RC1', Grey Nearing, 04 Feb 2022
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-515/hess-2021-515-AC1-supplement.pdf
-
RC2: 'review of hess-2021-515', Anonymous Referee #2, 02 Dec 2021
The technical note compares two different techniques for using near-real-time streamflow observations to improve operational streamflow forecasts from LSTM rainfall-runoff models. The first technique ("autoregression", AR) adds lagged streamflow observations as predictors in the model. The second technique uses variational data assimilation (DA) to update model states within an assimilation window. The two techniques are compared on the CAMELS dataset, including experiments that artificially remove data to simulate scenarios with missing streamflow observations.
The paper is generally well written, concise and to the point. The comparison between AR and DA is an interesting and novel contribution to the literature.
Comments:
1. The main conclusion is that "AR significantly out-performed the more complicated DA method" (line 195), and the authors therefore recommend against using DA (line 196). However, I feel the authors are overstating the results: the differences in improved performance between AR (10%) and DA (8%) are relatively small, as is also seen in Fig. 3, where the DA lines (red) and the AR lines (orange) are close.
2. On line 51 it is stated that "the purpose of this paper is to provide insight into trade-offs between DA and AR". I feel the paper doesn't entirely deliver on this. Yes, the two techniques are compared across a large number of basins, but the reader doesn't get a clear sense of when to use which technique. Appendix F contains a regression analysis in this direction but concludes that "we were generally unable to predict differences between the NSE scores of DA and AR". Closer inspection by a human, however, may lead to some insights. For instance, it could be interesting to look in more detail at extreme cases: ones where AR significantly beats DA, and vice versa. Figure 2, for example, shows dots in the south that are green (good) for AR and purple (bad) for DA, and vice versa.
3. Related to the previous comments, I think the paper in general would benefit from a more balanced and nuanced discussion of the usefulness of both techniques, i.e. the trade-offs. For example, on line 52 the authors claim that "AR is easier to implement than DA". One could also argue that DA is "easier", or at least more modular, since it does not require changes to the model. Similarly, on line 191 the authors state that "we have no reason to suspect that other DA methods might perform better than variational DA". Without additional explanation or insights, this statement is not supported by the results in the paper. Given the wide range of DA approaches and implementations, it is not clear why this statement would hold. See also comment 5.
4. Metrics, section 2.3: please specify what kind of forecasts you are evaluating, are these nowcasts?
5. Methodology: results of DA typically depend strongly on how error parameters are set. Details on this aspect are provided in the appendices. We have error covariances B and R in Eq. B5, which translate to alpha parameters in Eq. C1. These alpha parameters are tuned during an independent validation period, with values reported in Table E1. We see that the tuned value of alpha_c (how much we trust/weight the trained model) is zero, and that alpha_y (how much we trust the real-time data) is fixed at a value of 1. If I understand correctly, setting instead alpha_c = 1 and alpha_y = 0 in Eq. C1 would fall back to the benchmark simulation model, i.e., not using real-time data. Why then not also tune alpha_y? Or tune a single weight w in [0, 1] with alpha_c = w and alpha_y = 1 - w (a schematic form of this weighting is sketched after this list)? That way the DA model includes the simulation model as a special case and should never perform worse. The current results sometimes (Figures G1 and G3) show worse performance for DA than for the benchmark simulation model. Also, are the alpha parameters the same for all basins? Why not estimate separate values for each basin?
6. Appendix B describes variational DA and its application to LSTM. I think the math needs to be 'cleaned up' a bit for clarity:
-The loss function L is written as a function of model inputs x and outputs y, L(x, y), whereas a loss is typically a function of the model outputs y and the corresponding observations, with the model output depending on the unknown parameters or states for which derivatives are computed.
-Eqs. B13-B15: I don't think the gradient chains are correct, since they assume h[t] is independent of previous time steps given c[t], while the model equations B6-B11 show that there is an additional 'path' from h[t-1] to h[t] (a generic form of the coupled recursions is sketched after this list). I understand the appendix is meant to give the reader a general sense of what is happening, but you might as well write it down correctly to avoid confusion.
-Eq. B14: the derivative on the left should be with respect to c_l
-Eq. B15: on the right we should have x and y from t-s to t instead of from 0 to t? And on the left derivative with respect to c_l[t-s], and x[t-s:t] instead of x[0:t]?
-I found it confusing that Eq. C1 switches to [t, t+s] from [t-s, t] in Eq. B15.
7. Eq. 1: what is epsilon?
8. Eq. 1: don't you want to divide by N here? Otherwise the NSE values increase with N (see the sketched formulation after this list).
9. Line 84: "is reproduced"
10. Line 199: at the time of this review, no code was provided in the linked github repository
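Regarding comment 5, the convex weighting suggested there can be written schematically as follows. This is a hypothetical form for illustration only and is not claimed to match the paper's Eq. C1; the symbols J, c, c^b, and y-hat are assumptions.

```latex
J(\mathbf{c}) \;=\; w\,\bigl\lVert \mathbf{c} - \mathbf{c}^{\mathrm{b}} \bigr\rVert^{2}
\;+\; (1 - w)\sum_{\tau = t-s}^{t} \bigl( y_{\tau} - \hat{y}_{\tau}(\mathbf{c}) \bigr)^{2},
\qquad w \in [0, 1]
```

Here c^b is the background (trained-model) cell state and y-hat(c) the simulated discharge. Setting w = 1 recovers the unassimilated benchmark, so a weight tuned on a validation period should never make DA worse than the simulation model on that period.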
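Regarding the gradient chains in comment 6, one generic way to write the coupled recursions that retain both the cell-state and hidden-state paths is sketched below. This uses standard LSTM notation rather than the appendix's exact symbols, so it is only an illustrative sketch; partial derivatives hold the other same-time-step arguments fixed.

```latex
\frac{\mathrm{d}L}{\mathrm{d}h_{t-1}}
  = \frac{\partial L}{\partial h_{t-1}}
  + \frac{\mathrm{d}L}{\mathrm{d}h_{t}}\,\frac{\partial h_{t}}{\partial h_{t-1}}
  + \frac{\mathrm{d}L}{\mathrm{d}c_{t}}\,\frac{\partial c_{t}}{\partial h_{t-1}},
\qquad
\frac{\mathrm{d}L}{\mathrm{d}c_{t-1}}
  = \frac{\mathrm{d}L}{\mathrm{d}c_{t}}\,\frac{\partial c_{t}}{\partial c_{t-1}}
  + \frac{\mathrm{d}L}{\mathrm{d}h_{t-1}}\,\frac{\partial h_{t-1}}{\partial c_{t-1}}
```

The term with the partial of h[t] with respect to h[t-1] (through the output gate) is exactly the path that disappears under the independence assumption noted in the comment.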
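Regarding comments 7 and 8, a common variance-normalized squared-error formulation is written below only as an assumed reference point, not as the paper's actual Eq. 1: the error sum is divided by N, and a small constant epsilon keeps the denominator away from zero in basins with near-constant flow.

```latex
\mathrm{NSE} \;=\; 1 \;-\;
\frac{\frac{1}{N}\sum_{n=1}^{N}\bigl(\hat{y}_{n} - y_{n}\bigr)^{2}}
     {\sigma_{\mathrm{obs}}^{2} + \epsilon}
```

Here y-hat_n and y_n are simulated and observed discharge, sigma_obs^2 is the variance of the observations, and N is the number of evaluated time steps; without the 1/N the numerator, and hence the penalty, grows with the record length.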
-
AC2: 'Reply on RC2', Grey Nearing, 04 Feb 2022
The comment was uploaded in the form of a supplement: https://hess.copernicus.org/preprints/hess-2021-515/hess-2021-515-AC2-supplement.pdf
-
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 776 | 323 | 24 | 1,123 | 12 | 13 |