Hydrological concept formation inside long short-term memory (LSTM) networks

Lees, Thomas; Reece, Steven; Kratzert, Frederik; Klotz, Daniel; Gauch, Martin; De Bruijn, Jens; Kumar Sahu, Reetik; Greve, Peter; Slater, Louise; Dadson, Simon J.

doi:https://doi.org/10.5194/hess-26-3079-2022

Articles | Volume 26, issue 12

Research article

20 Jun 2022

Abstract

Neural networks have been shown to be extremely effective rainfall-runoff models, where the river discharge is predicted from meteorological inputs. However, the question remains: what have these models learned? Is it possible to extract information about the learned relationships that map inputs to outputs, and do these mappings represent known hydrological concepts? Small-scale experiments have demonstrated that the internal states of long short-term memory networks (LSTMs), a particular neural network architecture predisposed to hydrological modelling, can be interpreted. By extracting the tensors which represent the learned translation from inputs (precipitation, temperature, and potential evapotranspiration) to outputs (discharge), this research seeks to understand what information the LSTM captures about the hydrological system. We assess the hypothesis that the LSTM replicates real-world processes and that we can extract information about these processes from the internal states of the LSTM. We examine the cell-state vector, which represents the memory of the LSTM, and explore the ways in which the LSTM learns to reproduce stores of water, such as soil moisture and snow cover. We use a simple regression approach to map the LSTM state vector to our target stores (soil moisture and snow). Good correlations (R²>0.8) between the probe outputs and the target variables of interest provide evidence that the LSTM contains information that reflects known hydrological processes comparable with the concept of variable-capacity soil moisture stores.

The implications of this study are threefold: (1) LSTMs reproduce known hydrological processes. (2) While conceptual models have theoretical assumptions embedded in the model a priori, the LSTM derives these from the data. These learned representations are interpretable by scientists. (3) LSTMs can be used to gain an estimate of intermediate stores of water such as soil moisture. While machine learning interpretability is still a nascent field and our approach reflects a simple technique for exploring what the model has learned, the results are robust to different initial conditions and to a variety of benchmarking experiments. We therefore argue that deep learning approaches can be used to advance our scientific goals as well as our predictive goals.

Received: 12 Nov 2021 – Discussion started: 23 Nov 2021 – Revised: 15 Mar 2022 – Accepted: 04 Apr 2022 – Published: 20 Jun 2022

1 Introduction

Long short-term memory networks (LSTMs) have demonstrated state-of-the-art performance for rainfall-runoff modelling for a variety of locations and tasks (Kratzert et al., 2018, 2019 d; Ma et al., 2020; Lees et al., 2021; Frame et al., 2021). However, whether we can use these models to better interpret the hydrological system remains an open question. Given that LSTM-based models offer state-of-the-art hydrological performance, more research is required to better understand which conceptual structures the LSTM has learned and to diagnose potential gaps in our conceptual and process-based models, ultimately to stimulate innovation in hydrological theory.

The primary objective of this study is to test the hypothesis that the information stored in the LSTM state vector reflects known hydrological concepts that are important for discharge generation, including soil water storage and snow processes. What have these models learned about the hydrological system that allows them to make highly accurate predictions? Can we interrogate the model to determine whether the LSTM has learned a physically realistic mapping from inputs to outputs? Being able to reason about the model and its behaviour is a key component of dependable models. It allows researchers and practitioners to interrogate the model, making sure that it is giving the right results for the right reasons (Kirchner, 2006).

Deriving insights about the hydrological system has always been a goal of hydrological modelling (Beven, 2011). Peter Young’s work on data-based mechanistic modelling (DBM) emphasized the need to apply flexible data-driven models before then applying a mechanistic interpretation to the learned representation of these models (Young and Beven, 1994; Young, 2003, 1998). Philosophically, this approach is similar to the one we take here, although the number of parameters in the DBM approach is much smaller. In an early application of neural networks to rainfall-runoff modelling, Wilby et al. (2003) sought to challenge preconceptions of neural network approaches as uninterpretable. They found that nodes in their multi-layer perceptron corresponded to quick flow, baseflow, and soil saturation and showed how the learned representation of deep learning models could be interpreted. They sought to determine whether neural networks were capable of reproducing both the outputs and internal functioning of conceptual hydrological models.

Recent studies call to more fully explore the potential for techniques from the fields of artificial intelligence and machine learning (Beven, 2020; Reichstein et al., 2019; Shen, 2018; Karpatne et al., 2017) by demonstrating predictive performance alongside interpretations of the model itself to improve our understanding of the modelled system. Several studies have suggested that LSTM rainfall-runoff models learn a generalizable representation of the underlying physical processes. This allows them to perform well in out-of-sample conditions, such as prediction in ungauged basins (PUB) (Kratzert et al., 2019 c; Feng et al., 2020; Ma et al., 2020) and unseen extreme events (Frame et al., 2021). These results suggest that LSTMs have captured information that generalizes to these conditions, information that can help us improve hydrological theory and predictions.

Outside of hydrology, calls for interpreting machine learning and deep learning systems are getting louder and even generating legislative changes (European Union Digital Strategy, 2019; UK Statistics Authority, 2019). Spiegelhalter (2020), for example, argues that as algorithmic decision support tools become widespread in everyday life, the ability to describe how predictions are made is essential for building trust in these systems. A large body of literature has arisen to define interpretability (Ribeiro et al., 2016; Lipton, 2018; Doshi-Velez and Kim, 2017), to measure how interpretable models aid human decision making (Nguyen, 2018; Chu et al., 2020), and to develop methods for interpreting models (Olah et al., 2018, 2020; Ghorbani and Zou, 2020; Lundberg and Lee, 2017). Our contribution draws on work from natural language processing (NLP), where learned embeddings from models trained for speech-recognition tasks have been interpreted to better understand how parts of speech are recognized and used by LSTM models (Hewitt and Liang, 2019).

The exploration of the internal representations of LSTM-based rainfall-runoff models is still at an early developmental stage. Kratzert et al. (2018) showed evidence that individual LSTM cells correlate with snow water content, although the model was only trained to predict discharge from meteorological inputs. Kratzert et al. (2019 d) explored the learned embedding of catchment attributes, showing that an LSTM variant had learned to group the rainfall-runoff behaviours of hydrologically similar catchments. Using dimensionality reduction techniques, the static embedding (the output of the input gate) was shown to reflect spatial and thematic groups of catchments that qualitatively correspond to catchments with similar hydrological behaviours. For two exemplary basins, Kratzert et al. (2019 b) found correlations between the cell states of the LSTM and three hydrological states (upper-zone storage, lower-zone storage, and snow depth) from the Sacramento + Snow-17 hydrological model (Burnash, 1995). All three of these studies introduced methods that can be used for exploring the internal representation of hydrological models, but there exists no comprehensive evaluation of the information stored in the cell-state dimensions across a large sample of basins. Furthermore, these studies only compared individual memory cells with hydrological processes. The LSTM, however, is not forced to store information about one process in a single memory cell but can distribute the information about hydrological processes across several cells. Therefore, we explore methods for extracting the information that is stored in the LSTM cell state, across all cells.

The aim of this research is to examine the internal functioning of the LSTM model. We explore the evolution of the LSTM state vector and test whether information that reflects intermediate stores of water (soil moisture and snow depth) has been learned by the LSTM. This research is novel for providing a means of interpreting which information the LSTM rainfall-runoff model has encoded within its state vector. To our knowledge, we are the first to apply techniques developed in machine learning interpretability and natural language processing research (Hewitt and Liang, 2019) to hydrology. We carry out a comprehensive evaluation of the LSTM cell states across a sample of 669 catchments in Great Britain (Lees et al., 2021). This allows us to rigorously assess whether the LSTM has learned concepts that generalize over space. On this basis we devised several baseline experiments to provide evidence of an internal representation of hydrologically relevant processes. Furthermore, we consider information stored across all values in the LSTM state vector, as opposed to identifying and focusing only on single values from within the cell state. This is important since there are no constraints forcing the LSTM to store information in individual cells.

2 Methods

In this study, we trained LSTM models using the same hyper-parameters as those used in Lees et al. (2021). We offer a brief introduction to the state-space formulation of the LSTM (Kratzert et al., 2019 b) because it explains why we explore the cell state (c_t): the cell state is the state vector of the LSTM.

2.1 The LSTM

Hydrological models are often formulated with a state-space-based approach. This means that the states (s) at a specific time (t) depend on the input at time t (x_t), the model state in the previous timestep (s_t−1), and the model parameters (θ) (Kratzert et al., 2019 b).

\begin{equation} s_{t} = f(x_{t}, s_{t-1}; \theta_{i}) \tag{1} \end{equation}

The model output (y_t, discharge) is a function of the states (s_t) and inputs (x_t) at that timestep and the model parameters.

\begin{equation} y_{t} = g(x_{t}, s_{t}; \theta_{j}) \tag{2} \end{equation}
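To make the state-space form concrete, the following is a minimal sketch of a single-store "bucket" model written as a state update followed by an output function. It is purely illustrative and is not a model used in this study: the function names, the `retention` parameter, and the forcing values are all assumptions chosen for the example.

```python
def update_state(x_t, s_prev, theta_state):
    """State update (Eq. 1): new storage from inputs, previous storage, parameters."""
    precip, pet = x_t
    # Add precipitation, remove potential evapotranspiration, never go below zero.
    filled = max(s_prev + precip - pet, 0.0)
    # A fixed fraction of the filled store is retained; the rest drains as discharge.
    return theta_state["retention"] * filled


def compute_output(x_t, s_t, theta_out):
    """Output (Eq. 2): discharge from the current state (x_t is unused in this toy)."""
    r = theta_out["retention"]
    # The store before draining was s_t / r, so the drained volume is s_t * (1 - r) / r.
    return s_t * (1.0 - r) / r


def simulate(forcing, s0, theta_state, theta_out):
    """Run the toy model over a forcing sequence, recording states and discharge."""
    states, discharge = [], []
    s = s0
    for x_t in forcing:
        s = update_state(x_t, s, theta_state)
        states.append(s)
        discharge.append(compute_output(x_t, s, theta_out))
    return states, discharge


# Hypothetical forcing: (precipitation, potential evapotranspiration) per day.
forcing = [(10.0, 0.0), (0.0, 0.0)]
params = {"retention": 0.5}
states, discharge = simulate(forcing, 0.0, params, params)
```

In this toy example the process (a single draining store) is fixed in the code before any data are seen, which is the property that distinguishes classical state-space models from the LSTM.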

Similarly, the LSTM can be formulated as

\begin{align} c_{t}, h_{t} &= f_{LSTM}(x_{t}, c_{t-1}, h_{t-1}; \theta_{k}), \tag{3} \\ y_{t} &= f_{Dense}(h_{t}; \theta_{l}), \tag{4} \end{align}

where the state vector (the “cell state” c_t) and output vector (the “hidden state” h_t) of the LSTM at timestep t are a function of the current inputs (x_t, e.g. meteorological features and catchment attributes), the previous output and state (h_t−1 and c_t−1), and some learnable parameters (θ_k). Similarly to the state-update equations, the output of the model (y_t, e.g. the discharge) is a function of the output of the LSTM (h_t, which is a function of c_t) and some more (learnable) model parameters (θ_l).
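The LSTM formulation can be sketched in NumPy as follows. This is a minimal illustration that assumes the standard LSTM cell equations; the gate ordering, the weight shapes, the hidden size, and the random (untrained) weights are assumptions made for the example and are not taken from the trained models in this study.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, c_prev, h_prev, theta_k):
    """One LSTM update: (c_t, h_t) from inputs, previous state/output, parameters."""
    W, U, b = theta_k  # assumed shapes: (4H, D), (4H, H), (4H,)
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    g = np.tanh(z[2 * H:3 * H])    # candidate cell update
    o = sigmoid(z[3 * H:4 * H])    # output gate
    c_t = f * c_prev + i * g       # cell state: the "memory" examined by the probes
    h_t = o * np.tanh(c_t)         # hidden state: the LSTM output
    return c_t, h_t


def dense_head(h_t, theta_l):
    """Map the final hidden state to a scalar prediction (e.g. discharge)."""
    w, b = theta_l
    return float(w @ h_t + b)


def run_lstm(x_seq, theta_k, theta_l, hidden_size):
    """Process a sequence, keeping the cell state at every timestep."""
    c = np.zeros(hidden_size)
    h = np.zeros(hidden_size)
    cell_states = []
    for x_t in x_seq:
        c, h = lstm_step(x_t, c, h, theta_k)
        cell_states.append(c)
    return np.stack(cell_states), dense_head(h, theta_l)


# Hypothetical sizes: 3 inputs, 8 cells, 365 daily timesteps; weights are random.
rng = np.random.default_rng(0)
D, H, T = 3, 8, 365
theta_k = (rng.normal(scale=0.1, size=(4 * H, D)),
           rng.normal(scale=0.1, size=(4 * H, H)),
           np.zeros(4 * H))
theta_l = (rng.normal(size=H), 0.0)
x_seq = rng.normal(size=(T, D))
cell_states, y_hat = run_lstm(x_seq, theta_k, theta_l, H)
```

The array `cell_states` holds one c_t vector per timestep, which is the quantity that the later analysis extracts and compares with hydrological stores.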

The key difference between the LSTM and classical state models (e.g. conceptual and physical hydrology models) is that the LSTM can infer any process that is deducible from the data to solve the training task, while classical hydrological models are limited by the processes that are hard-coded in the model implementation. For a more detailed description of the LSTM, we refer to Kratzert et al. (2019 d), particularly Fig. 1 and Eqs. (1)–(12), both found in Section 2, Methods. A diagrammatic representation of the LSTM can be found in Appendix F, Fig. F1.

In order for the LSTM models to produce accurate simulations of discharge across a variety of catchments, we hypothesize that the LSTM should have learned to represent hydrological processes and stores. We test whether the LSTM is able to recover intermediate stores of water by visualizing the evolution of the LSTM cell state and comparing it to soil moisture and snow depth from ERA5-Land.

2.2 Experimental design

We used the following experimental design to investigate the learned hydrological process understanding of LSTMs.

Following Lees et al. (2021), we trained a single LSTM to predict runoff for 669 basins from the CAMELS-GB dataset (Coxon et al., 2020 b). The LSTM ingests input sequences that each consist of 1 year's worth of daily data (365 timesteps). The model is forced by a set of meteorological variables (precipitation, temperature, and potential evapotranspiration) and a series of static catchment attributes describing topography, climatic conditions, soil types, and land cover classes. These static attributes are used to learn differences and similarities between catchments. For more details of the training procedure and for a comprehensive table listing all model inputs, we refer the reader to Table 2 of Lees et al. (2021). It is important to note that neither snow depth nor soil moisture was included as inputs or outputs during model training.
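As an illustration of the input layout described above, the sketch below builds 365-day input windows for one catchment, with the target being discharge on the final day of each window. Concatenating the static attributes to every timestep is one common way of supplying them and is an assumption here, as are the function name, the array shapes, and the placeholder data.

```python
import numpy as np


def make_samples(dynamic, static, discharge, seq_len=365):
    """Build (input window, target) pairs for one catchment.

    dynamic:   (n_days, n_dynamic) daily meteorological forcing
    static:    (n_static,) catchment attributes, repeated at every timestep
    discharge: (n_days,) observed discharge; target is the last day of each window
    """
    n_days = dynamic.shape[0]
    static_tiled = np.tile(static, (seq_len, 1))  # (seq_len, n_static)
    inputs, targets = [], []
    for end in range(seq_len, n_days + 1):
        window = dynamic[end - seq_len:end]       # (seq_len, n_dynamic)
        inputs.append(np.concatenate([window, static_tiled], axis=1))
        targets.append(discharge[end - 1])
    return np.stack(inputs), np.array(targets)


# Placeholder data: 400 days, 3 dynamic variables, 5 static attributes.
dynamic = np.zeros((400, 3))
static = np.ones(5)
discharge = np.arange(400, dtype=float)
X, y = make_samples(dynamic, static, discharge)
```

Note that soil moisture and snow depth appear nowhere in these arrays, in line with the training setup described above.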

2.3 Probing

In the present context, a probe is a diagnostic device that is used on top of the trained LSTM model to examine the learned internal representation of the LSTM. In its simplest form, a probe is a linear regression model that connects the cell states to a given output. In a more complex form a probe might be realized in the form of a set of stacked multi-layer perceptrons or any other algorithm fit for regression tasks. As such, probes offer the opportunity to explore what the LSTM has learned during training, allowing us to use the LSTM to generate predictions of latent, intermediate variables. They also let us test whether our model has learned physically realistic mappings from inputs to outputs. To our knowledge, probes have not previously been used on hydrological LSTMs.

Figure 1. An overview of the linear probe analysis. (a) The LSTM as an input–state–output model: at each timestep, inputs (blue spheres, such as precipitation) are processed, producing a prediction of discharge on the 365th day. This reflects how the LSTM is trained. (b) When forcing the model with already trained weights, we extract the LSTM state vector (c_t) at each timestep. (c) We then compile a dataset of inputs (the c_t vectors for each target timestep) and targets (s_t – the soil moisture measurements for the catchment) matched at each catchment and timestep. (d) Finally, we use this dataset to train a single linear probe (a set of weights and a bias term) for all catchments.
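The probing workflow can be sketched as a regression from cell-state vectors to a target store. The example below is a minimal illustration that assumes an ordinary least-squares fit with a bias term; the exact regression variant used for the probes may differ. The cell states and target are synthetic stand-ins, and the sizes (1000 samples, 64 cells) are arbitrary assumptions.

```python
import numpy as np


def fit_linear_probe(cell_states, target):
    """Fit weights and a bias mapping cell states (N, H) to a target (N,)."""
    ones = np.ones((cell_states.shape[0], 1))
    design = np.hstack([cell_states, ones])       # last column carries the bias
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef[:-1], coef[-1]


def probe_predict(cell_states, weights, bias):
    return cell_states @ weights + bias


def r_squared(target, prediction):
    """Coefficient of determination between target and probe output."""
    ss_res = np.sum((target - prediction) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in data: the target is spread linearly across all cells.
rng = np.random.default_rng(0)
cell_states = rng.normal(size=(1000, 64))
true_weights = rng.normal(size=64)
target = cell_states @ true_weights + 0.3 + 0.1 * rng.normal(size=1000)

# Fit on one portion of the samples and evaluate on the held-out remainder.
weights, bias = fit_linear_probe(cell_states[:800], target[:800])
r2 = r_squared(target[800:], probe_predict(cell_states[800:], weights, bias))
```

Because the probe reads from the whole state vector, it can recover a signal that is distributed across many cells, which a comparison against individual cells would miss.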
