Knowledge-informed deep learning for hydrological model calibration: an application to Coal Creek Watershed in Colorado

Jiang, Peishi; Shuai, Pin; Sun, Alexander; Mudunuru, Maruti K.; Chen, Xingyuan

doi:https://doi.org/10.5194/hess-27-2621-2023

Articles | Volume 27, issue 14

Research article

19 Jul 2023

Abstract

Deep learning (DL)-assisted inverse mapping has shown promise in hydrological model calibration by directly estimating parameters from observations. However, the increasing computational demand for running the state-of-the-art hydrological model limits sufficient ensemble runs for its calibration. In this work, we present a novel knowledge-informed deep learning method that can efficiently conduct the calibration using a few hundred realizations. The method involves two steps. First, we determine decisive model parameters from a complete parameter set based on the mutual information (MI) between model responses and each parameter computed by a limited number of realizations (∼50). Second, we perform more ensemble runs (e.g., several hundred) to generate the training sets for the inverse mapping, which selects informative model responses for estimating each parameter using MI-based parameter sensitivity. We applied this new DL-based method to calibrate a process-based integrated hydrological model, the Advanced Terrestrial Simulator (ATS), at Coal Creek Watershed, CO. The calibration is performed against observed stream discharge (Q) and remotely sensed evapotranspiration (ET) from the water year 2017 to 2019. Preliminary MI analysis on 50 realizations resulted in a down-selection of 7 out of 14 ATS model parameters. Then, we performed a complete MI analysis on 396 realizations and constructed the inverse mapping from informative responses to each of the selected parameters using a deep neural network. Compared with calibration using observations covering all time steps, the new inverse mapping improves parameter estimations, thus enhancing the performance of ATS forward model runs. The Nash–Sutcliffe efficiency (NSE) of streamflow predictions increases from 0.53 to 0.8 when calibrating against Q alone. 
Using ET observations, on the other hand, does not much improve the performance of ATS modeling, mainly because of both the uncertainty of the remotely sensed product and the failure of the model ET ensemble to adequately cover the observations. Using observed Q only, we further performed a multiyear analysis and show that Q is best simulated (NSE > 0.8) when the calibration includes the dry-year flow dynamics, which are more sensitive to subsurface characteristics than those of the wet years. Moreover, when the forward runs are continued to the end of 2021, the calibrated models perform similarly in this evaluation period and in the calibration period, demonstrating the ability of the estimated parameters to capture climate sensitivity. Our success highlights the importance of leveraging data-driven knowledge in DL-assisted hydrological model calibration.

Received: 04 Aug 2022 – Discussion started: 05 Aug 2022 – Revised: 08 Jan 2023 – Accepted: 06 Jun 2023 – Published: 19 Jul 2023

1 Introduction

Calibrating a hydrological model is critical in accurately capturing the hydrological dynamics of the simulated watershed, which in turn improves the understanding of the corresponding terrestrial water cycle (Singh and Frevert, 2002). While the increasing complexity and spatiotemporal resolution of the hydrological models enable a better representation of the watershed dynamics (Kollet and Maxwell, 2006; Coon et al., 2019; Wang and Kumar, 2022), running these models is computationally expensive (Clark et al., 2017) even with existing high-performance computing resources. This computational burden significantly impedes the efficient and accurate calibration of integrated hydrological models.

Balancing the trade-off between computational cost and calibration accuracy is necessary when adopting traditional model calibration methods (Kavetski et al., 2018). Newton-type optimization methods (Jorge and Stephen, 2006; Qin et al., 2018) are known for their fast convergence but usually reach only a local optimum. On the other hand, stochastic methods, such as the shuffled complex evolution algorithm (Duan et al., 1992), the dynamically dimensioned search algorithm (Tolson and Shoemaker, 2007), and the ensemble Kalman filter (Reichle et al., 2002; Moradkhani et al., 2005; Evensen, 2009; Sun and Sun, 2015), are more capable of approaching the global optimum, at the cost of high computational demand. One alternative is to use a surrogate model that provides fast emulations to replace the physical model during calibration, saving computational budget while still achieving a reasonable calibration result. Mo et al. (2019) employed a dense convolutional encoder–decoder network as the emulator for a two-dimensional contaminant transport model to estimate the conductivity field using an iterative local updating ensemble smoother. A similar subsurface characterization was performed by Wang et al. (2021), who developed a theory-guided neural network as the surrogate of a flow model and coupled it with an iterative ensemble smoother to estimate the subsurface characteristics. To reduce the dimensionality of the model states, Dagon et al. (2020) calibrated biophysical parameters using a global optimizer on a surrogate that emulates the principal components of the outputs of community land models. Jiang and Durlofsky (2021) adopted a recurrent encoder–decoder network as the data-space inversion parameterization to reduce the dimensionality of the model states and parameters and used an ensemble smoother with multiple data assimilation to update the low-dimensional latent variables. 
Despite the successes of using surrogates in calibration, how to develop an accurate and trustworthy emulator can vary from case to case and, in fact, is still a long-standing challenge (McGovern et al., 2022).

Recently, deep learning (DL)-assisted inverse mapping has shown promise in addressing inverse problems and has seen early successes in hydrology (Cromwell et al., 2021; Mudunuru et al., 2021; Tsai et al., 2021), petroleum engineering (Razak et al., 2021), and geophysics (Yang and Ma, 2019; Wang et al., 2022). By employing a well-trained DL model (Goodfellow et al., 2016), this approach maps model states/outputs/responses to model parameters such that, once trained, the mapping can directly infer the parameters from observations. The inverse mapping outperforms the traditional calibration approaches in the following ways. First, DL models can better capture the highly nonlinear relationships encoded in the model than ensemble-based methods, which primarily rely on linear estimation theory through the Kalman filter (Evensen, 2009; Moradkhani et al., 2005; Reichle et al., 2002; Sun and Sun, 2015). Yang and Ma (2019) developed a convolutional neural-network-based inverse mapping that outperforms the traditional full waveform inversion, which adopts the adjoint-state optimization method, in estimating seismic velocity from seismic data. Cromwell et al. (2021) also demonstrated the improved performance of DL-assisted inverse mapping over an ensemble smoother in estimating subsurface permeability used in a watershed model based on the Advanced Terrestrial Simulator (ATS). Second, training DL models may require fewer realizations than traditional methods such as iterative calibration methods, which usually need several thousand realizations for the optimization to converge. Mudunuru et al. (2021) showed that DL-assisted inverse mapping using 1000 realizations outperforms the dynamically dimensioned search algorithm (Tolson and Shoemaker, 2007), which has to leverage 5000 realizations, in calibrating multivariate parameters of models based on the Soil and Water Assessment Tool (SWAT). 
Third, the calibration workflow is simpler given that ensemble simulations do not have to be fully coupled with the inverse mapping. Traditional calibration methods require sophisticated workflows (White et al., 2020; Jiang et al., 2021) to manage the model restart (e.g., ensemble Kalman filter), model rerunning (e.g., gradient-based and ensemble-based methods), and the communications between the hydrological model and calibration tools, which can be time-consuming. In contrast, such an integrated workflow tool is not necessary for developing inverse mapping because model ensemble runs and DL training are two separate steps. This decoupling of ensemble runs and DL training allows us to use high-performance computing resources to calibrate hydrological models efficiently.
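The decoupled workflow described above (run an ensemble first, then fit a mapping from responses to parameters, then apply it to observations) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `forward_model` is a toy exponential-decay function rather than a hydrological model, and a random-feature regression with least-squares output weights stands in for a trained deep neural network.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(1, 6)                      # five hypothetical observation times

def forward_model(theta):
    """Toy stand-in for a physical model: responses decay at rate theta."""
    return np.exp(-np.outer(theta, t))   # shape (n_realizations, n_times)

# Step 1: ensemble runs, done once and independently of any training.
theta_train = rng.uniform(0.5, 2.0, size=300)
y_train = forward_model(theta_train)

# Step 2: fit the inverse mapping (responses -> parameter).
# Assumed stand-in for a DNN: fixed random tanh features + linear readout.
W = rng.normal(size=(t.size, 50))
b = rng.normal(size=50)
mu, sd = y_train.mean(axis=0), y_train.std(axis=0)

def features(y):
    z = (y - mu) / sd                    # standardize the responses
    return np.hstack([np.ones((len(y), 1)), z, np.tanh(z @ W + b)])

coef, *_ = np.linalg.lstsq(features(y_train), theta_train, rcond=1e-8)

def inverse_mapping(y):
    """Estimate the parameter directly from a response vector."""
    return features(np.atleast_2d(y)) @ coef

# Step 3: infer the parameter from a synthetic "observation".
theta_true = np.array([1.3])
theta_est = inverse_mapping(forward_model(theta_true))

# Held-out check on realizations not used in the fit.
theta_test = rng.uniform(0.5, 2.0, size=100)
mae = np.mean(np.abs(inverse_mapping(forward_model(theta_test)) - theta_test))
```

Note that the ensemble in step 1 never needs to be rerun while the mapping in step 2 is being fitted, which is the practical point of the decoupling.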

Despite its success, the current DL-assisted inverse mapping is often designed to take all observed states in estimating hydrological model parameters. However, some observational values can be uninformative, or even misinformative, for estimating parameters (Loritz et al., 2018), thus impeding the mapping performance. While the underlying assumption is that the trained DL model can “automatically” delineate the accurate relationship between parameters and observed responses, the limited realizations (e.g., a few hundred) could prevent the DL model from being well trained (Moghaddam et al., 2020). Further, when using all observed responses as inputs, the potentially large number of trainable weights of the DL model can make the model training hard and induce overfitting (Ying, 2019), thus calling for more realizations in training. Lately, several studies proposed new inverse mapping methods that indirectly address this issue by using dimensionality reduction and differentiable programming. Razak et al. (2021) developed a latent-space inversion that performs the inverse mapping from the model responses to parameters in their reduced spaces through an auto-encoder. The dimensionality reduction by an auto-encoder is intended not only to reduce the original high-dimensional data space but also to indirectly distill the dynamics most relevant to the parameters. Tsai et al. (2021) leveraged differentiable programming such that the loss function of an inverse mapping is designed to directly minimize the difference between observations and the responses predicted by a differentiable version of the physical model using the estimated parameters. In doing so, the uninformative responses are automatically given less importance in the loss. 
Nevertheless, both studies reduce the irrelevant information in implicit ways through complicating the DL-based mapping which, due to its “black-box” nature, does not explicitly show to what extent an observation is relevant to a parameter. Also, by adding another layer of complexity, the inverse mapping can potentially be hard to build. For instance, the current solutions to develop a differentiable physical model rely on either a surrogate or rewriting the model using differentiable parameters (Karniadakis et al., 2021), both of which are research challenges that go beyond addressing the inverse problem itself.

The emergence of knowledge-informed DL provides a new opportunity to resolve the uninformative or misinformative issue by explicitly encoding the complex relationship between the inputs and outputs in the DL model (Willard et al., 2020). Knowledge-informed DL includes, but is not limited to, the following three methods: (1) a physics-guided loss function, (2) hybrid modeling, and (3) a physics-guided design of architecture. A physics-guided loss function embeds the mathematical relation between inputs and outputs in the loss function, known as physics-informed deep learning (Karniadakis et al., 2021) and has seen some early successes in earth science. For instance, Jia et al. (2019) leveraged an energy conservation loss in developing a physics-guided recurrent neural network to simulate lake temperature. Hybrid modeling, on the other hand, directly integrates the physical model with the DL model, which often serves as a surrogate for its computationally intense counterpart in the physical model (Kurz, 2021). An example can be coupling a DL-based emulator for turbulent heat fluxes with a process-based hydrological model framework (Bennett and Nijssen, 2021). Lastly, the physics-guided design of architecture explicitly designs the neural networks consistently with prior knowledge. The widely used convolutional (Atlas et al., 1987) and recurrent (Rumelhart and McClelland, 1986) neural networks fall into this category due to their specific network structures to learn the spatial and temporal relationships, respectively. Other related studies include relating intermediate physical variables to hidden neurons (Daw et al., 2020), explicitly learning nonlinear dynamics through the neural operator (Kovachki et al., 2021), and encoding domain knowledge obtained from nonparametric physics-based kernels into the neural network (Sadoughi and Hu, 2019). 
Compared with the other two types of knowledge-informed DL models, which are usually limited to particular physical dynamics, the physics-guided design of architecture is more generic regarding both the processes of gaining prior knowledge and designing a corresponding neural network.

One important piece of domain knowledge is the pairwise relation between model parameters and responses: that is, how relevant a parameter is to a model response at a given time step. Understanding such a pairwise relationship is essential for selecting the most relevant model responses to estimate each parameter when building the inverse mapping. To this end, global sensitivity analysis (Razavi and Gupta, 2015; Sarrazin et al., 2016) is a suitable tool because it quantifies the contribution of uncertainty in model inputs and parameters to model outputs, and it has been extensively applied in earth system modeling (Hall et al., 2009; Harper et al., 2011; Anderson et al., 2014; Guse et al., 2014; Dai et al., 2017). In a sensitivity analysis study on SWAT modeling (Jiang et al., 2022), mutual information (MI; Cover and Thomas, 2006) computed from a few hundred model realizations provided sensitivity results similar to those of the popular Sobol' sensitivity analysis (Sobol, 2001), which usually relies on several thousand realizations. As a result, MI is well suited to unravel the relation between model parameters and responses given a few hundred realizations of a state-of-the-art fully integrated hydrological model.

This study aims to develop a novel knowledge-informed DL method for model calibration by using a few hundred realizations. We leverage MI-based global sensitivity analysis to uncover the dependencies between parameters and observed responses, which are then used to guide the selection of crucial responses as the inputs of DL-assisted inverse mapping. We applied this method in estimating multiple parameters of a fully integrated hydrological model, ATS (Coon et al., 2019), at the Coal Creek Watershed, a snow-dominated alpine basin located in Colorado, US. Multiple water years of hydrological observations are used in both the ATS model's calibration and evaluation. We further performed a multiyear analysis to investigate the significance of wet and dry years in model calibration. Our study highlights the importance of domain knowledge in uncovering the dependencies among variables of interest before hydrological model calibration.

https://hess.copernicus.org/articles/27/2621/2023/hess-27-2621-2023-f01

Figure 1. The Coal Creek Watershed and the setup of the Advanced Terrestrial Simulator (ATS). (a) The river network, the digital elevation model (DEM), and the surface mesh of the watershed. (b) The time series of USGS streamflow observations (station number 09111250) at the watershed outlet, and the Moderate Resolution Imaging Spectroradiometer (MODIS) 8 d composite evapotranspiration (ET) averaged across the watershed, where the observations before 1 October 2019 are used for model calibration, and the remaining observations till 31 December 2021 are used for evaluating the climate sensitivity of the estimated model parameters. (c, d) The delineated soil and geological layers, respectively.

2 Methods

2.1 Study site

The Coal Creek Watershed is located in the western part of the larger East Taylor Watershed in Colorado (Fig. 1a). The majority of the discharge flows through Coal Creek from west to east. The watershed is an HUC12 (hydrologic unit code) watershed with a drainage area of around 53.2 km² (HUC12 ID: 140200010204). According to the Köppen classification system (Köppen and Geiger, 1930), this high alpine watershed has a warm-summer humid continental climate, with snow processes dominating the hydrological cycle. Based on the long-term Daymet forcing dataset (Thornton et al., 2021), the watershed receives ∼530 mm of snowfall annually, which dominates its annual precipitation (∼850 mm). The watershed exhibits strong variations in topography, with elevations ranging from 2706 to 3770 m, and its primary land covers are evergreen forest (62.6 %) and shrub (20.5 %). Hydrological observations are available through (1) a USGS gaging station (station number 09111250) that records daily discharge (Q) observations at the watershed outlet, and (2) a remote sensing product of the Moderate Resolution Imaging Spectroradiometer (MODIS) 8 d composite evapotranspiration (ET) at 500 m resolution. Figure 1b shows the time series of Q and watershed-averaged MODIS ET from 1 October 2016 to 31 December 2021, which are used as observations for calibrating and evaluating the ATS model.

2.2 ATS model setup

ATS is a fully distributed hydrologic model that integrates surface and subsurface flow dynamics (Coon et al., 2019). The surface hydrological process is characterized by a two-dimensional diffusion wave approximation of the Saint-Venant equation. A three-dimensional Richards equation is used to represent the subsurface flow. The model adopts the Priestley–Taylor equation to simulate evapotranspiration (ET) from various processes (e.g., snow and plant transpiration), which are coupled with the surface–subsurface hydrological cycle.
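For orientation, the Priestley–Taylor relation mentioned above is commonly written in the following textbook form. This is a general statement of the equation, not a transcription of the ATS implementation, which may differ in detail; the symbols are introduced here for illustration only.

$$ \lambda E_{\mathrm{p}} = \alpha \, \frac{\Delta}{\Delta + \gamma} \, (R_{\mathrm{n}} - G) $$

Here $E_{\mathrm{p}}$ is the potential ET, $\lambda$ is the latent heat of vaporization, $\Delta$ is the slope of the saturation vapor pressure curve with respect to temperature, $\gamma$ is the psychrometric constant, $R_{\mathrm{n}}$ is the net radiation, $G$ is the ground heat flux, and $\alpha$ is the dimensionless Priestley–Taylor coefficient. Because $\alpha$ scales the potential ET linearly, it is a natural candidate for calibration.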

We leveraged an existing ATS setup at the Coal Creek Watershed (Shuai et al., 2022). The Watershed Workflow package (Coon and Shuai, 2022) was used to delineate the mesh, the surface land covers, and the subsurface characteristics of the watershed. The resulting mesh consists of 171 760 cells, formed by a two-dimensional triangular surface mesh followed by 19 terrain-following subsurface layers (Fig. 1a). The surface mesh contains 8588 triangular cells with varying sizes that range from ∼5000 m² near the stream network to ∼50 000 m² away from the stream network. On the surface, the National Land Cover Database was used to delineate the land cover types. In the subsurface, the 19 layers add up to 28 m and contain (1) six soil layers in the top 2 m, (2) 12 geological layers in the middle, and (3) one bedrock layer at the bottom of the simulation domain. The maximum depth to bedrock (28 m) was determined by SoilGrids (Shangguan et al., 2017). The subsurface characteristics of the soil and geological layers are retrieved from the Natural Resources Conservation Service (NRCS) Soil Survey Geographic (SSURGO) soils database and GLobal HYdrogeology MaPS (GLHYMPS) 2.0 (Huscroft et al., 2018), respectively. The k-means clustering algorithm (Likas et al., 2003) was used to group the soil and geological types based on the default permeability values from SSURGO and GLHYMPS, which leads to five soil types and four geological types shown in Fig. 1c and d, respectively. Each clustered soil or geological type is associated with a specific set of subsurface characteristics (such as permeability), which are assigned to the corresponding grouped grid cells. These subsurface characteristics are important in controlling flow dynamics and can be estimated from hydrological observations. 
To ensure that the model achieved a physically appropriate initial state, two spinups were performed sequentially: (1) a cold spinup that ran the model for 1000 years using constant rainfall and led to steady-state model outputs (e.g., converged total amount of subsurface water storage), and (2) a warm spinup that was initialized from the steady-state spinup result and performed a transient simulation for 10 years (i.e., 1 October 2004–1 October 2014) under the Daymet forcing. Please refer to Shuai et al. (2022) for the detailed model setup and spinup.
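The k-means grouping of permeability values described above can be sketched as follows. This is a minimal illustration under stated assumptions: the permeability values are invented (they are not SSURGO or GLHYMPS data), the clustering is done on log10 permeability (the text does not say which scale was used), and `kmeans_1d` is a plain Lloyd's algorithm written for this example rather than the implementation used in the study.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's algorithm on 1-D data; returns (labels, centers)."""
    values = np.asarray(values, dtype=float)
    # Deterministic start: place initial centers at evenly spaced quantiles.
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each value to its nearest center.
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members (keep it if empty).
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Hypothetical log10 permeabilities (m^2) of ten soil map units.
log_perm = np.array([-10.1, -10.0, -11.2, -11.0, -12.1,
                     -12.0, -12.9, -13.0, -14.2, -14.0])

labels, centers = kmeans_1d(log_perm, k=5)   # five soil types, as in the text
perm_by_type = 10.0 ** centers               # one permeability per soil type
```

Each grid cell would then inherit the permeability of the cluster its map unit falls into, so that only one value per cluster needs to be calibrated.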

We select a preliminary set of 14 model parameters to be calibrated, which can be categorized into ET, snow, river channel, and subsurface characteristics. The ET parameters are the two coefficients used by the Priestley–Taylor equation (Priestley and Taylor, 1972) in calculating the potential ET of snow and transpiration, respectively (i.e., priestley_taylor_alpha-snow and priestley_taylor_alpha-transpiration). The snow parameters are the snow melting rate (snowmelt_rate) and the temperature determining the snow melting (snowmelt_degree_diff). The river channel characteristic is Manning's coefficient (manning_n), which describes the roughness of the surface channel. The subsurface characteristics are the permeabilities of the major soil and geological types (i.e., perm_s1, perm_s2, perm_s3, perm_s4, perm_s5, perm_g1, perm_g2, perm_g3, and perm_g4). A detailed description of these parameters can be found in Table A1.
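For bookkeeping, the candidate parameter set listed above can be written as a small grouped structure. The parameter names are taken from the text; the variable names `CANDIDATE_PARAMETERS` and `all_parameters` and the category labels are hypothetical choices made for this sketch.

```python
# Candidate parameters grouped by the four categories named in the text.
CANDIDATE_PARAMETERS = {
    "ET": ["priestley_taylor_alpha-snow", "priestley_taylor_alpha-transpiration"],
    "snow": ["snowmelt_rate", "snowmelt_degree_diff"],
    "river channel": ["manning_n"],
    "subsurface": [f"perm_s{i}" for i in range(1, 6)]      # five soil types
                + [f"perm_g{i}" for i in range(1, 5)],     # four geological types
}

# Flat list in a fixed order, e.g. for labeling columns of an ensemble table.
all_parameters = [name for group in CANDIDATE_PARAMETERS.values() for name in group]
```

The flat list has 2 + 2 + 1 + 9 = 14 entries, matching the count given above.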

2.3 Knowledge-informed model calibration using deep learning

We develop a new methodology to calibrate ATS using knowledge-informed DL, as shown in Fig. 2. The key idea is to leverage a data-driven approach to identify the sensitive model responses that serve as the inputs to the DL-assisted inverse mapping for estimating each parameter. Here, we use MI as the sensitivity analysis tool because of its capability to uncover nonlinear relationships. Derived from Shannon's entropy (Cover and Thomas, 2006), MI quantifies the shared information between two variables, a model response Y and a model parameter X, as follows:

$$ I(X;Y) = H(Y) - H(Y|X) = \sum_{X = x} \sum_{Y = y} p(x, y) \log \left( \frac{p(x, y)}{p(x)\, p(y)} \right), \tag{1} $$

where p is the probability density function and can be estimated by the fixed binning method, $H (Y) = - \sum_{Y = y} p (y) \log (p (y))$ is Shannon's entropy describing the overall uncertainty of Y, and H(Y|X) is the conditional entropy that quantifies the uncertainty of Y given the knowledge of X. Equation (1) shows that I(X;Y) quantifies the shared dependency between the two variables and is zero when X and Y are statistically independent. Through a multivariate sensitivity analysis of SWAT, Jiang et al. (2022) show that MI computed from a few hundred realizations with a statistical significance test (SST) can yield sensitivity results comparable with those of the full Sobol' sensitivity analysis, which usually uses thousands of realizations. Therefore, MI is an ideal tool to perform the sensitivity analysis on the several hundred model realizations that are computationally affordable for the ATS model (Cromwell et al., 2021). In this study, we follow a strategy similar to that of Jiang et al. (2022): we estimate p using 10 evenly divided bins along each dimension and perform SSTs to filter out any nonsignificant MI value with a significance level of 95 % based on 100 bootstrap samples. In other words, the computed MI is set to zero if the statistical significance test fails.
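The binned MI estimate of Eq. (1) and the zeroing of nonsignificant values can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the data are synthetic, and the significance test is implemented as a shuffle test (MI is recomputed after randomly permuting the parameter samples, and the original value is kept only if it exceeds the 95 % quantile of those surrogates). The exact resampling scheme used in the study may differ.

```python
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in MI (in nats) from a 2-D histogram with evenly spaced bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)    # marginal p(x), shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal p(y), shape (1, bins)
    nz = pxy > 0                           # empty cells contribute zero
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def significant_mi(x, y, bins=10, n_shuffles=100, level=0.95, rng=None):
    """Return MI, or 0.0 if it fails the (assumed) shuffle-based test."""
    rng = np.random.default_rng(rng)
    mi = mutual_information(x, y, bins)
    null = [mutual_information(rng.permutation(x), y, bins)
            for _ in range(n_shuffles)]
    return mi if mi > np.quantile(null, level) else 0.0

# Synthetic ensemble: 396 realizations, one parameter, two responses.
rng = np.random.default_rng(0)
n = 396
theta = rng.uniform(0.0, 1.0, n)
responses = np.column_stack([
    theta**2 + rng.normal(0.0, 0.01, n),   # response that depends on theta
    rng.normal(0.0, 1.0, n),               # response unrelated to theta
])

mi = np.array([significant_mi(theta, responses[:, j], rng=rng)
               for j in range(responses.shape[1])])
informative = mi > 0                       # mask of responses worth keeping
```

Only the responses flagged in `informative` would be passed on as inputs when estimating this parameter. Note that the plug-in estimate is biased upward for finite samples, which is one reason a significance filter is useful with only a few hundred realizations.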

https://hess.copernicus.org/articles/27/2621/2023/hess-27-2621-2023-f02

Figure 2. Diagram of deep learning (DL) inverse mapping development including four steps: (1) performing a preliminary mutual information (MI) analysis using 50 model runs to narrow down the parameters to be estimated, (2) performing a full MI analysis on 396 model runs to correctly delineate the sensitivity between each parameter and each observed response, (3) developing DL inverse mappings with and without being knowledge informed, and (4) estimating parameters from observations with and without observation errors.
