the Creative Commons Attribution 4.0 License.
When ancient numerical demons meet physics-informed machine learning: adjoint-based gradients for implicit differentiable modeling
Abstract. Recent advances in differentiable modeling, a genre of physics-informed machine learning that trains neural networks (NNs) together with process-based equations, have shown promise in enhancing hydrologic models’ accuracy, interpretability, and knowledge-discovery potential. Current differentiable models are efficient for NN-based parameter regionalization, but the simple explicit numerical schemes paired with sequential calculations (operator splitting) can incur large numerical errors whose impacts on models’ representation power and learned parameters are not clear. Implicit schemes, however, cannot rely on automatic differentiation to calculate gradients due to potential issues of gradient vanishing and memory demand. Here we propose a “discretize-then-optimize” adjoint method to enable differentiable implicit numerical schemes for the first time for large-scale hydrologic modeling. The adjoint model demonstrates comprehensively improved performance, with Kling-Gupta efficiency coefficients, peak-flow and low-flow metrics, and evapotranspiration that moderately surpass the already-competitive explicit model. Therefore, the previous sequential-calculation approach had a detrimental impact on the model’s ability to represent hydrologic dynamics. Furthermore, with a structural update that describes capillary rise, the adjoint model can better describe baseflow in arid regions and also produce low and peak flows that outperform even pure machine learning methods such as long short-term memory networks. The adjoint model rectified some parameter distortions but did not alter spatial parameter distributions, demonstrating the robustness of regionalized parameterization. Despite higher computational expenses and modest improvements, the adjoint model’s success removes the barrier for complex implicit schemes to enrich differentiable modeling in hydrology.
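To make the "discretize-then-optimize" idea concrete, the following is a minimal sketch on a toy problem, not the authors' NN-HBV implementation. It assumes a single hypothetical nonlinear bucket, dS/dt = P - k*S^2, stepped with implicit Euler and a Newton-Raphson solve, and a squared-error loss on storage. The function names (`forward`, `loss`, `adjoint_grad`), the parameter `k`, and the forcing values are all illustrative assumptions. The gradient with respect to `k` comes from a backward sweep over the converged states only, so no Newton iterates need to be stored, and it is compared against a central finite difference.

```python
import numpy as np

def forward(k, s0, precip, dt=1.0, tol=1e-12, max_iter=50):
    """Implicit Euler for dS/dt = P - k*S**2; each step solved by Newton-Raphson."""
    s = np.empty(len(precip) + 1)
    s[0] = s0
    for n, p in enumerate(precip):
        x = s[n]  # initial guess: previous state
        for _ in range(max_iter):
            resid = x - s[n] - dt * (p - k * x**2)   # F(S_new, S_old, k)
            if abs(resid) < tol:
                break
            x -= resid / (1.0 + 2.0 * dt * k * x)    # Newton update with dF/dS_new
        s[n + 1] = x
    return s

def loss(s, obs):
    """Half sum of squared errors over the simulated steps."""
    return 0.5 * np.sum((s[1:] - obs) ** 2)

def adjoint_grad(k, s, obs, dt=1.0):
    """Backward sweep of the discrete adjoint; uses only the converged states."""
    lam_next, grad = 0.0, 0.0
    for n in range(len(obs), 0, -1):
        jac = 1.0 + 2.0 * dt * k * s[n]              # dF_n/dS_n
        lam = (lam_next - (s[n] - obs[n - 1])) / jac  # adjoint recursion (dF_{n+1}/dS_n = -1)
        grad += lam * dt * s[n] ** 2                 # lam_n * dF_n/dk
        lam_next = lam
    return grad

# Synthetic check: "observations" generated with a different k (illustrative values).
precip = np.array([2.0, 0.0, 5.0, 1.0, 0.0, 3.0])
obs = forward(0.08, 1.0, precip)[1:]
k = 0.05
g_adj = adjoint_grad(k, forward(k, 1.0, precip), obs)
eps = 1e-6
g_fd = (loss(forward(k + eps, 1.0, precip), obs)
        - loss(forward(k - eps, 1.0, precip), obs)) / (2 * eps)
print(g_adj, g_fd)
```

In this sketch the backward sweep costs one scalar division per time step regardless of how many Newton iterations the forward solve needed, which is the property that makes an adjoint attractive when the forward solver is iterative.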
Status: final response (author comments only)
-
RC1: 'Comment on hess-2023-258', Ilhan Özgen-Xian, 09 Dec 2023
Dear authors, Dear editor,
Here is my review of the submitted work. I recommend accepting the manuscript after minor revisions.
Kindly
Ilhan Özgen-Xian
Summary
The authors explore the use of the adjoint method to replace automatic differentiation for differentiable models, with application to lumped hydrological modelling with implicit time integration. This is proposed to overcome some of the limitations of the automatic differentiation when using implicit time integration, namely excessive memory usage when using iterative linear system solvers and vanishing gradients.
The authors couple the hydrological model HBV to an LSTM and train the coupled hybrid model (NN-HBV) using CONUS and CAMELS data. The improvement in model performance due to the implicit time integration is demonstrated convincingly.
In addition, the authors add the process of capillary rise to the model and show that this improves the model performance in all model variants.
Overall, the presented work is of interest to the readers of Hydrology and Earth System Sciences. The manuscript is well written.
General comments and questions
1. The authors convincingly make an argument for implicit time integration. The forward Euler time stepping used in this work is indeed at a disadvantage if fixed time steps are used. However, it is not clear to me how higher order explicit time integration methods such as schemes from the explicit Runge-Kutta family (RK) would perform in comparison to the implicit one. If I understood correctly, some of the numerical issues mentioned in the manuscript might also be addressed by (adaptive) multistep schemes of this type. The advantage of RK-type schemes in this context is that the number of computations per time step is known a priori. In contrast, the Newton-Raphson iterative solver may require any number of steps until convergence. High order RK schemes, for example the standard RK45 or the adaptive RK-Fehlberg method, could also potentially benefit from the adjoint method presented in this paper to avoid excessive memory usage. Perhaps the authors can comment on this.
2. The authors mention that the Newton-Raphson solver introduces some overhead to the computation. On average, in the results shown in this paper, how many iteration steps were necessary for the solver to converge?
Minor comments
1. P.2, L.70: "graphical processing units" should be "graphics processing units"
2. P.3, LL.105ff.: Does "elliptic operator" in this context correspond to the Laplacian? If so, some of the examples might require some annotation. The Saint-Venant equation only contains Laplacian operators if molecular/turbulent diffusion is accounted for. Many forms of the Saint-Venant equation omit these terms, for example (García-Navarro et al., 2019, doi:10.1007/s10652-018-09657-7; LeVeque et al., 2011, doi:10.1017/S0962492911000043).
3. P.3, LL.105ff. (continued) When I looked at the paper by Aboelyazeed et al. (2023) (cited by the authors), I couldn't see Laplacians in the Farquhar model equations.
4. P6, L.209: "The same forcings ... was used" should be "The same forcings ... were used"
5. P.12, L.335: Should it be Eq. (28) instead of Eq. (27)? Maybe I am misunderstanding something.
6. P.14, L.398: The authors state that the mass balance preservation of the adjoint-driven NN-HBV model might be the reason behind the improved model performance. I don't understand why the mass conservation should significantly differ from the explicit sequential NN-HBV model if the hydrological process representation remains untouched. Is this related to the use of thresholds to avoid negative storages? Can the authors elaborate a bit more?
7. P.24, L.580: The additional computational cost introduced by the implicit solver is quite substantial (18 h vs. 133 h), suggesting either poor convergence or large communication overhead in the implicit scheme.
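The threshold question in minor comment 6 can be illustrated with a toy calculation. The sketch below is a hypothetical linear bucket, dS/dt = P - k*S, and not the HBV formulation used in the manuscript; the function names and the deliberately stiff values (k*dt = 1.5) are assumptions chosen for illustration. It shows one way a non-negativity threshold in a forward Euler step can break mass closure, whereas an implicit Euler step on the same bucket stays non-negative and closes the balance without a threshold. Whether this mechanism explains the results in the manuscript is for the authors to confirm.

```python
def explicit_clipped(s0, precip, k, dt=1.0):
    """Forward Euler with a threshold that keeps storage non-negative."""
    s, outflow = s0, 0.0
    for p in precip:
        q = k * s * dt                  # flux computed from the old state
        outflow += q                    # reported outflow is not reduced by the clip
        s = max(s + p * dt - q, 0.0)    # threshold discards the negative part
    return s, outflow

def implicit(s0, precip, k, dt=1.0):
    """Implicit Euler; the linear bucket allows a closed-form solve per step."""
    s, outflow = s0, 0.0
    for p in precip:
        s = (s + p * dt) / (1.0 + k * dt)  # solves S_new = S_old + dt*(P - k*S_new)
        outflow += k * s * dt              # flux computed from the new state
    return s, outflow

def balance_error(s0, precip, s_end, outflow, dt=1.0):
    """Initial storage + inputs - outputs - final storage (zero if mass is conserved)."""
    return s0 + sum(precip) * dt - outflow - s_end

# Illustrative, deliberately stiff setup: k*dt > 1 so forward Euler overshoots.
precip = [0.0, 4.0, 0.0]
s_exp, q_exp = explicit_clipped(10.0, precip, k=1.5)
s_imp, q_imp = implicit(10.0, precip, k=1.5)
err_exp = balance_error(10.0, precip, s_exp, q_exp)
err_imp = balance_error(10.0, precip, s_imp, q_imp)
print(err_exp, err_imp)
```

In the explicit variant the reported outflow exceeds the water that was available whenever the clip is triggered, so the balance error is nonzero; the implicit variant never needs the clip.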
Citation: https://doi.org/10.5194/hess-2023-258-RC1
- AC1: 'Reply on RC1', Chaopeng Shen, 07 Jan 2024
-
RC2: 'Comment on hess-2023-258', Uwe Ehret, 18 Dec 2023
Dear Editor, dear Authors,
Please find my review in the attachment.
Yours sincerely, Uwe Ehret
-
AC2: 'Reply on RC2', Chaopeng Shen, 07 Jan 2024
We thank Dr. Ehret for the constructive suggestions. Unfortunately, the interactive discussion ended during the AGU and winter break time frame; we wonder if it could be extended so we can have more discussion.
Please see the attached file for our replies, but here briefly, we respectfully do not agree with the "reject" recommendation and the suggestion to run on very small time steps, for several reasons:
1. The main point of the paper is to enable implicit schemes, not to say that explicit schemes are unusable. We will revise to clarify this. It is well known that the two have different applicable problems. In practice, using only explicit schemes can indeed run into many issues in the context of differentiable modeling on large datasets:
2. Running very small time steps with automatic differentiation can incur huge GPU memory use, limiting the window length allowable for training.
3. Minibatch parallelism is super important for learning. However, adaptive time stepping schemes that adapt to the numerical characteristics of each basin are not friendly to minibatching on the GPU, which prefers more uniform operations across the batch.
4. There are numerical reasons, explored in previous studies (Clark et al., 2010), why implicit schemes are preferred. This paper's main purpose is to enable implicit schemes. Also, matching input forcing functions/dynamic parameters to those tiny time steps requires interpolation and can add much to the complexity.
Considering all of these challenges, we argue it is a bit unfair to ask us to perform the explicit simulations on very small time steps. Our current opinion is that running such a scheme at small time steps at large scale is actually quite difficult with current computing constraints. Rather, we welcome the community to show such comparisons. We also came to the current solution not by chance but through a long exploration process. Please see more detail in the attached PDF.
-
AC3: 'Reply on RC2', Chaopeng Shen, 12 Jan 2024
Please allow us to revise our response here:
- The main point of the paper is to enable implicit methods, not to discourage explicit ones. Both are useful, and implicit solvers have long been known to be necessary for many problems.
- If the editor insists, we can add some hourly model (or RK, but not both) results, but this should not be a reject decision because we can already run this and can show some results rather easily. Matching inputs to the accurate hourly time steps is harder and is out of scope. Adaptive time stepping is also out of scope.
- We in fact already ran some initial tests. Changing to an hourly model also raised the computational cost: at least 50% more computational time and 30% more RAM than the implicit daily model. We are attempting to optimize the implicit code, which should have considerable room for efficiency gains. The room for improvement seems smaller with the explicit model.
- Minibatch parallelism is super important for learning and we always need to keep this in mind.
- There are numerical reasons, explored in previous studies (Clark et al., 2010), why implicit schemes are preferred.
We thank the reviewer and editor for their consideration.
Citation: https://doi.org/10.5194/hess-2023-258-AC3
RC3: 'Reply on AC3', Uwe Ehret, 17 Jan 2024
Dear Authors, dear Editor,
Please see my reply in the attachment.
Yours sincerely, Uwe Ehret
- AC4: 'Reply on RC3', Chaopeng Shen, 03 Feb 2024