Articles | Volume 28, issue 13
https://doi.org/10.5194/hess-28-3051-2024
© Author(s) 2024. This work is distributed under the Creative Commons Attribution 4.0 License.
When ancient numerical demons meet physics-informed machine learning: adjoint-based gradients for implicit differentiable modeling
Download
- Final revised paper (published on 15 Jul 2024)
- Preprint (discussion started on 09 Nov 2023)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
RC1: 'Comment on hess-2023-258', Ilhan Özgen-Xian, 09 Dec 2023
- AC1: 'Reply on RC1', Chaopeng Shen, 07 Jan 2024
RC2: 'Comment on hess-2023-258', Uwe Ehret, 18 Dec 2023
- AC2: 'Reply on RC2', Chaopeng Shen, 07 Jan 2024
AC3: 'Reply on RC2', Chaopeng Shen, 12 Jan 2024
RC3: 'Reply on AC3', Uwe Ehret, 17 Jan 2024
- AC4: 'Reply on RC3', Chaopeng Shen, 03 Feb 2024
Peer review completion
AR – Author's response | RR – Referee report | ED – Editor decision | EF – Editorial file upload
ED: Reconsider after major revisions (further review by editor and referees) (10 Feb 2024) by Ralf Loritz
AR by Chaopeng Shen on behalf of the Authors (23 Mar 2024)
ED: Referee Nomination & Report Request started (28 Mar 2024) by Ralf Loritz
RR by Ilhan Özgen-Xian (05 Apr 2024)
RR by Uwe Ehret (26 Apr 2024)
ED: Publish as is (06 May 2024) by Ralf Loritz
AR by Chaopeng Shen on behalf of the Authors (16 May 2024)
Dear authors, Dear editor,
Here is my review of the submitted work. I recommend accepting the manuscript after minor revisions.
Kindly
Ilhan Özgen-Xian
Summary
The authors explore the use of the adjoint method as a replacement for automatic differentiation in differentiable models, with application to lumped hydrological modelling with implicit time integration. The approach is proposed to overcome two limitations of automatic differentiation under implicit time integration: excessive memory usage when iterative linear system solvers are used, and vanishing gradients.
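The idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation; the toy storage equation dS/dt = p - k·S², the function names, and all parameter values are assumptions chosen for illustration. The sketch takes one implicit Euler step with Newton-Raphson and then obtains the sensitivity of the new state to k from the converged state alone, via the implicit function theorem, so no Newton iterates need to be stored for backpropagation. A central finite difference serves as a check.

```python
# Toy example (hypothetical model and parameters, not from the manuscript):
# one implicit Euler step of dS/dt = p - k*S**2 and its adjoint-style gradient.

def solve_implicit_euler(s0, p, k, dt, tol=1e-12, max_iter=50):
    """Solve g(s1) = s1 - s0 - dt*(p - k*s1**2) = 0 with Newton-Raphson."""
    s1 = s0
    for n_iter in range(1, max_iter + 1):
        g = s1 - s0 - dt * (p - k * s1 * s1)      # residual of the implicit step
        dg_ds1 = 1.0 + 2.0 * dt * k * s1          # Jacobian of the residual
        step = g / dg_ds1
        s1 -= step
        if abs(step) < tol:
            return s1, n_iter
    raise RuntimeError("Newton-Raphson did not converge")


def adjoint_gradient(s1, k, dt):
    """d s1 / d k evaluated at the converged state only."""
    dg_ds1 = 1.0 + 2.0 * dt * k * s1
    dg_dk = dt * s1 * s1
    lam = 1.0 / dg_ds1        # adjoint variable for the loss L = s1 (dL/ds1 = 1)
    return -lam * dg_dk       # dL/dk = -lam * dg/dk


s0, p, k, dt = 10.0, 2.0, 0.05, 1.0
s1, n_iter = solve_implicit_euler(s0, p, k, dt)
grad_adjoint = adjoint_gradient(s1, k, dt)

# Finite-difference check: re-solve the implicit step with perturbed k.
eps = 1e-6
s_plus, _ = solve_implicit_euler(s0, p, k + eps, dt)
s_minus, _ = solve_implicit_euler(s0, p, k - eps, dt)
grad_fd = (s_plus - s_minus) / (2.0 * eps)

print("Newton iterations:", n_iter)
print("adjoint gradient :", grad_adjoint)
print("finite difference:", grad_fd)
```

In this scalar case the adjoint solve reduces to a single division; for a system of storages it would become one linear solve with the transposed Jacobian per time step.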
The authors couple the hydrological model HBV to an LSTM and train the coupled hybrid model (NN-HBV) using CONUS and CAMELS data. The improvement in model performance due to the implicit time integration is demonstrated convincingly.
In addition, the authors add the process of capillary rise to the model and show that this improves the model performance in all model variants.
Overall, the presented work is of interest to the readers of Hydrology and Earth System Sciences. The manuscript is well written.
General comments and questions
1. The authors convincingly make an argument for implicit time integration. The forward Euler time stepping used in this work is indeed at a disadvantage if fixed time steps are used. However, it is not clear to me how higher-order explicit time integration methods such as schemes from the explicit Runge-Kutta family (RK) would perform in comparison to the implicit one. If I understood correctly, some of the numerical issues mentioned in the manuscript might also be addressed by (adaptive) multistage schemes of this type. The advantage of RK-type schemes in this context is that the number of computations per time step is known a priori. In contrast, the Newton-Raphson iterative solver may require any number of steps until convergence. Higher-order RK schemes, for example the standard RK45 or the adaptive RK-Fehlberg method, could also potentially benefit from the adjoint method presented in this paper to avoid excessive memory usage. Perhaps the authors can comment on this.
2. The authors mention that the Newton-Raphson solver introduces some overhead to the computation. On average, in the results shown in this paper, how many iteration steps were necessary for the solver to converge?
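The contrast raised in general comments 1 and 2 can be sketched on a toy problem. The storage equation dS/dt = p - k·S², the forcing series, and all names below are hypothetical and are not taken from the manuscript. A fixed-step classical RK4 update always costs four right-hand-side evaluations, whereas the Newton-Raphson iteration count of an implicit Euler step depends on the state and forcing and has to be measured.

```python
# Toy comparison (hypothetical model, forcing, and parameters):
# fixed cost per step for classical RK4 vs. measured Newton-Raphson
# iteration counts for implicit Euler.

def rhs(s, p, k):
    """Right-hand side of dS/dt = p - k*S**2."""
    return p - k * s * s


def rk4_step(s, p, k, dt):
    """Classical fourth-order Runge-Kutta step; always four rhs evaluations."""
    k1 = rhs(s, p, k)
    k2 = rhs(s + 0.5 * dt * k1, p, k)
    k3 = rhs(s + 0.5 * dt * k2, p, k)
    k4 = rhs(s + dt * k3, p, k)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0, 4


def implicit_euler_step(s, p, k, dt, tol=1e-10, max_iter=50):
    """Implicit Euler step solved with Newton-Raphson; returns iteration count."""
    s_new = s
    for n_iter in range(1, max_iter + 1):
        g = s_new - s - dt * rhs(s_new, p, k)
        dg = 1.0 + 2.0 * dt * k * s_new
        step = g / dg
        s_new -= step
        if abs(step) < tol:
            return s_new, n_iter
    raise RuntimeError("Newton-Raphson did not converge")


forcing = [0.0, 0.0, 20.0, 5.0, 0.0, 0.0, 15.0, 0.0, 1.0, 0.0]
k, dt = 0.02, 1.0

s_rk4, s_imp = 5.0, 5.0
rk4_evals, newton_iters, implicit_states = [], [], []
for p in forcing:
    s_rk4, n_eval = rk4_step(s_rk4, p, k, dt)
    s_imp, n_iter = implicit_euler_step(s_imp, p, k, dt)
    rk4_evals.append(n_eval)
    newton_iters.append(n_iter)
    implicit_states.append(s_imp)

mean_iters = sum(newton_iters) / len(newton_iters)
print("RK4 evaluations per step :", rk4_evals)
print("Newton iterations per step:", newton_iters)
print("mean Newton iterations    :", mean_iters)
```

Reporting the minimum, mean, and maximum of such a per-step iteration log for the real model would answer comment 2 directly. The counts from this toy carry no information about the NN-HBV solver.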
Minor comments
1. P.2, L.70: "graphical processing units" should be "graphics processing units"
2. P.3, LL.105ff.: Does "elliptic operator" in this context correspond to the Laplacian? If so, some of the examples might require some annotation. The Saint-Venant equation only contains Laplacian operators if molecular/turbulent diffusion is accounted for. Many forms of the Saint-Venant equation omit these terms, for example (García-Navarro et al., 2019, doi:10.1007/s10652-018-09657-7; LeVeque et al., 2011, doi:10.1017/S0962492911000043).
3. P.3, LL.105ff. (continued) When I looked at the paper by Aboelyazeed et al. (2023) (cited by the authors), I couldn't see Laplacians in the Farquhar model equations.
4. P.6, L.209: "The same forcings ... was used" should be "The same forcings ... were used"
5. P.12, L.335: Should it be Eq. (28) instead of Eq. (27)? Maybe I am misunderstanding something.
6. P.14, L.398: The authors state that the mass balance preservation of the adjoint-driven NN-HBV model might be the reason behind the improved model performance. I don't understand why the mass conservation should significantly differ from the explicit sequential NN-HBV model if the hydrological process representation remains untouched. Is this related to the use of thresholds to avoid negative storages? Can the authors elaborate a bit more?
7. P.24, L.580: The additional computational cost introduced by the implicit solver is quite substantial (18 h vs. 133 h), suggesting either poor convergence or large communication overhead in the implicit scheme.
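The question in minor comment 6 can be made concrete with a toy experiment. This is an illustration under stated assumptions, not a description of how NN-HBV handles thresholds: a linear reservoir dS/dt = p - k·S with a deliberately large k·dt, an explicit Euler update that clips negative storage to zero while still booking the full outflow, and a closed-form implicit Euler update. All names and values are hypothetical.

```python
# Toy mass-balance check (hypothetical reservoir and parameters):
# explicit Euler with a non-negativity threshold vs. implicit Euler.

def explicit_clipped(s0, forcing, k, dt):
    """Forward Euler; the flux uses the old state and storage is clipped at 0."""
    s, outflow_total = s0, 0.0
    for p in forcing:
        q = k * s                       # outflow rate at the old state
        s_new = s + dt * (p - q)
        outflow_total += q * dt         # the full flux is booked ...
        s = max(s_new, 0.0)             # ... but negative storage is reset to 0
    return s, outflow_total


def implicit_linear(s0, forcing, k, dt):
    """Implicit Euler for the linear reservoir (closed-form update)."""
    s, outflow_total = s0, 0.0
    for p in forcing:
        s = (s + dt * p) / (1.0 + k * dt)
        outflow_total += k * s * dt     # flux consistent with the new state
    return s, outflow_total


def balance_error(s0, forcing, dt, s_end, outflow_total):
    """Initial storage + inflow - outflow - final storage; 0 means closure."""
    return s0 + dt * sum(forcing) - outflow_total - s_end


s0, k, dt = 10.0, 1.5, 1.0
forcing = [0.0, 4.0, 0.0, 0.0, 2.0]

s_exp, out_exp = explicit_clipped(s0, forcing, k, dt)
s_imp, out_imp = implicit_linear(s0, forcing, k, dt)

err_explicit = balance_error(s0, forcing, dt, s_exp, out_exp)
err_implicit = balance_error(s0, forcing, dt, s_imp, out_imp)
print("explicit (clipped) balance error:", err_explicit)
print("implicit balance error          :", err_implicit)
```

With these values the clipped explicit scheme books 7 storage units more outflow than was available, while the implicit scheme closes the balance to round-off. Whether a comparable mechanism is at work in the explicit sequential NN-HBV is for the authors to confirm.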