Articles | Volume 29, issue 21
https://doi.org/10.5194/hess-29-6257-2025
© Author(s) 2025. This work is distributed under the Creative Commons Attribution 4.0 License.
Fully differentiable, fully distributed rainfall-runoff modeling
Download
- Final revised paper (published on 13 Nov 2025)
- Preprint (discussion started on 07 Mar 2025)
Interactive discussion
Status: closed
Comment types: AC – author | RC – referee | CC – community | EC – editor | CEC – chief editor
- RC1: 'Comment on egusphere-2024-4119', Shijie Jiang, 23 Mar 2025
  - AC1: 'Reply on RC1', Fedor Scholz, 17 Apr 2025
- CC1: 'Comment on egusphere-2024-4119', Benedikt Heudorfer, 01 Apr 2025
  - AC2: 'Reply on CC1', Fedor Scholz, 17 Apr 2025
- RC2: 'Comment on egusphere-2024-4119', Peter Nelemans, 08 Apr 2025
  - AC3: 'Reply on RC2', Fedor Scholz, 17 Apr 2025
    - RC3: 'Reply on AC3', Peter Nelemans, 18 Apr 2025
      - AC5: 'Reply on RC3', Fedor Scholz, 22 Apr 2025
- CC2: 'Comment on egusphere-2024-4119', Tianfang Xu, 15 Apr 2025
  - AC4: 'Reply on CC2', Fedor Scholz, 17 Apr 2025
Peer review completion
AR: Author's response | RR: Referee report | ED: Editor decision | EF: Editorial file upload
ED: Publish subject to minor revisions (further review by editor) (06 May 2025) by Daniel Klotz
AR by Fedor Scholz on behalf of the Authors (26 May 2025)
Author's response
Author's tracked changes
Manuscript
ED: Publish subject to revisions (further review by editor and referees) (02 Jun 2025) by Daniel Klotz
ED: Publish subject to revisions (further review by editor and referees) (19 Jul 2025) by Daniel Klotz
AR by Fedor Scholz on behalf of the Authors (21 Jul 2025)
Author's response
Author's tracked changes
Manuscript
ED: Referee Nomination & Report Request started (22 Jul 2025) by Daniel Klotz
RR by Peter Nelemans (20 Aug 2025)
RR by Shijie Jiang (27 Aug 2025)
ED: Publish subject to minor revisions (review by editor) (11 Sep 2025) by Daniel Klotz
AR by Fedor Scholz on behalf of the Authors (23 Sep 2025)
Author's response
Author's tracked changes
Manuscript
ED: Publish as is (28 Sep 2025) by Daniel Klotz
AR by Fedor Scholz on behalf of the Authors (06 Oct 2025)
This is a thoughtful and well-executed study. The authors propose DRRAiNN, a fully differentiable and fully distributed neural architecture for rainfall-runoff modeling. The model design is novel and ambitious. In general, the model is interesting and many components are well motivated; however, several parts of the paper would benefit from clearer framing, stronger justification, and more focused discussion. Below are my detailed comments and suggestions.
1. In the Introduction and Related Work, the authors devote a large portion of the text (up to line 67) to reviewing process-based and data-driven hydrological models. However, much of this content is already well-established and could be condensed. More importantly, the link between the challenges described in the background and the specific research goal of DRRAiNN is not clearly established. If the main goal is to improve the interpretability of ML models or to incorporate physical constraints into neural architectures, then the extended discussion of PBM limitations seems unnecessarily long. The core research motivation (around line 74) is somewhat buried between the general background and the introduction of differentiable modeling.
If the intended contribution is to implement a distributed hydrological model using NNs, then the literature review does not sufficiently acknowledge recent progress in this direction. In the Related Work section, the authors list several NN-based rainfall-runoff models, but the discussion is somewhat narrow. For instance, the paper states that "not many" fully distributed data-driven models exist (without citation), but several relevant models with explicit routing modules have been proposed in the last two years (e.g., https://doi.org/10.1029/2023WR036170, https://doi.org/10.1029/2023WR035337, https://doi.org/10.1016/j.jhydrol.2024.132165). These are not acknowledged or compared.
I suggest shortening the background section, clearly identifying the research gap, and referencing recent distributed or physically guided models to more accurately position DRRAiNN in the current landscape. Importantly, to clarify the contribution, it would be helpful to include a concise statement explaining why a fully differentiable, fully distributed rainfall-runoff model is needed and what specific challenge it addresses, as motivated by prior work.
2. I like the general idea behind the model -- using separate neural networks to handle spatial and temporal dynamics makes a lot of sense. I do have a few suggestions:
I) One thing I found a bit confusing is the use of the term “runoff.” Even though the authors explain it is not the physical runoff, it still might be misleading. The variable comes from the LSTM’s hidden state and goes through several layers, including a CNN that does not consider flow direction or water accumulation. So it is more like a learned feature than an actual variable. Maybe calling it something like “runoff embedding” or “runoff representation” would make things clearer.
II) The model uses solar radiation to model local ET. I get the idea, but I am not fully convinced it is enough. Radiation tends to be spatially smooth, especially at the catchment scale. I would expect that including vegetation-related variables (like LAI, NDVI, or GPP) could help better capture spatial variability in ET.
III) A few model choices could use more explanation. For example, why use hidden size 4 for the LSTM and 8 for the GRUs? These seem small for a model working across space and time.
IV) The link between the rainfall-runoff module and the discharge model (although I think alternative terms might help avoid confusion with traditional concepts) seems functionally effective but conceptually weak. Right now, it is unclear what information the embedding actually carries for downstream discharge estimation, and whether it supports realistic routing behavior. Some clarification on what this embedding is supposed to represent in hydrological terms would be helpful.
V) Since the model emphasizes interpretability, it might be useful to consider whether the internal states could reflect more structured hydrological components. For example, separating fast and slow flow signals, or introducing latent variables that relate to soil moisture or baseflow. I understand the authors may not follow a process-based philosophy, but some explanation of why those processes were not treated as explicit model components, while lateral propagation was, would be helpful.
3. I understand the idea behind using symmetry-based data augmentation for generalization in purely statistical terms, but I am not sure if it makes hydrological sense. Rotating or flipping the DEM and precipitation might result in flow directions that are not physically meaningful, especially since the river network and station layout stay fixed. Some clarification or discussion around this would be useful.
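To illustrate the concern with a toy D8 example (my own sketch, not the authors' pipeline): rotating the DEM alone changes the compass direction of steepest descent, so the rotated forcing becomes inconsistent with the fixed river network and station layout unless those are transformed as well.

```python
import numpy as np

def d8_direction(dem, r, c):
    """Index (0=N, 1=NE, ..., 7=NW) of the steepest-descent neighbour of (r, c)."""
    offsets = [(-1, 0), (-1, 1), (0, 1), (1, 1),
               (1, 0), (1, -1), (0, -1), (-1, -1)]
    drops = []
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        if 0 <= rr < dem.shape[0] and 0 <= cc < dem.shape[1]:
            drops.append(dem[r, c] - dem[rr, cc])
        else:
            drops.append(-np.inf)
    return int(np.argmax(drops))

# Toy DEM with a single steepest descent to the east at the centre cell.
dem = np.array([[3., 2., 3.],
                [3., 2., 0.],
                [3., 2., 3.]])
print(d8_direction(dem, 1, 1))            # 2: the centre cell drains east

# After a 90° counter-clockwise rotation of the DEM alone, the same cell
# drains north -- but the river network and stations have not rotated.
print(d8_direction(np.rot90(dem), 1, 1))  # 0: now drains north
```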
4. In model evaluation, one thing that could make the analysis stronger is to also report metrics for specific types of hydrological conditions — for instance, during rising limbs, low flows, or flood events. This would help clarify whether the model is merely capturing average behavior or actually learning the dynamics that matter most.
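Conditional metrics of this kind can be computed with simple boolean masks over the observed hydrograph; a minimal sketch with synthetic data (the percentile thresholds are illustrative choices on my part):

```python
import numpy as np

def nse(obs, sim, mask=None):
    """Nash-Sutcliffe efficiency, optionally restricted to a boolean mask.

    Note: the benchmark mean is taken over the masked subset, which makes
    low-flow NSE a deliberately harder target than the full-record score.
    """
    if mask is not None:
        obs, sim = obs[mask], sim[mask]
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 10.0, size=365)          # synthetic daily discharge
sim = obs + rng.normal(0.0, 2.0, size=365)    # synthetic model output

low  = obs < np.quantile(obs, 0.3)   # low-flow days
high = obs > np.quantile(obs, 0.9)   # flood-like days
print(nse(obs, sim), nse(obs, sim, low), nse(obs, sim, high))
```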
5. The current attribution analysis focuses entirely on where the rainfall matters spatially. However, hydrological response is also highly dependent on timing (e.g., time to peak). It might be worth considering how the model distributes attention over past precipitation steps, or whether it systematically over- or under-reacts to delayed signals. Even a simple attention plot or error histogram over lag times could be insightful.
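The lag summary I have in mind could be as simple as collapsing a (lag, y, x) attribution tensor over space; the tensor below is synthetic and its shape is an assumption on my part, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical gradient attribution for one forecast, shape (lag, y, x);
# lag 0 is the most recent precipitation step.
attr = np.abs(rng.normal(size=(30, 20, 20)))
attr *= np.exp(-np.arange(30.0) / 5.0)[:, None, None]  # fake recency decay

mass = attr.reshape(30, -1).sum(axis=1)
mass /= mass.sum()                         # fraction of attribution per lag
mean_lag = (np.arange(30) * mass).sum()    # attribution-weighted mean lag
print(mean_lag)  # compare against the observed time to peak per station
```

Plotting `mass` per station against the station's observed time to peak would directly show systematic over- or under-reaction to delayed signals.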
6. The comparison of hydrographs in Fig. 4 only provides a qualitative perspective. I would suggest including some quantitative metrics to support the statements made in Sect. 4.1. For example, it is mentioned that the large peak on day 80 in Lauffen and Rockenau is underestimated by both models -- is this meant to suggest that the error comes from the input data (e.g., underestimation in the precipitation forcing)? If so, it would be helpful to state this explicitly. Otherwise, it could still be related to limitations in the model architecture or the autoregressive setup. Also, only one model instance is shown in the figure. I wonder how consistent the five trained seeds are -- do they show similar hydrographs, especially in the later part of the prediction window, or is there high variability? Some indication of uncertainty or model spread would help.
7. There are a couple of claims in Section 4.2 that could benefit from clarification or more substantive support.
I) The authors mention that different seeds lead to different behaviors, with some instances performing better on short lead times and others on long lead times. It is not clear how weight initialization alone would systematically bias a model toward short- or long-term prediction. From a modeling perspective, are there specific components (e.g., LSTM, GRU) that are more sensitive to initialization in this regard?
II) The explanation that some stations are more difficult to model due to unobservable underground flows or pipes feels vague and speculative (lines 401-407). Since the authors already emphasize that DRRAiNN is distributed and physically interpretable, it would be more convincing to check whether these “hard-to-predict” stations differ in observable properties in their controlling catchments, such as elevation range, forest cover, drainage density, geology (e.g., using map overlay analysis with the HydroATLAS dataset).
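Concretely, one could test whether a candidate attribute (karst fraction, drainage density, ...) separates the hard-to-predict stations from the rest. A minimal permutation test on invented numbers (the station counts and attribute values are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
# Hypothetical catchment attribute (e.g., karst fraction) per station group.
easy = rng.normal(0.10, 0.05, size=15)   # well-predicted stations
hard = rng.normal(0.30, 0.10, size=5)    # hard-to-predict stations

obs_diff = hard.mean() - easy.mean()
pooled = np.concatenate([easy, hard])

# One-sided permutation test: how often does a random split of the pooled
# values yield a group difference at least as large as the observed one?
n_perm = 2000
diffs = np.empty(n_perm)
for i in range(n_perm):
    p = rng.permutation(pooled)
    diffs[i] = p[:5].mean() - p[5:].mean()
p_value = (diffs >= obs_diff).mean()
print(obs_diff, p_value)
```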
8. I find the idea of reconstructing catchment areas from saliency maps very interesting. However, I have a few concerns about the framing and the strength of the conclusions in Section 4.3:
I) The attribution map shown is only for a single seed, with the justification that all seeds were temporally validated. However, attribution is often highly sensitive to parameter noise, especially for gradient-based methods. It would strengthen the argument to show whether the attributions are consistent across seeds, or to provide a measure of saliency variance.
II) While it is interesting that a known discrepancy exists, there is no real evidence that DRRAiNN “discovered” the underground flow. A more conservative interpretation might be that the model failed to align with the delineated catchment, and this could be due to unmodeled processes. If the authors want to keep this discussion, it would help to at least show whether the model consistently de-emphasizes that region across multiple sequences.
III) In Figure 7, I noticed that most of the examples shown are for stations in smaller headwater catchments. It would be helpful to also evaluate downstream stations, where we would expect the model to aggregate signals over a broader upstream area. If the attributions remain very local in those cases, it might suggest that the model is not truly learning large-scale accumulation, but rather reacting to recent local precipitation.
IV) More generally, it is unclear whether the attribution reflects actual learned flow dynamics, or just highlights locations where recent rainfall occurred. A simple test might be to check whether attribution strength correlates with rainfall intensity rather than hydrologically relevant pathways. This would help clarify whether the model is truly learning how water moves through the network, or just where it rains.
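To make points I and IV concrete, here is a minimal sketch of the two checks (all arrays are synthetic and the shapes are my assumptions): per-pixel variability of the saliency maps across seeds, and a rank correlation between attribution and rainfall intensity.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Check 1 (point I): seed consistency of the saliency maps ---
# Hypothetical attribution maps from five seeds for one station/sequence.
maps = np.abs(rng.normal(size=(5, 64, 64)))
maps /= maps.reshape(5, -1).sum(axis=1)[:, None, None]   # normalise each map

cv = maps.std(axis=0) / (maps.mean(axis=0) + 1e-12)      # per-pixel CV
print("mean saliency CV across seeds:", cv.mean())

# --- Check 2 (point IV): attribution vs. rainfall intensity ---
def spearman(a, b):
    """Spearman rank correlation via ranked Pearson (no tie correction)."""
    return np.corrcoef(a.argsort().argsort(), b.argsort().argsort())[0, 1]

rain = rng.gamma(0.5, 5.0, size=64 * 64)            # hypothetical rainfall
attr = 0.8 * rain + rng.normal(0.0, 1.0, 64 * 64)   # hypothetical attribution
print("rank corr(attribution, rainfall):", spearman(rain, attr))
```

A high coefficient of variation would indicate seed-dependent saliency; a rank correlation near 1 would suggest the attribution mostly tracks where it rains rather than how water moves.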
9. The discussion is rich and touches on many aspects of the model’s behavior and potential. However, it currently reads more like a collection of loosely connected observations and future directions (e.g., abrupt shifts from ablation to data choices to flood forecasting), rather than a focused and structured analysis. Some points are speculative without clear support, while others deserve more detailed treatment.
I) The authors mention that DRRAiNN is “not designed for scalability,” but it remains unclear what limits its scalability: is it due to computational costs, architectural complexity (e.g., the combined grid and graph operations), or something else? It would help to provide a clearer picture of the computational resources and time required for training and inference.
II) The discussion includes many potential future directions (e.g., hourly discharge, new inputs, removal of the warm-up phase, etc.). While these are all interesting, I would recommend narrowing the focus to 1-2 directions that are most promising or most directly tied to the current model's limitations.
III) The observation that the best attribution map does not correspond to the best predictive performance is interesting. However, this claim may be based on a small number of seeds, and attribution can be sensitive to initialization. Could this divergence be due to noise or model variance?