Advancement of a blended hydrologic model for robust model performance

Chlumsky, Robert; Mai, Juliane; Craig, James R.; Tolson, Bryan A.

doi:10.5194/hess-2023-69

Preprints

23 Mar 2023

Status: this preprint was under review for the journal HESS but the revision was not accepted.

Abstract. A blended model structure has emerged as an alternative to the traditional representation of model structure in a hydrologic model, in which multiple algorithmic choices are used to represent some hydrologic process within a model, and are combined within a single model run using a weighted average of process fluxes. This approach has been shown to improve overall model performance, as well as provide an efficient way to test multiple model structures. We propose that a blended model may also be at least a partial solution to the calls for a more robust Community Hydrologic Model, which can mitigate the need for developing new hydrologic models for each catchment and application.

We develop an updated version of the blended model configuration which defines the suite of all possible hydrologic process options in the blended model. Configuration development was guided by model performance for more than 30 different discrete model configurations across 12 MOPEX catchments. Improvements to the blended model include the introduction of blended potential melt and potential evapotranspiration as new process groups, inclusion of non-blended structural changes, and a revision of the process options within each existing group. This leads to a very high-performing model with a mean calibration Kling-Gupta Efficiency (KGE) score of 0.90 and a mean validation KGE score of 0.80 across all 12 MOPEX catchments, an improvement over the initial version of 0.06 and 0.07 in mean KGE for calibration and validation, respectively. We test for overfitting of models and find little statistical evidence that increasing the complexity of blended models reduces validation performance. We then select the preferred model configuration as version 2 of the blended model and test it with 12 independent catchments, yielding mean calibration and validation KGE scores of 0.89 and 0.76, respectively, and an improvement over the original model (0.03 in mean calibration KGE score). Version 2 of the blended model is robust across a range of catchments without the need for adjusting its flexible model structure, and may be useful in future hydrology studies and applications alike.

Received: 10 Mar 2023 – Discussion started: 23 Mar 2023

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

AC1: 'Public repository link', Robert Chlumsky, 24 Apr 2023

The supporting repository has been made public to bolster the interactive discussion and review of the manuscript. The repository can be found at the following GitHub link, and does not require a GitHub account to view and download files.
https://github.com/rchlumsk/blendedmodel_update_2022
Cheers,
Rob

Citation: https://doi.org/10.5194/hess-2023-69-AC1
RC1: 'Comment on hess-2023-69', Anonymous Referee #1, 02 May 2023
Review of “Advancement of a blended hydrologic model for robust model performance” by Chlumsky et al.
Overview
This paper describes work done to test multiple configurations of a so-called blended hydrological model. A blended model does not use a single equation to calculate a given model flux, but calculates the flux as a weighted average of multiple equations that each estimate this flux in slightly different ways. The goal of this paper is to improve on the original blended model, which uses a specific selection of processes to be modelled and equations by which to model them. The multiple model configurations tested in this paper are calibrated and evaluated in various ways, using data from 12 MOPEX catchments. One of these new configurations is selected as the new version to use.
This paper is fairly easy to follow, though various clarifications could be made. Suggestions for these can be found in the annotated pdf. What I like is that the authors bring in independent data from new catchments to evaluate their findings. Unfortunately, I don’t think the paper’s main contribution (“delivering an improved blended model as version 2 with a demonstrated increase in calibration and validation performance [..]” – l. 395-397) is supported by the analysis shown in the paper. What this paper needs is a much clearer description of the larger picture this work falls into, why the methods employed are justified, and what the scientific contribution of this work is. I apologize in advance for the length of what is to come, but given that I think this paper cannot be published in its current form I want to make clear where I’m coming from.
Detailed comments
To the best of my understanding, the blended model works as follows:
For a given process (such as lateral flow from a soil reservoir), multiple equations are available that describe this process and each of these equations comes with a (set of) parameter(s).

During a simulation, on every time step the estimated process (e.g. lateral flow) is a weighted average of the process estimates of each individual equation (i.e. a blended process).

In this application, both parameters of the process equations as well as the weights used to combine the process equations into a single estimate are calibrated.
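The three steps above can be illustrated with a minimal sketch. The two outflow equations, their parameter values, and all function names below are illustrative assumptions; they are not Raven's API and not the process options used in the manuscript.

```python
from typing import Callable, Sequence


def linear_outflow(storage: float, k: float = 0.1) -> float:
    """Hypothetical process option 1: outflow proportional to storage."""
    return k * storage


def power_outflow(storage: float, k: float = 0.01, n: float = 2.0) -> float:
    """Hypothetical process option 2: power-law outflow."""
    return k * storage ** n


def blended_flux(
    storage: float,
    options: Sequence[Callable[[float], float]],
    weights: Sequence[float],
) -> float:
    """Weighted average of the flux estimated by each process option."""
    if len(options) != len(weights):
        raise ValueError("one weight is required per process option")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalise so the effective weights sum to one.
    return sum(w * f(storage) for f, w in zip(options, weights)) / total


# One time step: both the weights and the parameters (k, n) would be calibrated.
flux = blended_flux(20.0, [linear_outflow, power_outflow], [0.25, 0.75])
```

With a storage of 20, the two options give 2.0 and 4.0, so the blended flux lies between them at a position set by the weights.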

In other words, even though the blended process is constructed from multiple different equations, the blended equation is effectively a single storage-flux relationship. The semantics of this approach can be debated, but the result is, in my opinion, a modeling approach that effectively calibrates a single process equation within the bounds provided by the multiple equations that are available for a given process. This approach therefore falls somewhere in between pure machine-learning (where the model structure is only very loosely defined) and traditional modeling (where the model structure is completely fixed, and only parameter values are left to be estimated/calibrated). The main issue I see here is that the methods used in this paper seem to stem mainly from machine-learning approaches and are thus most appropriate for large data samples, whereas the data used (12 MOPEX catchments with another 12 catchments used for testing) stem from hydrological investigations of individual basins. This leads to friction in various aspects of the methodology:
1. Selection of process equations to test is poorly justified
The small sample of catchments suggests that the process equations used in the modeling framework will be selected based on our understanding of which processes are important in these basins, and which equations are appropriate to model these processes. There is some mention in the manuscript that expert knowledge informed at least some of the changes made to the Raven framework used in this paper, but what this knowledge was and how it informed some of these decisions is unclear. More bluntly, the approach largely seems to come down to testing various options for no other reason than that they are available in the Raven framework and seeing what sticks.
If this approach is meant to bridge the gap between machine-learning and traditional models, it must be clarified why and how the modeling equations that are being tested were chosen to be tested in the first place.
This also applies to Section 2.2.4 “Non-blended structural changes”. It is unclear to me why these changes were implemented and how it was decided that they were appropriate to make.
It also needs to be justified why changes are introduced in the order they are introduced in. Perhaps introducing the changes in a different order would lead to different outcomes.
2. Metrics for success are poorly justified
The manuscript uses multiple metrics to assess the success of different model configurations, but it is unclear to me why the main one of these (average KGE scores across the samples) is appropriate. The number of catchments is quite small and performance changes for any of the model configurations seem a mixed bag at best: performance goes up in some catchments and goes down in others. Given that (1) there is considerable sampling uncertainty in these scores at the best of times, and (2) that it is known that KGE scores are difficult to compare between different flow regimes, it must be justified why looking at mean KGE changes is an appropriate thing to do.
This applies especially strongly to Section 3.4 and the conclusions, where the selection of model configuration 36 as the new blended model is declared a success based on very mixed, but on average slightly positive, changes in KGE scores.
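To make the concern about averaged scores concrete, the sketch below computes KGE and then summarises per-catchment changes. It assumes the original 2009 formulation, KGE = 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2); the manuscript may use a different variant. The function names and the four pairs of scores are made up for illustration and are not results from the study.

```python
import math


def kge(sim, obs):
    """Kling-Gupta Efficiency, assuming the original 2009 formulation."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("sim and obs must have equal length of at least 2")
    mu_s = sum(sim) / n
    mu_o = sum(obs) / n
    var_s = sum((s - mu_s) ** 2 for s in sim) / n
    var_o = sum((o - mu_o) ** 2 for o in obs) / n
    if var_s == 0 or var_o == 0 or mu_o == 0:
        raise ValueError("KGE is undefined for constant series or zero-mean observations")
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / math.sqrt(var_s * var_o)   # linear correlation
    alpha = math.sqrt(var_s / var_o)     # variability ratio
    beta = mu_s / mu_o                   # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


def summarize_change(kge_old, kge_new):
    """Mean change in KGE plus counts of improved and degraded catchments."""
    deltas = [new - old for old, new in zip(kge_old, kge_new)]
    return {
        "mean_delta": sum(deltas) / len(deltas),
        "improved": sum(d > 0 for d in deltas),
        "degraded": sum(d < 0 for d in deltas),
    }


# Made-up scores for four hypothetical catchments (not values from the manuscript).
old_scores = [0.80, 0.85, 0.70, 0.90]
new_scores = [0.90, 0.80, 0.85, 0.86]
summary = summarize_change(old_scores, new_scores)
```

In this invented case the mean change is positive even though half of the catchments degrade, which is the kind of pattern a mean alone does not reveal.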
3. Results are strongly conditional on practical limitations of the calibration algorithm
The stated goal of this paper is to find a new blended model configuration to use as a default starting point for further work. Hence multiple possible new blended models are calibrated and tested, to select this new best one. The authors are open about the fact that the calibration algorithm sometimes struggles to find the optimum solution during the combined calibration of weights and parameters (lines 281-284). However, I think simply stating this is insufficient in this case. According to Table A1, the model selected as the new blended model configuration to use (#36) is a subset of another model configuration (#24). In other words, model configuration #24 has all the capabilities of model configuration #36 and then some, yet it is not the one selected. The reason for this is that model config #24 performs worse than config #36 during calibration (Figure 3) and on the chosen evaluation metrics (Figure 5). Unless I misunderstand something, logically there can be no other reason for the lacking performance of #24 than that the calibration algorithm failed to find the proper weights and parameters.
I do not think that for a study such as this such a weakness in the calibration part of the work can be brushed aside. The small number of catchments adds to this problem, because the calibration outcomes seem somewhat chance-based, and it is impossible to know if these findings would hold across much larger samples.
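The nesting argument can be sketched with a toy example, assuming the two configurations differ only in the set of blended options (any non-blended structural differences would break this assumption). The flux values, the target, and the function names are hypothetical. Setting the extra option's weight to zero makes the larger configuration reproduce the smaller one, so an exhaustive search over the larger weight space can never do worse in calibration.

```python
from itertools import product


def blend(fluxes, weights):
    """Weighted average of per-option fluxes."""
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, fluxes)) / total


def best_error(fluxes, target, steps=10):
    """Smallest absolute error found by exhaustive search on a coarse weight grid."""
    grid = [i / steps for i in range(steps + 1)]
    best = float("inf")
    for weights in product(grid, repeat=len(fluxes)):
        if sum(weights) == 0:
            continue  # an all-zero weight vector cannot be normalised
        best = min(best, abs(blend(fluxes, weights) - target))
    return best


# Hypothetical fluxes: the larger configuration adds one option to the smaller one.
small_options = [1.0, 3.0]
large_options = [1.0, 3.0, 6.0]

# A zero weight on the extra option reproduces the smaller configuration.
same = blend(small_options, [0.3, 0.7]) == blend(large_options, [0.3, 0.7, 0.0])

# With a complete search, the superset is at least as good for any target.
err_small = best_error(small_options, target=5.0)
err_large = best_error(large_options, target=5.0)
```

A practical calibration algorithm searches a much larger space under a finite budget, so it may miss this guarantee, which is the limitation raised in this comment.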
Summary
None of the items above are necessarily bad on their own (I understand that sometimes one just needs to get started somewhere), but combined they cast serious doubts on the validity and usefulness of this work. Put bluntly, it is unclear to me why the chosen methodology is appropriate for what is being investigated and what scientific advance is being made.
I believe these concerns may possibly be addressed by substantially increasing the number of catchments in the analysis, some re-thinking of how to quantify success, and inclusion of a more conceptual discussion about the purpose of blended models and this investigation in particular, but I think the number of changes needed goes beyond a simple revision.
Citation: https://doi.org/10.5194/hess-2023-69-RC1
- AC3: 'Reply on RC1', Robert Chlumsky, 03 Jul 2023
  
  We wish to thank reviewer #1 for taking the time to review our manuscript and provide valuable feedback.
  An overall response to major comments is provided in the attached response letter; responses to individual comments on the manuscript are provided, in the same format as the original annotations, in the second attachment (see the subsequent comment and attachment).
  
  Citation: https://doi.org/10.5194/hess-2023-69-AC3
- AC4: 'Reply on RC1', Robert Chlumsky, 03 Jul 2023
  
  Specific manuscript comments addressed in this second attachment.
  
  Citation: https://doi.org/10.5194/hess-2023-69-AC4
RC2: 'Comment on hess-2023-69', Janneke Remmers, 09 May 2023

This manuscript details the further development of a blended hydrological model in the modular model framework Raven. They tested multiple new configurations to determine what can be an improved set-up. It was a pleasure to read it. As I am familiar with Raven, I appreciate reading about Raven’s capabilities and their further developments. I found it an interesting and well-set-up study. They clearly delineate the importance of their research and their methods are constructed well. I have some questions for clarification and some suggestions regarding the figures. Finally, I have a few textual remarks, among others some typos. I have included this in a separate pdf due to formatting issues.
Kind regards,
Janneke Remmers

Citation: https://doi.org/10.5194/hess-2023-69-RC2
- AC2: 'Reply on RC2', Robert Chlumsky, 12 Jun 2023
  
  We wish to thank reviewer #2 (Janneke Remmers) for taking the time to review our manuscript and provide valuable feedback.
  An organized response to each comment is provided in the attached PDF file.
  
  Citation: https://doi.org/10.5194/hess-2023-69-AC2

Short summary

A blended model allows multiple hydrologic processes to be represented in a single model, which allows for a model to achieve high performance without the need to modify its structure for different catchments. Here, we improve upon the initial blended version by testing more than 30 blended models in twelve catchments to improve the overall model performance. We validate our proposed, updated blended model version with independent catchments, and make this version available for open use.

