Technical note: Evaluation and bias correction of an observation-based global runoff dataset using streamflow observations from small tropical catchments in the Philippines

Ibarra, Daniel E.; David, Carlos Primo C.; Tolentino, Pamela Louise M.

doi:https://doi.org/10.5194/hess-25-2805-2021

Articles | Volume 25, issue 5

© Author(s) 2021. This work is distributed under
the Creative Commons Attribution 4.0 License.

Technical note | 26 May 2021

Final revised paper (published on 26 May 2021)
Supplement to the final revised paper
Preprint (discussion started on 24 Feb 2020)
Supplement to the preprint

Interactive discussion

Status: closed

AC: Author comment | RC: Referee comment | SC: Short comment | EC: Editor comment

RC1: 'Technical Note: Evaluation and bias correction of an observations-based global runoff dataset using historical streamflow observations from small tropical catchments in the Philippines', Jaivime Evaristo, 13 Apr 2020
- AC1: 'Response to Evaristo', Daniel Ibarra, 05 Jul 2020
RC2: 'Review comments', Anonymous Referee #2, 07 Jun 2020
- AC2: 'Response to Anonymous Referee #2', Daniel Ibarra, 05 Jul 2020

Peer-review completion

AR: Author's response | RR: Referee report | ED: Editor decision

ED: Publish subject to revisions (further review by editor and referees) (17 Jul 2020) by Wouter Buytaert

AR by Daniel Ibarra on behalf of the Authors (27 Aug 2020) Author's response Manuscript

ED: Referee Nomination & Report Request started (18 Sep 2020) by Wouter Buytaert

RR by Anonymous Referee #2 (17 Nov 2020)

Suggestions for revision or reasons for rejection

It was nice to read this manuscript again. I still think that the data are unique and that the comparison with the GRUN dataset is useful, even if only to show the errors in blindly using these model outputs. However, I still have several main comments. These include 1) the need for more comparisons of the observations and GRUN output (e.g., flow duration curves or flow percentiles) to strengthen the analysis, 2) the need to discuss the effects of errors in the rainfall data used in the GRUN ‘simulations’, 3) the mismatch between the interpretations regarding hydrological processes and the monthly time scale of the data, 4) the way the bootstrapping for the bias correction factor is implemented, and 5) the need to reword some of the text to make it more accurate and improve readability.

Specific comments:
L100: I still do not understand the description of the data quality categories. What is meant by “the actual gauge height vs height computed”? I assume that you are looking at the rating curves here. Do you mean that xx% of the level observations are within the range of the measurements used to create the rating curve, or something else?
Figure 1: The font is very small – particularly when the figure is rescaled to the journal pages.
L189-197: This section is a mixture of explanations on what is discussed and shown in the next sections and some initial results. I would include the results in section 3.1 and significantly shorten the remainder of the section or remove it completely so that you can use the “word space” to show more comparisons or more thoroughly discuss the results.
L190: Is a VE of 0.50 really a reasonable prediction? Doesn’t it suggest an error of about 50%!!
Section 3.1: Mention the range (and average) of the NSE and NSE(log) values. What are they, and for how many of the catchments are they better than 0 or 0.5? This would actually tell me whether GRUN has some skill in predicting the flow across a region.
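A minimal sketch of the per-catchment summary requested here. It assumes that monthly observed and GRUN runoff are already paired for each catchment. The VE form (1 minus the sum of absolute errors divided by total observed flow), the log offset `eps`, and all function names are illustrative assumptions, not the authors' code.

```python
import numpy as np

def volumetric_efficiency(obs, sim):
    """VE = 1 - sum|sim - obs| / sum(obs) (assumed common definition)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum(np.abs(sim - obs)) / np.sum(obs)

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def nse_log(obs, sim, eps=0.01):
    """NSE on log-transformed flows; eps (assumed) avoids log(0) in dry months."""
    return nse(np.log(np.asarray(obs, float) + eps),
               np.log(np.asarray(sim, float) + eps))

def summarize(catchments):
    """catchments: dict name -> (obs, sim) monthly runoff in mm/d.

    Returns per-catchment scores plus, for each metric, the range, the median
    and how many catchments score above 0 and above 0.5.
    """
    metrics = {"VE": volumetric_efficiency, "NSE": nse, "NSE_log": nse_log}
    scores = {name: {m: f(o, s) for m, f in metrics.items()}
              for name, (o, s) in catchments.items()}
    summary = {}
    for m in metrics:
        vals = np.array([sc[m] for sc in scores.values()])
        summary[m] = {
            "min": float(vals.min()),
            "median": float(np.median(vals)),
            "max": float(vals.max()),
            "n_gt_0": int(np.sum(vals > 0)),
            "n_gt_0.5": int(np.sum(vals > 0.5)),
        }
    return summary, scores
```

Reporting counts above 0 and above 0.5 alongside the range separates "better than the observed mean" from "reasonably skilful". Pooled metrics cannot make this distinction.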
Section 3.1 and 3.2: The structure of these sections is a bit confusing and leads to repetition. It is probably better/more logical to first discuss the pooled data as you do in 3.0, then the range of NSE and NSE(log) values for the individual catchments, the prediction of the average and median flows, the prediction of the interquartile range, and then finally the prediction of the peak flows and minimum flows.
Most of the comparison of the data (and section 3.1) focuses on the peak flows. This is interesting, but because this is the absolute peak, it is also prone to errors in the data, or simply to a mismatch between GRUN and the single month with the highest flow. What about also adding a comparison of some other metrics that describe the overall peak-flow fit, such as the 95th or 99th percentile of the flow or the 5-year return period monthly flow? I would have liked a comparison of the flow duration curves as well. Overall, there could have been more analyses than just the mean, maximum and minimum flows that are currently included. I think that adding a few more comparisons would strengthen the manuscript.
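A sketch of two of the suggested high-flow metrics for a single catchment:

- the 95th and 99th percentiles of monthly runoff;
- an empirical 5-year return period monthly flow, estimated from annual maxima with Weibull plotting positions, T = (n + 1)/m.

The plotting-position choice, the linear interpolation and the assumption of complete calendar years are ours. `np.interp` clamps to the largest or smallest annual maximum if T falls outside the range the record can resolve.

```python
import numpy as np

def high_flow_metrics(monthly_q, years, return_period=5.0):
    """Q95, Q99 and an empirical T-year monthly flow from annual maxima."""
    q = np.asarray(monthly_q, float)
    years = np.asarray(years)
    q95, q99 = np.percentile(q, [95, 99])
    # Largest monthly runoff in each calendar year
    annual_max = np.array([q[years == y].max() for y in np.unique(years)])
    ranked = np.sort(annual_max)[::-1]        # largest first, rank m = 1..n
    n = ranked.size
    T = (n + 1) / np.arange(1, n + 1)         # Weibull return periods, decreasing
    # np.interp needs increasing x, so reverse both arrays
    q_T = np.interp(return_period, T[::-1], ranked[::-1])
    return {"q95": float(q95), "q99": float(q99), "q_T": float(q_T)}
```

Applying the same function to the observed and the GRUN series of each catchment gives paired values that are less sensitive to a single extreme month than the absolute maximum.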
L226: An RMSE of 4.55 mm/d seems very large. Please put these values into perspective: how do they compare to the average flow?
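To put an RMSE into perspective, as asked here, one option is to divide it by the mean observed flow of the same catchment or climate zone. The sketch below does this; the function name is hypothetical.

```python
import numpy as np

def rmse_in_context(obs, sim):
    """RMSE of monthly runoff and its ratio to the mean observed flow."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    rmse = float(np.sqrt(np.mean((sim - obs) ** 2)))
    mean_q = float(obs.mean())
    return {"rmse": rmse, "mean_obs": mean_q, "rmse_over_mean": rmse / mean_q}
```

A ratio near 1 means the typical monthly error is as large as the average monthly flow itself.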
L246-249: The explanation given here does not seem plausible. It would be fine if we were looking at hourly or daily data, but here monthly data are used. It seems very unlikely that flood events in the larger catchments (which are still not very big) last multiple months! Routing simply isn’t that slow. As far as I know, there are also not many very large lakes in the Philippines that could buffer all this water for the larger catchments. Does it rather mean that small catchments are dominated more by fast flow pathways, such as subsurface stormflow (SSF), and larger catchments by slower pathways, such as groundwater flow? Although I think that streambed infiltration is important in some areas, I am not sure that, in a country as wet as the Philippines, there is really enough loss of water from the stream into the aquifers to delay the streamflow response by several months. I like the attempt to describe the differences in terms of hydrological processes (here and on L275-280), but I think that the monthly time scale of the data is not fully considered in these interpretations. Yes, whether runoff is generated as overland flow or subsurface stormflow has a huge effect on hourly or 5-min peak flows. For monthly runoff values, however, this effect should be fairly small, as both flow pathways will deliver the water to the stream within the monthly time step.
The larger issue is likely the rainfall. For larger catchments, the average rainfall intensity and its variability are lower (due to averaging over larger areas), and they are perhaps better predicted or represented by the GSWP precipitation data used in GRUN. Add some discussion of what is known about the bias in the GSWP precipitation data, including the bias in the variability of precipitation. There is currently no information on how any bias in GSWP precipitation for the Philippines may have caused the huge bias in the GRUN streamflow. Considering the need for significant rescaling of the GRUN streamflow predictions, it seems that there must be a bias in the input (i.e., the rainfall data) used for the streamflow predictions; otherwise, the mass balance can’t work out. I think that more discussion of this is needed.
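The mass-balance argument above can be written out for long-term catchment means. This assumes that storage change is negligible, which is reasonable only over multi-year periods. Let $Q$ be runoff, $P$ precipitation and $ET$ evapotranspiration, and let $\Delta$ denote the bias of the value used in, or implied by, the product relative to the true value:

```latex
\overline{Q} \approx \overline{P} - \overline{ET}
\quad\Longrightarrow\quad
\Delta\overline{Q} \approx \Delta\overline{P} - \Delta\overline{ET},
\qquad \Delta X = X_{\mathrm{used}} - X_{\mathrm{true}}.
```

If the long-term runoff bias is large while the evapotranspiration bias is small, the forcing precipitation would need a bias of similar magnitude and the same sign (in mm/d) for the budget to close.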
L287: I thank the authors for taking up the idea of bootstrapping but think that it is not done correctly here. Taking out individual months from a range of catchments is likely not very helpful because of the large amount of ‘redundant data’ in long time series. The question is how sensitive the bias correction factor is to the choice of catchments, or to the number of catchments for which data are available. Thus, instead of randomly taking out data points (from different times and different catchments), it would be better to exclude all the data from a certain number of catchments and then determine how this affects the bias correction factor and its uncertainty. In fact, I would suggest that the authors not only take out a fraction of the catchments for the bootstrapping but also test what the uncertainty of this factor would be if they only had data for one (or two, three or five) catchments per climate zone. This would be helpful for readers from other countries who may not have access to data from so many catchments to determine a bias correction factor.
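A sketch of the catchment-level resampling suggested here. Whole catchments (all of their months) are drawn without replacement, k at a time, and the factor is recomputed for each draw. The spread of the factors then reflects sensitivity to which catchments, and how many, are available. Strictly, this is a subsampling scheme rather than a classical bootstrap. The multiplicative factor (total observed runoff divided by total GRUN runoff) is a hypothetical stand-in; the manuscript's own correction would replace `pooled_factor`.

```python
import numpy as np

def pooled_factor(pairs):
    """Hypothetical bias-correction factor: total observed / total GRUN runoff."""
    obs = np.concatenate([np.asarray(o, float) for o, _ in pairs])
    sim = np.concatenate([np.asarray(s, float) for _, s in pairs])
    return obs.sum() / sim.sum()

def catchment_subsample(catchments, k, n_draws=1000, seed=0):
    """Recompute the factor from k whole catchments per draw.

    catchments: dict name -> (obs, sim) monthly runoff arrays.
    Returns (mean factor, 2.5th percentile, 97.5th percentile).
    """
    rng = np.random.default_rng(seed)
    names = list(catchments)
    factors = np.empty(n_draws)
    for i in range(n_draws):
        # Sample catchment indices, not individual months
        chosen = rng.choice(len(names), size=k, replace=False)
        factors[i] = pooled_factor([catchments[names[j]] for j in chosen])
    lo, hi = np.percentile(factors, [2.5, 97.5])
    return float(factors.mean()), float(lo), float(hi)
```

Calling `catchment_subsample(zone, k)` for k = 1, 2, 3 and 5 within each climate zone would show how the interval narrows as more catchments become available.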
L291-293: This requires some rewriting as the text and the logic are difficult to follow.
L324: This sentence is not clear. Are you really suggesting that even though the GRUN database was not intended to be used for predicting flow for individual catchments, it can be used that way after bias correction? I don’t think that you can conclude this based on your results!!

Referee Report: PDF

ED: Publish subject to revisions (further review by editor and referees) (18 Nov 2020) by Wouter Buytaert

AR by Daniel Ibarra on behalf of the Authors (30 Dec 2020) Author's response Author's tracked changes Manuscript

ED: Referee Nomination & Report Request started (19 Jan 2021) by Wouter Buytaert

RR by Anonymous Referee #2 (04 Mar 2021)

Suggestions for revision or reasons for rejection

The manuscript has been significantly improved. I like the new bootstrapping results, but I don't think that the new analysis of the flow duration curves is entirely correct. Some other aspects of the manuscript could also be changed to improve it and to make it more impactful/useful for other researchers.

1) I like the new comparison of the flow duration curves, but I don’t think that you can pool the runoff data from all the catchments and then calculate one flow duration curve from that pool of data. This automatically results in a more extreme (i.e., steeper) flow duration curve than the flow duration curves of the individual catchments in the region. The pooled curve includes both the most extreme peak flow (from the most extreme catchment) and the most extreme low flow (which is probably from a different catchment). The better analysis would be to calculate the flow duration curve for each catchment individually (as you have already done) and then, for each percentile, calculate the mean or median of the flows from the different catchments at that percentile. Alternatively, one could plot the band that spans all the different flow duration curves for the observed data and the GRUN data in a region.

2) Table 2: I suggest also adding the range of VE and NSE values when they are calculated for the individual catchments, and perhaps the mean or median value as well. This could be useful for other researchers and would help readers understand the variability in the results. I have a hard time understanding what, for example, the pooled NSE values mean and how much they are biased towards some catchments with either really poorly estimated flows or very high overall runoff. Furthermore, I think that it would be useful if the metrics for the bias-corrected data were given in the same order as for the “raw” GRUN data (or perhaps in several rows below the results for the “raw” data). To understand the RMSE values, it would be useful to know the average or median monthly flows for the different climate zones, so perhaps give them in the caption.

3) At some places, the text says that the GRUN dataset is not really meant to be used to predict the discharge for individual catchments but can be used for larger-scale questions, such as how much runoff reaches the ocean. This should be mentioned much earlier in the text (see suggestion in the annotated pdf). One analysis that is therefore lacking is a comparison of the total observed flow from all your catchments with the GRUN estimates, for the 10- or 20-year period with the most data. This comparison should also show to what extent the variation in the GRUN estimates of total annual flow from these catchments reflects the variation in the observed annual flow (i.e., what are the VE, r² and NSE for the total flow from all catchments combined per year over this period?).

4) I don't understand the order of steps in the bias correction. First, a constant is subtracted from the simulated runoff. Then a linear regression is fit through the scatter plot of the observed and simulated data, and both the constant (intercept) and the slope are applied to correct the simulated data. Would the intercept not take care of the bias that you try to fix by subtracting the constant first?

5) L265: I don’t understand why averaging to monthly values would increase the bias. I would rather expect it to decrease the bias as a day with too much flow would be compensated by a day that has too little flow (regression towards the mean). I expect averaging to monthly values to lead to less extreme values and less bias than for e.g., daily values. A bit more explanation on your reasoning for this would be useful.

6) L275-280: This part doesn’t fit well here. Merge with the text in the conclusion. But above all, try to be more realistic and don’t oversell the fit for the GRUN data.

7) More technical or editorial suggestions to further improve the text are given in the annotated pdf. Note that I don’t expect a formal response to these suggestions – nor that you implement all of them.
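The per-catchment aggregation proposed in point 1 can be sketched in three steps:

1. Compute each catchment's flow duration curve at common exceedance probabilities.
2. Take the median across catchments at each probability.
3. Use the min-max envelope across catchments as the band.

The pooled curve is returned only for comparison. The function names and the use of `np.quantile` with its default linear interpolation are assumptions.

```python
import numpy as np

def fdc(q, exceedance):
    """Flow exceeded with probability p, i.e. the (1 - p) quantile of q."""
    return np.quantile(np.asarray(q, float), 1.0 - np.asarray(exceedance, float))

def regional_fdc(catchments, exceedance):
    """catchments: dict name -> monthly runoff series for one region."""
    curves = np.array([fdc(q, exceedance) for q in catchments.values()])
    return {
        "median": np.median(curves, axis=0),   # per-percentile median across catchments
        "band_low": curves.min(axis=0),        # lower edge of the envelope
        "band_high": curves.max(axis=0),       # upper edge of the envelope
        "pooled": fdc(np.concatenate([np.asarray(q, float) for q in catchments.values()]),
                      exceedance),
    }
```

Consider two catchments whose flows are 1 to 5 and 11 to 15 mm/d. The pooled curve runs from 15 down to 1, while the median curve runs only from 10 down to 6. This illustrates why pooling steepens the regional curve.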

Referee Report: PDF
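Point 4 of the report above can be checked algebraically. Assume the observed runoff is regressed on the shifted simulated runoff by ordinary least squares with an intercept. Subtracting a constant $c$ first then only re-parameterizes the intercept:

```latex
Q_{\mathrm{obs}} \approx a + b\,(Q_{\mathrm{sim}} - c) = (a - b\,c) + b\,Q_{\mathrm{sim}}
```

The OLS slope estimate depends only on deviations of the predictor from its mean, so shifting $Q_{\mathrm{sim}}$ by $c$ leaves $\hat{b}$ unchanged and moves the fitted intercept by exactly $\hat{b}\,c$. Under this assumption, the corrected values are identical with or without the initial subtraction.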

ED: Publish subject to minor revisions (review by editor) (15 Mar 2021) by Wouter Buytaert

AR by Daniel Ibarra on behalf of the Authors (02 Apr 2021) Author's response Author's tracked changes Manuscript

ED: Publish subject to technical corrections (04 May 2021) by Wouter Buytaert

AR by Daniel Ibarra on behalf of the Authors (06 May 2021) Author's response Manuscript

Short summary

We evaluate a recently published global product of monthly runoff using streamflow data from small tropical catchments in the Philippines. Using monthly runoff observations from these catchments, we tested how well the product correlates with and predicts the observed runoff. We demonstrate the potential utility of this product in assessing trends in regional-scale runoff, and we examine the correlation of phenomena such as the El Niño–Southern Oscillation with streamflow in this wet but drought-prone archipelago.
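As an illustration of the kind of ENSO analysis mentioned in this summary, the sketch below first removes the mean seasonal cycle from a monthly runoff series. It then correlates the anomalies with a monthly ENSO index, optionally with the index leading runoff by `lag` months. The index series (for example, a user-supplied Oceanic Niño Index), the function names and the use of Pearson correlation are assumptions, not the authors' documented method.

```python
import numpy as np

def monthly_anomalies(q, month):
    """Subtract each calendar month's long-term mean (the climatology) from q."""
    q = np.asarray(q, float)
    month = np.asarray(month)
    anom = np.empty_like(q)
    for m in np.unique(month):
        sel = month == m
        anom[sel] = q[sel] - q[sel].mean()
    return anom

def enso_correlation(q, month, enso_index, lag=0):
    """Pearson r between runoff anomalies and an ENSO index leading runoff by `lag` months."""
    anom = monthly_anomalies(q, month)
    idx = np.asarray(enso_index, float)
    if lag > 0:
        # Pair runoff at month t with the index at month t - lag
        anom, idx = anom[lag:], idx[:-lag]
    return float(np.corrcoef(anom, idx)[0, 1])
```

Removing the seasonal cycle first prevents the strong wet-season/dry-season contrast from dominating the correlation.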