the Creative Commons Attribution 4.0 License.
A Bayesian updating framework for calibrating hydrological parameters of road network using taxi GPS data
Jiawen Yang
Ke Xu
Bo Dong
Shan Jiang
Abstract. Hydrological parameters should be carefully calibrated before they are used to support decision-making. However, calibration is difficult in regions where runoff data are inadequate. To fill this gap for the ungauged road network, we propose a Bayesian updating framework that calibrates hydrological parameters using taxi GPS data. Parameter values are adjusted so that the runoff generated by the acceptable parameter sets reproduces the road disruption period, that is, the period during which no taxi points are observed. The method is validated on 10 flood-prone roads in Shenzhen, and the results show that the runoff trend is correctly predicted for 8 of the 10 roads. This study shows that integrating a hydrological model with taxi GPS data offers a viable alternative for model calibration and provides actionable insights for flood hazard mitigation.
Xiangfu Kong et al.
Status: final response (author comments only)
RC1: 'Comment on hess-2023-7', Jeffrey M Sadler, 05 Apr 2023
This manuscript presented a framework for calibrating a hydrologic model based on taxi data. The concept is quite clever, in my opinion. There of course is a need to calibrate hydrologic models and, at the same time, a general lack of data needed to calibrate. Using taxi data for the calibration is a neat idea and I think the authors did a good job showing the reader the feasibility of this. Overall, I think the manuscript is clear, well written, and technically sound. There are a few items I think should be addressed before being accepted for publication.
Major points:
I have four main questions/concerns:
1 - why are you calibrating time of concentration and catchment area? These are parameters that I would not typically see calibrated. It seems like you could estimate catchment area from a DEM. Similarly, there are many methods for estimating time of concentration from catchment characteristics. Because these two parameters are relatively reasonable to calculate/estimate, I'd like to understand the authors' reasoning for calibrating them.
2 - why is a curve number of 85 used for every case? This seems pretty consequential since the CN could vary between catchments. Should this be a calibrated parameter?
3 - Assuming it is reasonable to calibrate the catchment area and time of concentration, I question whether it's reasonable to have uniform priors for those parameters. Maybe you can't exactly know what the area of a catchment is going to be, but would you have enough of a guess to make a reasonable prior distribution? I'm guessing you'd know if a road segment has a relatively large or small catchment. Knowing this, it doesn't seem right to keep the prior distribution uniform.
4 - It may be that I didn't understand correctly, but how did you account for time of day/day of week when considering whether or not a taxi would be passing? Or did you? For example, let's say that at a given roadway segment, there is a day and time of the week that there are hardly any taxis. Can you take that into account in your calibration scheme so that a lack of taxis then does not suggest to the model that the roadway is flooded?
Minor points:
- Figure 5 - does it make sense to have intermittent "have taxis" and "no taxi" times after a large rain event? I guess I'm just wondering at graph (C) in particular where it looks like there is just one taxi between 16:15-16:20. Does that mean that one taxi is just really willing to risk it and drive through the water? If it's just one taxi, should it really be counted as "have taxi"?
- Table 4 - if you had 171 flood gaging sites, why did you only pick 10 to test the model on? Why not test it on all 171?
- l355 - how did you make a rating curve for each road? How did you get the flow data to relate the stage data to?
- Section 4.2 - I personally don't think you need this section. While it's interesting to see how you applied the framework, I don't think it is needed. I think it is enough to have described (section 2), illustrated (section 3), and validated (section 4) the method.
- L54: You might consider citing the following since they are related to this topic (full disclosure: I am an author on both):
  - Sadler, J. M., Goodall, J. L., Morsy, M. M., & Spencer, K. (2018). Modeling urban coastal flood severity from crowd-sourced flood reports using Poisson regression and Random Forest. Journal of hydrology, 559, 43-55.
  - Zahura, F. T., Goodall, J. L., Sadler, J. M., Shen, Y., Morsy, M. M., & Behl, M. (2020). Training machine learning surrogate models from a high-fidelity physics-based model: Application for real-time street-scale flood prediction in an urban coastal community. Water Resources Research, 56, e2019WR027038. https://doi.org/10.1029/2019WR027038
- Figure 10: Could you explain why for some runoff values there is more than one level value? For empirically derived rating curves, each runoff value corresponds to only one water level.
Editorial comments:
- l23 - suggest changing "metropolis" to plural "metropolises"
- l31 - suggest changing "false" to "incomplete" or "over-simplified"
- l60 - suggest changing "critical" to "useful"
- l87 - I do not think you need to define a hydrograph. I think you can safely assume HESS readers will know what a hydrograph is.
- l141 - "can absorb *a* light shower" (add "a")
- l154 - I suggest changing "converts rainfall excess to direct runoff" to "converts rainfall excess to a temporal distribution of direct runoff" or something like that to communicate that it is a distribution of runoff over time.
- l160 - "the paucity of runoff" instead of "the paucity of the runoff"
- l161 - "sparkled" is probably not the right word here. Maybe "sparked" or "motivated"
- l191 - "road" instead of "rood"
- l195 - "equals the probability" instead of "equals to the probability"
- l197 - suggest "impossible" instead of "difficult" because I think it is actually impossible to "obtain precise knowledge of all taxi-flooded intersections"
- Table 1: Is it correct to have the "/"s for Feature in several of the rows? If so, maybe you should define what that means.
- l295 - suggest changing "a little bit" to "slightly" or something similar. "a little bit" is imprecise and colloquial
- l308 - "waterlogging" is not a term I typically hear. Do you mean something like "flood-prone?"
- Figure 9 - is the x-axis "Time of Concentration?" If so, please change. I didn't know what "Time" meant.
- l396 - suggest replace "great" with "good"
- l434 - suggest remove "great" to read "This study illustrates the potential ... "
Citation: https://doi.org/10.5194/hess-2023-7-RC1
- AC1: 'Reply on RC1', Xiangfu Kong, 06 May 2023
RC2: 'Comment on hess-2023-7', Anonymous Referee #2, 07 Apr 2023
The paper presents a novel approach for calibrating an urban rainfall-runoff model using taxi GPS data. This is an original idea that seems to have potential as demonstrated in this study. I also commend the authors for making available their data and code.
Comments:
-What's the reason for modeling the taxi data as pass/no pass instead of directly modeling the number of taxis passing? The latter somehow seems more obvious since the original data are taxi counts, while your approach first requires converting taxi counts to 0/1 values, which introduces a potential loss of information. Please better justify this modeling choice.
-An alternative approach would be to use the Poisson distribution to directly model the number of taxis passing (rather than arriving). Have you considered this? This would be more like a Poisson regression model, but perhaps leveraging your road disruption function to model lambda instead of the usual Poisson link function.
-Did you check whether the Poisson model for the number of taxis arriving at a road is a good assumption for your data?
-A limitation is that all variables are treated as discrete random variables whereas the hydrological model parameters are continuous. Why discretize the parameters?
-Does the model account for other (non-flooding) factors that may affect the number of taxis in a road, e.g. time of day (rush hour)?
-Curve number CN is kept fixed even though it is also uncertain.
-Section 2.1: a more common/general way is to write Bayes equation directly in terms of parameters theta, as in p(theta|X) \propto p(theta)*p(X|theta) or p(theta|X) \propto p(theta)*L(theta|X). The likelihood on the rhs of eq. 4 in the paper would then be written in terms of a function omega(theta) given by your eq. 16.
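The form p(theta|X) \propto p(theta)*p(X|theta) suggested in the last comment can be sketched over a discrete set of candidate parameter sets. The following Python snippet is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the three candidate parameter sets, the arrival rate of 0.5 taxis per minute, and the 10-minute window are all hypothetical. It assumes taxis arrive as a Poisson process, that a candidate predicting a disrupted road explains an empty window with probability 1, and that a candidate predicting an open road explains it only with the Poisson probability of zero arrivals.

```python
import math


def prob_no_taxi(rate_per_min, minutes):
    """Poisson probability of zero arrivals in a window of the given length."""
    return math.exp(-rate_per_min * minutes)


def grid_posterior(prior, likelihood):
    """Discrete Bayes update: posterior is prior times likelihood, normalized."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("every candidate has zero posterior mass")
    return [u / total for u in unnormalized]


# Hypothetical setup: three candidate parameter sets (theta_0, theta_1, theta_2).
# Each flag says whether the runoff simulated with that candidate predicts
# that the road is disrupted during the observed window.
predicts_disruption = [True, False, True]

# Uniform prior for illustration; any non-uniform prior can be passed instead.
prior = [1 / 3, 1 / 3, 1 / 3]

# Observation X: no taxi seen for 10 minutes on a road that normally
# receives about 0.5 taxis per minute (assumed values).
p_empty = prob_no_taxi(rate_per_min=0.5, minutes=10)

# Likelihood p(X | theta): 1 if the candidate predicts disruption,
# otherwise the chance of an empty window on an open road.
likelihood = [1.0 if flag else p_empty for flag in predicts_disruption]

posterior = grid_posterior(prior, likelihood)
for index, mass in enumerate(posterior):
    print(f"theta_{index}: posterior = {mass:.4f}")
```

In this sketch the candidate that predicts an open road keeps only a small share of the posterior mass, because an empty 10-minute window is unlikely at the assumed arrival rate. A lower rate, for example at a quiet time of day, would make the same empty window much weaker evidence of flooding, which is the concern raised about non-flooding factors.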
Edits:
-eq. 11: please define x and y
-L23: metropolis --> metropolises or metropolitan areas
-L40: "calibrated on runoff data alone" - there are many studies that calibrate on other data as well
-L47: ungaged vs ungauged: pick one spelling
-L83 (and other places): equals to --> equals
-L90: arriving --> arrival
-L99: does index i refer to road i?
-L132: instantization --> instantiation
-suggest to proofread entire manuscript to fix issues with use of English
Citation: https://doi.org/10.5194/hess-2023-7-RC2
- AC2: 'Reply on RC2', Xiangfu Kong, 06 May 2023
Data sets
Data and code used in the article titled "A Bayesian updating framework for calibrating hydrological parameters of road network using taxi GPS data" Xiangfu Kong https://doi.org/10.5281/zenodo.7294880
Model code and software
Data and code used in the article titled "A Bayesian updating framework for calibrating hydrological parameters of road network using taxi GPS data" Xiangfu Kong https://doi.org/10.5281/zenodo.7294880
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
391 | 69 | 13 | 473 | 2 | 2 |