Distributed under the Creative Commons Attribution 4.0 License.
A Bayesian updating framework for calibrating hydrological parameters of road network using taxi GPS data
Jiawen Yang
Ke Xu
Bo Dong
Shan Jiang
Abstract. Hydrological parameters should be carefully calibrated before they are used to support decision-making. However, calibration is difficult in regions where runoff data are inadequate. To fill this gap for ungaged road networks, we propose a Bayesian updating framework that calibrates hydrological parameters using taxi GPS data. Parameter values are adjusted so that the runoff simulated with the acceptable parameter sets reproduces the road disruption period, i.e. the period during which no taxi points are observed on the road. The method is validated on 10 flood-prone roads in Shenzhen, and the results show that the runoff trends are correctly predicted for 8 of the 10 roads. This study shows that integrating a hydrological model with taxi GPS data offers a viable alternative for model calibration and provides actionable insights for flood hazard mitigation.
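The calibration idea in the abstract can be illustrated with a small discrete Bayesian update. That idea is to keep the parameter sets whose simulated runoff reproduces the taxi-free disruption period. The Python sketch below is not the authors' code, which is archived on Zenodo. It is a hypothetical illustration under several stated assumptions:

- Excess rainfall follows the SCS curve-number equation with CN = 85, the value questioned in the reviews.
- The paper's hydrograph routing is replaced by a crude uniform spread of excess rainfall over the time of concentration.
- A road counts as disrupted whenever simulated flow exceeds a hypothetical threshold.
- Taxi arrivals on an open road are Poisson with mean `lam_dt` per step.
- A blocked road still shows a taxi with a small hypothetical probability `eps`.

All function names and numbers are illustrative.

```python
import math

# Illustrative sketch only: the routing, threshold, eps and lam_dt are hypothetical.

def scs_cn_excess(rain_mm, cn=85.0):
    """Incremental SCS-CN excess rainfall (mm) per step, with Ia = 0.2 * S."""
    s = 25400.0 / cn - 254.0  # potential maximum retention S (mm)
    ia = 0.2 * s              # initial abstraction Ia (mm)
    cum_p, prev_q, excess = 0.0, 0.0, []
    for p in rain_mm:
        cum_p += p
        q = (cum_p - ia) ** 2 / (cum_p + 0.8 * s) if cum_p > ia else 0.0
        excess.append(q - prev_q)
        prev_q = q
    return excess

def route(excess, area, tc_steps):
    """Crude stand-in for a unit hydrograph: spread each step's excess evenly
    over tc_steps steps and scale by catchment area (arbitrary flow units)."""
    return [area * sum(excess[t - k] for k in range(tc_steps) if t - k >= 0) / tc_steps
            for t in range(len(excess))]

def disrupted(flow, threshold):
    """Hypothetical rule: the road is impassable while flow exceeds threshold."""
    return [q > threshold for q in flow]

def log_likelihood(taxi_seen, blocked_flags, lam_dt, eps=0.01):
    """Log-probability of the per-step taxi record given simulated disruption.
    Open road: P(no taxi in a step) = exp(-lam_dt) for Poisson arrivals (lam_dt > 0).
    Blocked road: a taxi is still seen with small probability eps."""
    p_none_open = math.exp(-lam_dt)
    ll = 0.0
    for seen, blocked in zip(taxi_seen, blocked_flags):
        if blocked:
            ll += math.log(eps if seen else 1.0 - eps)
        else:
            ll += math.log(1.0 - p_none_open if seen else p_none_open)
    return ll

def posterior(rain_mm, taxi_seen, areas, tcs, threshold, lam_dt, cn=85.0):
    """Uniform prior over a discrete (area, tc) grid, so posterior is proportional
    to likelihood; returns a normalized dict keyed by (area, tc)."""
    excess = scs_cn_excess(rain_mm, cn)
    logs = {}
    for a in areas:
        for tc in tcs:
            flags = disrupted(route(excess, a, tc), threshold)
            logs[(a, tc)] = log_likelihood(taxi_seen, flags, lam_dt)
    top = max(logs.values())  # log-sum-exp normalization avoids underflow
    z = sum(math.exp(v - top) for v in logs.values())
    return {k: math.exp(v - top) / z for k, v in logs.items()}

# Synthetic example: taxis are seen in every step except when the "true"
# parameter set (area=1.0, tc=2 steps) predicts disruption.
rain = [0, 5, 20, 30, 10, 2, 0, 0, 0, 0, 0, 0]  # 5-min rainfall depths (mm)
truth = disrupted(route(scs_cn_excess(rain), 1.0, 2), threshold=8.0)
seen = [not b for b in truth]
post = posterior(rain, seen, [0.5, 1.0, 2.0], [1, 2, 3, 4], threshold=8.0, lam_dt=2.0)
best = max(post, key=post.get)
```

In this sketch, each step's likelihood is highest when the simulated disruption flag matches the observation. As a result, parameter sets that exactly reproduce the taxi-free window receive the largest posterior mass.

The sketch can also probe the reviewers' questions:

- Passing a different `cn` shows how much the result depends on the fixed curve number.
- Adding a log-prior term to each grid entry would replace the uniform prior, as RC1 suggests.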
Xiangfu Kong et al.
Status: closed

RC1: 'Comment on hess-2023-7', Jeffrey M Sadler, 05 Apr 2023
This manuscript presented a framework for calibrating a hydrologic model based on taxi data. The concept is quite clever, in my opinion. There is, of course, a need to calibrate hydrologic models and, at the same time, a general lack of the data needed to calibrate them. Using taxi data for the calibration is a neat idea, and I think the authors did a good job showing the reader its feasibility. Overall, I think the manuscript is clear, well written, and technically sound. There are a few items I think should be addressed before it is accepted for publication.
Major points:
I have four main questions/concerns:
1. Why are you calibrating time of concentration and catchment area? These are parameters that I would not typically see calibrated. It seems like you could estimate catchment area from a DEM. Similarly, there are many methods for estimating time of concentration from catchment characteristics. Because these two parameters are relatively straightforward to calculate or estimate, I'd like to understand the authors' reasoning for calibrating them.
2. Why is a curve number of 85 used for every case? This seems pretty consequential, since the CN could vary between catchments. Should this be a calibrated parameter?
3. Assuming it is reasonable to calibrate the catchment area and time of concentration, I question whether it's reasonable to have uniform priors for those parameters. Maybe you can't know exactly what the area of a catchment is going to be, but would you have a good enough guess to build a reasonable prior distribution? I'm guessing you'd know whether a road segment has a relatively large or small catchment. Knowing this, it doesn't seem right to keep the prior distribution uniform.
4. It may be that I didn't understand correctly, but how did you account for time of day/day of week when considering whether or not a taxi would be passing? Or did you? For example, let's say that at a given roadway segment, there is a day and time of the week when there are hardly any taxis. Can you take that into account in your calibration scheme so that a lack of taxis does not suggest to the model that the roadway is flooded?
Minor points:
- Figure 5: does it make sense to have intermittent "have taxis" and "no taxi" times after a large rain event? I'm wondering about graph (C) in particular, where it looks like there is just one taxi between 16:15 and 16:20. Does that mean that one taxi is just really willing to risk it and drive through the water? If it's just one taxi, should it really be counted as "have taxi"?
- Table 4: if you had 171 flood gaging sites, why did you only pick 10 to test the model on? Why not test it on all 171?
- l355: how did you make a rating curve for each road? How did you get the flow data to relate the stage data to?
- Section 4.2: I personally don't think you need this section. While it's interesting to see how you applied the framework, I don't think it is needed. I think it is enough to have described (section 2), illustrated (section 3), and validated (section 4) the method.
- L54: You might consider citing the following since they are related to this topic (full disclosure: I am an author on both):
  Sadler, J. M., Goodall, J. L., Morsy, M. M., & Spencer, K. (2018). Modeling urban coastal flood severity from crowdsourced flood reports using Poisson regression and Random Forest. Journal of Hydrology, 559, 43-55.
  Zahura, F. T., Goodall, J. L., Sadler, J. M., Shen, Y., Morsy, M. M., & Behl, M. (2020). Training machine learning surrogate models from a high-fidelity physics-based model: Application for real-time street-scale flood prediction in an urban coastal community. Water Resources Research, 56, e2019WR027038. https://doi.org/10.1029/2019WR027038
- Figure 10: Could you explain why some runoff values have more than one water level value? For empirically derived rating curves, each runoff value corresponds to only one water level.
Editorial comments:
- l23: suggest changing "metropolis" to plural "metropolises"
- l31: suggest changing "false" to "incomplete" or "oversimplified"
- l60: suggest changing "critical" to "useful"
- l87: I do not think you need to define a hydrograph. I think you can safely assume HESS readers will know what a hydrograph is.
- l141: "can absorb *a* light shower" (add "a")
- l154: I suggest changing "converts rainfall excess to direct runoff" to "converts rainfall excess to a temporal distribution of direct runoff" or something like that, to communicate that it is a distribution of runoff over time.
- l160: "the paucity of runoff" instead of "the paucity of the runoff"
- l161: "sparkled" is probably not the right word here. Maybe "sparked" or "motivated"
- l191: "road" instead of "rood"
- l195: "equals the probability" instead of "equals to the probability"
- l197: suggest "impossible" instead of "difficult", because I think it is actually impossible to "obtain precise knowledge of all taxi-flooded intersections"
- Table 1: Is it correct to have the "/"s for Feature in several of the rows? If so, maybe you should define what that means.
- l295: suggest changing "a little bit" to "slightly" or something similar; "a little bit" is imprecise and colloquial
- l308: "waterlogging" is not a term I typically hear. Do you mean something like "flood-prone"?
- Figure 9: is the x-axis "Time of Concentration"? If so, please change the label. I didn't know what "Time" meant.
- l396: suggest replacing "great" with "good"
- l434: suggest removing "great" to read "This study illustrates the potential ..."
Citation: https://doi.org/10.5194/hess-2023-7-RC1
AC1: 'Reply on RC1', Xiangfu Kong, 06 May 2023

RC2: 'Comment on hess-2023-7', Anonymous Referee #2, 07 Apr 2023
The paper presents a novel approach for calibrating an urban rainfall-runoff model using taxi GPS data. This is an original idea that seems to have potential, as demonstrated in this study. I also commend the authors for making their data and code available.
Comments:
What's the reason for modeling the taxi data as pass/no pass instead of directly modeling the number of taxis passing? The latter somehow seems more obvious since the original data are taxi counts, while your approach first requires converting taxi counts to 0/1 values, which introduces a potential loss of information. Please better justify this modeling choice.
An alternative approach would be to use the Poisson distribution to directly model the number of taxis passing (rather than arriving). Have you considered this? This would be more like a Poisson regression model, but perhaps leveraging your road disruption function to model lambda instead of the usual Poisson link function.
Did you check whether the Poisson model for the number of taxis arriving at a road is a good assumption for your data?
A limitation is that all variables are treated as discrete random variables whereas the hydrological model parameters are continuous. Why discretize the parameters?
Does the model account for other (non-flooding) factors that may affect the number of taxis on a road, e.g. time of day (rush hour)?
Curve number CN is kept fixed even though it is also uncertain.
Section 2.1: a more common and general way is to write Bayes' equation directly in terms of the parameters $\theta$, as in $p(\theta \mid X) \propto p(\theta)\,p(X \mid \theta)$ or $p(\theta \mid X) \propto p(\theta)\,L(\theta \mid X)$. The likelihood on the right-hand side of eq. 4 in the paper would then be written in terms of a function $\omega(\theta)$ given by your eq. 16.
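Several of the referees' questions can be made concrete with a short calculation: pass/no-pass versus count likelihoods, and whether time of day is accounted for. The Python sketch below illustrates those suggestions, not the model used in the paper. It assumes:

- Taxi counts in each 5-minute step are Poisson.
- The Poisson mean is a hypothetical hour-of-day, dry-weather rate scaled by a disruption level between 0 (open) and 1 (blocked).
- A blocked road still carries a hypothetical 5 % of normal traffic.

For each likelihood, the sketch reports the log Bayes factor of "blocked" versus "open".

```python
import math

def expected_taxis(hourly_rate, hour, dt_hr, disruption, residual=0.05):
    """Poisson mean of taxis passing in one step (illustrative assumptions).
    hourly_rate: hypothetical dry-weather taxis per hour for each hour of day (24 values).
    disruption: 0 = open road, 1 = fully blocked (e.g. from a road disruption function).
    residual: assumed fraction of normal traffic still passing a blocked road;
    keep it > 0 so that log(mu) stays finite when a taxi is observed."""
    return hourly_rate[hour] * dt_hr * (1.0 - (1.0 - residual) * disruption)

def log_pois(k, mu):
    """Count likelihood: log P(K = k) for K ~ Poisson(mu)."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)

def log_binary(k, mu):
    """Pass/no-pass likelihood: only whether at least one taxi was seen."""
    return math.log1p(-math.exp(-mu)) if k > 0 else -mu

def log_bayes_factor(counts, hours, hourly_rate, dt_hr, model):
    """log p(counts | blocked) - log p(counts | open); positive favours blocked."""
    lbf = 0.0
    for k, h in zip(counts, hours):
        mu_open = expected_taxis(hourly_rate, h, dt_hr, 0.0)
        mu_blocked = expected_taxis(hourly_rate, h, dt_hr, 1.0)
        lbf += model(k, mu_blocked) - model(k, mu_open)
    return lbf

# Hypothetical hour-of-day profile: quiet nights, busier days, peak at 17-19 h.
rate = [6.0] * 6 + [60.0] * 11 + [120.0] * 3 + [30.0] * 4
dt = 5.0 / 60.0  # 5-minute steps, in hours
for label, counts, hours in [("empty step at 03:00", [0], [3]),
                             ("empty step at 18:00", [0], [18]),
                             ("one taxi at 18:00", [1], [18])]:
    print(label,
          round(log_bayes_factor(counts, hours, rate, dt, log_binary), 3),
          round(log_bayes_factor(counts, hours, rate, dt, log_pois), 3))
```

The assumed numbers give three contrasting cases:

- An empty step at 03:00 gives a log Bayes factor of only 0.475 in favour of blockage.
- An empty step at 18:00 gives 9.5. An hour-of-day baseline therefore keeps quiet periods from being read as flooding.
- A single taxi at 18:00 points in opposite directions under the two likelihoods. The binary likelihood treats it as evidence that the road is open (about -0.93). The count likelihood treats it as evidence of blockage (about 6.5), because an open road would be expected to carry about 10 taxis in that step.

This last case shows what can be lost when counts are reduced to 0/1 values. It also bears on RC1's question about the lone taxi in Figure 5.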
Edits:
eq. 11: please define x and y
L23: metropolis → metropolises or metropolitan areas
L40: "calibrated on runoff data alone": there are many studies that calibrate on other data as well
L47: ungaged vs ungauged: pick one spelling
L83 (and other places): equals to → equals
L90: arriving → arrival
L99: does index i refer to road i?
L132: instantization → instantiation
I suggest proofreading the entire manuscript to fix issues with English usage.
Citation: https://doi.org/10.5194/hess-2023-7-RC2
AC2: 'Reply on RC2', Xiangfu Kong, 06 May 2023
Data sets
Data and code used in the article titled "A Bayesian updating framework for calibrating hydrological parameters of road network using taxi GPS data", Xiangfu Kong, https://doi.org/10.5281/zenodo.7294880
Model code and software
Data and code used in the article titled "A Bayesian updating framework for calibrating hydrological parameters of road network using taxi GPS data", Xiangfu Kong, https://doi.org/10.5281/zenodo.7294880