04 Feb 2021
Bias-correcting individual inputs prior to combined calibration leads to more skillful forecasts of reference crop evapotranspiration
- Department of Infrastructure Engineering, The University of Melbourne, Parkville 3010, Australia
Abstract. Reference crop evapotranspiration (ETo) is calculated using a standard formula with temperature, vapor pressure, solar radiation, and wind speed as input variables. ETo forecasts can be produced when forecasts of these input variables from numerical weather prediction (NWP) models are available. As raw ETo forecasts are often subject to systematic errors, calibration is necessary to improve forecast quality. The most straightforward and widely used approach is to directly calibrate raw ETo forecasts constructed with the raw forecasts of the input variables. However, this approach may not fully exploit the potential predictability of ETo, because it ignores the non-linear interactions of the input variables in constructing ETo forecasts. We hypothesize that reducing errors in individual inputs as a precursor to ETo forecast calibration will lead to more skillful ETo forecasts. To test this hypothesis, we evaluate two calibration strategies: (i) calibration applied directly to raw ETo forecasts constructed with raw forecasts of the input variables, and (ii) bias-correcting the input variables first, then calibrating the ETo forecasts constructed with the bias-corrected input variables. We calibrate ETo forecasts based on weather forecasts of the Australian Community Climate and Earth System Simulator, version G2 (ACCESS-G2). Calibrated ETo forecasts with bias-corrected input variables (strategy ii) demonstrate lower bias, higher correlation coefficients, and higher skill than calibration based on raw input variables (strategy i). This investigation indicates that improving raw forecasts of the input variables enhances ETo forecast calibration and produces more skillful ETo forecasts. The proposed calibration strategy is expected to benefit future NWP-based ETo forecasting.
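The difference between the two strategies can be sketched in a few lines of Python. This is a toy illustration only: the `eto` formula, the mean-shift `bias_correct`, and all numbers below are hypothetical stand-ins, not the Penman-Monteith equation or the authors' actual calibration method.

```python
import numpy as np

# Toy stand-in for the ETo formula: any nonlinear combination of
# temperature (t), radiation (r) and wind (w) illustrates the point.
def eto(t, r, w):
    return 0.05 * t * r + 0.1 * w * np.sqrt(np.maximum(r, 0.0))

def bias_correct(fcst, obs):
    # Simplest possible correction: remove the mean error.
    return fcst - (fcst.mean() - obs.mean())

rng = np.random.default_rng(0)
n = 2000
t_obs = rng.normal(25, 3, n)   # "observed" inputs (synthetic)
r_obs = rng.normal(20, 4, n)
w_obs = rng.normal(2, 0.5, n)
# Raw NWP forecasts: truth plus systematic and random errors
t_fc = t_obs + 3.0 + rng.normal(0, 1, n)
r_fc = r_obs - 4.0 + rng.normal(0, 1, n)
w_fc = w_obs + 1.5 + rng.normal(0, 0.3, n)

eto_obs = eto(t_obs, r_obs, w_obs)

# Strategy (i): build ETo from raw inputs, then correct the product.
eto_i = bias_correct(eto(t_fc, r_fc, w_fc), eto_obs)
# Strategy (ii): correct each input first, then build and correct ETo.
eto_ii = bias_correct(
    eto(bias_correct(t_fc, t_obs),
        bias_correct(r_fc, r_obs),
        bias_correct(w_fc, w_obs)),
    eto_obs)

rmse = lambda x: float(np.sqrt(np.mean((x - eto_obs) ** 2)))
print(rmse(eto_i), rmse(eto_ii))  # strategy (ii) yields the lower RMSE here
```

Because `eto` is nonlinear, a correction applied only to the final product cannot remove the state-dependent errors that biased inputs create through the cross-terms; removing the input biases first avoids those terms, which is the intuition behind strategy (ii).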
Qichun Yang et al.
Status: open (until 01 Apr 2021)
RC1: 'Comment on hess-2021-69', Anonymous Referee #1, 24 Feb 2021
Review of Yang et al., 2021, "Bias-correcting individual inputs prior to combined calibration leads to more skillful forecasts of reference crop evapotranspiration", HESSD.

In this study, the authors investigated a critical issue in the forecasting of short-term reference crop evapotranspiration (ETo) based on NWP outputs. It is becoming popular to use weather forecasts from NWP models to predict water loss through evapotranspiration. Such information is highly valuable for the effective management of water resources, particularly in arid and semi-arid regions. This investigation develops a new methodology that effectively corrects errors in ETo forecasts and adds extra skill to statistical calibration. I believe this new post-processing strategy could benefit future NWP-based ETo forecasting. To improve this work, the authors should pay special attention to the following key issues:

1. Presentation of the results could be improved. Currently, the authors use maps to show and compare results from different model experiments. These figures demonstrate the spatial patterns of the modeling results. However, it might be more useful to summarize regional results in a different way, such as with box plots. I believe that would show readers the overall statistical information across the whole country better than simply plotting the results as maps.

2. Implications for ETo forecasting at monthly or seasonal scales should be further discussed. ETo forecasting based on monthly or seasonal climate forecasts from GCMs is also widely performed. This study develops the new strategy for short-term forecasts; its applicability to ETo forecasting based on GCM forecasts should be briefly discussed, to benefit a broader range of readers.

Specific comments:
Line 20: rewrite this sentence; not clear.
Line 74: Calibrate -> calibrate.
Line 80: compiled as the inputs…
Line 95: 10m -> 10 m.
Lines 107-108: need to clarify what the anomaly and climatological mean are referring to.
Line 165: consider rewriting this sentence; it does not read well.
Line 172: what is "specific month"?
Figures in Results: shouldn't the figures be centered?
Line 360: not "calibrate directly"; should be "without correcting forecasts of the inputs".
Line 365: consider rewriting this sentence.
Lines 377-378: two "calibration models"; consider rewriting.
Line 385: in the calibrated forecasts.
Line 386: consider making it shorter and clearer.
RC2: 'Comment on hess-2021-69', Anonymous Referee #2, 24 Feb 2021
Comments on "Bias-correcting individual inputs prior to combined calibration leads to more skillful forecasts of reference crop evapotranspiration" by Yang et al.

This study evaluated two calibration strategies for forecasting reference crop evapotranspiration: (1) calibration applied directly to raw ETo forecasts constructed with raw forecasts of the input variables; (2) bias-correcting the input variables first. The bias-correcting approach proved more effective. Although this study is significant, improvements and revision can make it stronger and more compelling. The core of my concerns is the presentation and discussion of the results: many sections are superficial, the results are simply described, and more insightful explanation and discussion are needed. See below for my suggestions. A moderate revision can easily address these comments, so I suggest a moderate revision.

Line 11: fully implemented.
Line 27: "divergent" emphasizes completely different assumptions; you could simply replace it with "different" as a more general term.
Line 38: "physical processes of the atmosphere" is unclear. Do you mean atmospheric circulation, atmospheric wind formation, or physical processes in the atmosphere?
Sections 3.1 and 3.2: the authors describe the results shown in the figures, but most of this text is vague. Please provide more specific (quantitative) information to support your statements. When comparing different results or methods, it is better to report some statistics (p-value, r2, etc.). For example, in lines 223-225 you report overprediction in Tmax and underprediction in Tmin in different regions. What is the range of that underprediction (and likewise the overprediction), and are these differences statistically significant? There are many similar issues in other sections.
In the discussion section, I would like to see a comparison with other studies using different algorithms for ETo simulation. Some quantitative comparison to elucidate the better performance of the new bias-correction algorithm needs to be done; I believe it will demonstrate the reliability of the new algorithm.
Line 388: feasible or reliable ETo forecasting.
Line 390: short-term ETo forecasting provides highly valuable information for real-time decision making on water resource management and the planning of farming practices. This study showed that the bias-correction approach is a feasible method for a more robust calibration of NWP-based ETo forecasting.
RC3: 'Comment on hess-2021-69', Anonymous Referee #3, 04 Mar 2021
Title: Bias-correcting individual inputs prior to combined calibration leads to more skillful forecasts of reference crop evapotranspiration
Author(s): Qichun Yang et al.
MS No.: hess-2021-69
This paper focuses on the comparison of two calibration strategies to provide short-term reference crop evapotranspiration (ETo) forecasts. ETo forecasting is still a relatively new area of research, in Australia and elsewhere, and has received more attention in the past few years. Skilful ETo forecasts in Australia would help support efficient water use and water management. Two strategies to calibrate ETo forecasts have emerged: i) the calibration of raw ETo forecasts, and ii) bias-correcting input variables first before calibrating ETo forecasts. Little work to date compares the two approaches, and it is unclear which method might be more advantageous or skilful. This paper therefore addresses a topical subject with a large audience interest.
I have some reservations regarding some methodological choices and justifications (purpose and inclusion of experiment 3 and 4), as well as a lack of interpretations of the results overall. I recommend revision to strengthen this paper.
The authors re-grid the weather forecast variables of ACCESS-G2 to match the timeframe and resolution of the gridded AWAP data. They perform four experiments: experiments 1) and 2) are based on the ETo anomaly and climatological mean, whereas experiments 3) and 4) use the ETo values directly. Furthermore, experiments 1) and 3) use raw inputs to calculate and calibrate ETo forecasts, whereas experiments 2) and 4) first bias-correct the inputs before ETo calibration. The SCC calibration method is used for the ETo forecasts, while a quantile mapping method is used to bias-correct the input forecasts. The authors evaluate the forecasts using three metrics for the theoretical assessment of bias, reliability and accuracy. Overall, the results suggest that the second strategy (bias-correction of inputs before ETo calibration) provides more skilful forecasts.
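For readers unfamiliar with the bias-correction step mentioned above, an empirical quantile mapping can be sketched as follows. This is a minimal illustration with synthetic data; the paper's actual quantile mapping implementation may differ, e.g. in how it handles distribution tails and seasonality.

```python
import numpy as np

def quantile_map(fcst, fcst_train, obs_train):
    """Empirical quantile mapping: replace each forecast value with the
    observed value at the same quantile of the training distributions."""
    # Rank of each forecast value within the training forecasts -> quantile
    q = np.searchsorted(np.sort(fcst_train), fcst, side="right") / len(fcst_train)
    q = np.clip(q, 0.0, 1.0)
    # Map that quantile onto the observed climatology
    return np.quantile(obs_train, q)

rng = np.random.default_rng(1)
obs_train = rng.normal(20.0, 4.0, 1000)                       # training observations
fcst_train = 0.8 * obs_train + 7.0 + rng.normal(0, 1, 1000)   # biased training forecasts

new_fcst = fcst_train[:100]                # forecasts to correct
corrected = quantile_map(new_fcst, fcst_train, obs_train)
print(new_fcst.mean(), corrected.mean())   # corrected mean moves toward obs_train.mean()
```

The mapping corrects the whole distribution (mean, spread and shape) rather than only the mean error, which is why it is a common choice for correcting individual NWP input variables.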
Major comments:
Methodology:
- P4 section 2.3: Why not compare the calibration method used SCC to other methods tested in the literature which would enable to place this work in context to other studies on ETo forecasting?
- Presentation of summary statistics. Why not use boxplots to present overall statistics and across lead times (for example next to figure 4 and so on)? Reliability diagrams for particular ETo thresholds would be helpful to communicate when the forecasts are reliable.
- Authors present experiments 1-4 in the method but then only present some results of experiments 3) and 4) in the last section of the results (CRPSS in 3.5). No explanation is provided for why calibrations 3) and 4) are only briefly introduced. Why is there a big gap, with no results on calibrations 3) and 4) for the bias and reliability assessments? Could the authors please expand on the purpose of including these at all? At p17 l350-354, ‘a further evaluation based on a different way of implementing the calibration demonstrate similar improvements in calibrated ETo forecasts with the adoption of bias-correction to input variables’. Is the purpose of including experiments 3) and 4) to test the generalisation of the method? If so, it needs to be clearly stated and justified earlier.
Methodological choices for evaluation:
- P7 l180-185: why choose the absolute bias over a relative measure, e.g. percentage bias? This choice makes it difficult to compare the magnitude of the errors across different variables and studies. For example, figure 1 shows a bias between -2 and 2 mm/day, which does not seem like much compared to other input variables such as precipitation. Figure 3, with a range of -0.1 to 0.1, seems very small. Conversely, percentages are used for the correlation coefficient in Figure 6, so why not use them for the bias?
- P8 l205-215: why is climatology used as the reference forecast for the skill score? In hydrological forecasting, persistence is typically used for short lead times, whereas climatology would be used for longer lead times; see for example (Pappenberger, Ramos et al. 2015). Could you please expand on and justify the choice of reference forecast, and the implications for the interpretation of the results?
- P8 l214. Why is the CRPSS defined as a percentage? As far as I am aware, most studies do not present the CRPSS in terms of percentage. Could you please comment on the reason for this choice, with references that also use percentages, and on whether there are any advantages?
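For reference, the CRPSS against a climatological ensemble can be computed as in the following sketch, in percent as the authors define it. The synthetic data and ensemble sizes are arbitrary, and the energy form of the CRPS is used here rather than whatever numerical scheme the authors implemented.

```python
import numpy as np

def crps_ensemble(ens, y):
    """CRPS of an ensemble forecast `ens` for a scalar observation `y`,
    using the energy form: E|X - y| - 0.5 * E|X - X'|."""
    ens = np.asarray(ens, dtype=float)
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

rng = np.random.default_rng(2)
n_cases, n_mem = 200, 50
obs = rng.normal(5.0, 1.0, n_cases)                          # verifying observations
clim = rng.normal(5.0, 1.0, (n_cases, n_mem))                # climatological reference ensemble
fcst = obs[:, None] + rng.normal(0, 0.5, (n_cases, n_mem))   # sharper, centred forecast

crps_f = np.mean([crps_ensemble(fcst[i], obs[i]) for i in range(n_cases)])
crps_c = np.mean([crps_ensemble(clim[i], obs[i]) for i in range(n_cases)])
crpss_pct = (1.0 - crps_f / crps_c) * 100.0  # CRPSS expressed as a percentage
print(crpss_pct)  # positive: the forecast beats the climatological reference
```

Whether the score is reported as a fraction (1 - CRPS_fc / CRPS_ref) or multiplied by 100 is purely cosmetic; the referee's question concerns the reporting convention, not the computation.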
Analysis and interpretation of results:
- P11 l259-261: why is the difference in bias between the approaches higher for the Northern Territory? How does this relate to the biases, errors and assumptions of the NWP? Is it correlated with the biases of specific input variables? How is it correlated with the nonlinear relationships in calculating ETo? Why are the biases most pronounced for shorter lead times? Please comment.
- P13 l282-285: Why is the correlation coefficient lowest in the Northern Territory? Is it linked to the NWP (and if so, how?) or to the observations, e.g. differences in observations compared to the rest of the country?
- P14 l294-297: The geographical patterns of the correlation performance are very similar to the patterns of the bias performance. Could you please comment on why, and whether the reasons are the same? Are these related to either the NWP or the observations?
- P16 l320-328. Please comment on why the accuracy has larger differences in terms of geographical patterns than for the bias and PIT performance which had very strong localised performance.
- P16 l329: Results on calibrations 2 and 4: what is the comparison between 2 and 4? Why are these only addressed in the evaluation of forecast accuracy? Why is there no mention of them in the bias and reliability evaluations? I suggest changing the section order and moving this section first. Then, add a sentence in the bias and reliability sections to explicitly communicate which results of experiments 3) and 4) are not presented and why.
Discussion:
- There are little to no direct comparison of results and calibration work presented here to any previous methods or studies ( which were mentioned in the introduction). To address a research closure, please put the work presented in this paper in context with other studies applying strategy 1 and strategy 2.
- It is unclear whether authors recommend the use of experiment 2) or 4), when and why. In that sense, I question again the inclusion of these experiments without further elaborating and discussing these results.
Structure:
- The introduction is well structured and appropriately presents previous studies and existing strategies.
- The title is a bit lengthy, authors could consider shortening it.
- As noted above, I suggest authors consider the order of results presented in the context of results from experiment 3) and 4).
Minor comments:
P4 l106: I suggest adding a diagram clearly explaining steps and differences of procedure between the calibration experiments.
P3 l68: ‘…pressing need to investigate.’ Please expand why it is pressing?
P3 l74: 'Calibrate' should be 'calibrate', with a lowercase letter.
P3 l80-84: There are many efforts to develop downscaling methods; please comment on what has been done here to downscale ACCESS-G2 to the AWAP grid. Why not scale AWAP to match the forecast grid?
P4 l100: please add a comment that SCC model will be described in section 2.3.2
P5 l134 climatological means or mean? Please rephrase and clarify this sentence.
P6 l165: Why are only 100 members drawn? Is there any difference in forecast reliability with a varying number of ensemble members? Is there a need or a reason to verify accumulated ETo forecast values across lead times (as is often the case for streamflow forecasting)? Please comment.
P8 l225: ‘wind speed is higher than 1m/s than the reference in Australia’. Could you please translate that in terms of percentage so that this statement can be more easily compared to other locations.
P18 l380 ‘NWP outputs have been increasingly used for ETo forecasting.’ For which applications? Please finish the sentence.
P18 l385: Add 'of' in '… skill of the calibrated ETo forecasts'.
References:
Pappenberger, F., M. H. Ramos, H. L. Cloke, F. Wetterhall, L. Alfieri, K. Bogner, A. Mueller and P. Salamon (2015). "How do I know if my forecasts are better? Using benchmarks in hydrological ensemble prediction." Journal of Hydrology 522: 697-713.