In the second draft, the authors of “Impact of bias nonstationarity on the performance of uni- and multivariate bias-adjusting methods” took into account part of my comments, as well as comments from the second reviewer. In particular, the revised study evaluates the performance of 2 univariate and 4 multivariate BC methods under climate change conditions in order to determine the influence of bias nonstationarity on the results. Instead of evaluating their results over the whole year (as in the first draft), the authors followed the advice of the second reviewer and performed a seasonal evaluation. They conclude that nonstationarity can have an important influence on the performance of the MBC methods and on the propagation of biases in impact models, and that the importance of this influence varies with season and variable. They finally advise accounting for seasonality to obtain a robust bias adjustment under bias nonstationarity, and recommend using univariate methods instead of multivariate ones until it becomes clearer how MBCs perform under bias nonstationarity.
While I appreciate the work done by the authors to modify the initial draft and take into account the reviewers' comments, I think that several major limitations remain in the present study and should be addressed.
General comments:
1) The experimental design does not allow a proper assessment of the influence of nonstationarity on the performance of the MBC methods and should be modified. If I understood correctly, the 2 univariate BC methods (QDM and mQDM) are applied over a 91-day moving window, while the 3 multivariate BC methods (MBCn, R2D2 and dOTC) are applied over the full time period. Applying MBCn, R2D2 and dOTC over the full time period and evaluating them at a seasonal scale is not appropriate, and hence raises a major issue for the interpretation of the results. Indeed, as pointed out by the authors in Section 4.1, biases can vary considerably with the season: for example, a climate model can present very little bias in winter and be drastically biased in summer. Thus, the correction of the statistical properties of the model (such as mean, variance and correlations) can differ strongly between seasons. Applying MBC methods that do not include seasonal components in their procedure, such as MBCn, R2D2 and dOTC, over the full time period potentially generates data with biases introduced by the experimental design itself, because these methods apply a similar statistical transformation to all seasons. Consequently, model data for seasons with strong biases will not be corrected enough, whereas model data for seasons with small biases could be deteriorated. In the study, applying these 3 methods in this way potentially introduces a bias by construction, placing them at a disadvantage compared to QDM and mQDM in the intercomparison. Moreover, it makes it impossible to identify whether poor performances of these MBCs are due to bias nonstationarity or to artefacts of the experimental design. The authors are aware of this problem, as discussed in L638 (« It is thus unclear whether the poor seasonal performance obfuscates the effect of nonstationarity, or if the similar performance is a sign of robustness. ») and L640 (« Hence, the set-up does not allow to clearly discern between the various categories of multivariate bias-adjustment, such as the ‘marginal/dependence’ or ‘all-in-one’ categories. »). However, assessing the effect of nonstationarity on the performance of MBCs is the main objective of the study, so the experimental design must isolate this effect as much as possible. This could be done, for example, by applying all the BC methods that do not include seasonal components in their correction procedure (QDM, mQDM, dOTC, R2D2 and MBCn) over the same seasonal period (e.g. winter or summer, separately), and then performing seasonal evaluations to compare them fairly. This would allow an intercomparison of the BC methods, all other things being equal.
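To make the suggested set-up concrete, a minimal Python sketch follows, under stated assumptions: seasons are taken as DJF/MAM/JJA/SON by month number, and a plain empirical quantile mapping stands in for "any BC method" (it is not QDM, nor any of the MBCs in the paper). The names `SEASONS`, `empirical_quantile_map` and `adjust_by_season` are hypothetical. The only point illustrated is that calibration and application are done separately within each season, so each season receives its own transformation.

```python
import numpy as np

# Hypothetical season definition (meteorological seasons by month number);
# the paper's own season boundaries may differ.
SEASONS = {
    "DJF": (12, 1, 2),
    "MAM": (3, 4, 5),
    "JJA": (6, 7, 8),
    "SON": (9, 10, 11),
}


def empirical_quantile_map(obs, sim_cal, sim_target):
    """Plain empirical quantile mapping, used only as a stand-in for a BC
    method (it is NOT QDM). Each target value is mapped to the observed
    quantile at its non-exceedance probability in the calibration simulation;
    values outside the calibration range are pinned to the observed extremes."""
    sim_sorted = np.sort(sim_cal)
    p = np.searchsorted(sim_sorted, sim_target, side="right") / sim_sorted.size
    return np.quantile(obs, p)


def adjust_by_season(obs, obs_months, sim_cal, cal_months,
                     sim_target, target_months, adjust=empirical_quantile_map):
    """Calibrate and apply `adjust` separately within each season."""
    out = np.full(sim_target.shape, np.nan)
    for months in SEASONS.values():
        in_obs = np.isin(obs_months, months)
        in_cal = np.isin(cal_months, months)
        in_target = np.isin(target_months, months)
        out[in_target] = adjust(obs[in_obs], sim_cal[in_cal], sim_target[in_target])
    return out
```

Calling `adjust_by_season(obs, obs_months, sim_hist, hist_months, sim_fut, fut_months)` would calibrate on the historical period and adjust the projection season by season, whereas a single pooled call `empirical_quantile_map(obs, sim_hist, sim_fut)` corresponds to the full-period application criticised above. For readability the sketch is univariate; a multivariate method would take time × variable arrays, but the seasonal split would be the same.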
2) The implementation of the newly added MBC method, Rank Resampling for Distributions and Dependences (R2D2; Vrac and Thao, 2020), is unclear. This MBC method relies on an analogue-based technique that requires some conditioning information to adjust the dependence structure of the simulated time series. The conditioning information can be multivariate, by considering a set of variables at a given time t. It can also be extended to rank sequences, i.e. conditioning not only on one but on several lagged time steps. The choice of conditioning information is crucial for interpreting the results of R2D2, as it can affect marginal, inter-variable and/or temporal properties. This information, and its influence on the R2D2 bias-corrected data, is not precisely given in the paper. Consequently, readers cannot interpret the results of this MBC method appropriately. Moreover, L362 states: « Each variable (precipitation, evaporation and temperature) was in turn used as the reference dimension. » This implies that 3 bias-corrected datasets were produced with R2D2, but, surprisingly, only one R2D2 result is presented in the study. Further clarification is therefore required to better present the results of the R2D2 method.
3) Section 4.1 (‘Bias change’) is hard to read. Is a table missing? The authors describe index values for the bias change between the calibration and projection periods, but a table seems necessary to present these results clearly and make the section easier to read.
4) I would like to thank the authors for providing the results for the calibration period. However, in connection with my first comment, these results highlight that MBCn, dOTC and R2D2 are not applied appropriately compared to QDM and mQDM. For example, in Table 4, PSS values indicate poor performance for these 3 MBC methods during the calibration period, principally because they are applied over the full time period and evaluated by season. If MBC methods do not perform well during the calibration period on indices that they are supposed to adjust, then they are not well “calibrated”, and good results cannot be expected for the validation period. This point should be revisited when addressing my first comment.
5) Also in connection with comment 4), an advantage of “marginal/dependence” methods such as MBCn is that, by construction, QDM and MBCn must obtain the same performance (trend preservation) for evaluation criteria on marginal properties, such as the PSS in Table 4. It would be worth verifying that these results on marginal properties are recovered before analysing other indices, such as correlation or discharge.
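Assuming PSS denotes the Perkins skill score (the sum over bins of the minimum of the observed and simulated relative frequencies), a short Python sketch shows why this check should hold by construction: reordering a univariately corrected series according to the ranks of another (multivariate) output, as in the reordering step of “marginal/dependence” methods such as MBCn, leaves its set of values, and hence any purely marginal score, unchanged. The function names, bin choice and synthetic data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def perkins_skill_score(obs, sim, bins):
    """Perkins skill score: sum over bins of the minimum of the two
    empirical relative frequencies (1 means identical histograms)."""
    f_obs, _ = np.histogram(obs, bins=bins)
    f_sim, _ = np.histogram(sim, bins=bins)
    return np.minimum(f_obs / len(obs), f_sim / len(sim)).sum()


def reorder_by_ranks(values, template):
    """Place `values` in the rank order of `template` (no ties assumed)."""
    ranks = np.argsort(np.argsort(template))
    return np.sort(values)[ranks]


rng = np.random.default_rng(0)
obs = rng.gamma(2.0, 2.0, size=500)      # hypothetical observations
qdm_out = rng.gamma(2.0, 2.1, size=500)  # hypothetical univariate (QDM-like) output
template = rng.normal(size=500)          # hypothetical multivariate output giving the ranks
mbcn_like = reorder_by_ranks(qdm_out, template)
bins = np.linspace(0.0, 40.0, 41)

# Both scores are computed from the same multiset of values, so they coincide.
print(perkins_skill_score(obs, qdm_out, bins), perkins_skill_score(obs, mbcn_like, bins))
```

If the PSS values reported for QDM and MBCn differ, this would point to a difference in set-up (e.g. window or period of application) rather than to the methods themselves.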
Specific comments:
L318: In this study, the most recent method used is not dOTC but R2D2 v2.0.
L526 (« Both the univariate and the multivariate bias-adjusting methods can adjust the simulated biases well ») and L530 (« the good adjustment by univariate methods is trivial: they will adopt the correlation of the simulations and only slightly adjust this by adjusting the marginals. »): could you rephrase these sentences to avoid implying that univariate bias-adjusting methods adjust correlations? Any improvement in correlations is only an indirect effect of the adjustment of marginal properties.
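As a minimal illustration of this point, the Python sketch below uses synthetic data, hypothetical variables `tas` and `pr`, and simple strictly increasing transforms standing in for an idealised univariate correction (none of this is the authors' set-up). Because such a correction acts on each marginal separately and preserves ranks, the Spearman rank correlation is unchanged, and any change in the Pearson correlation comes only from the reshaped marginals.

```python
import numpy as np


def ranks(x):
    """0-based ranks of x (ties are assumed absent)."""
    return np.argsort(np.argsort(x))


def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


rng = np.random.default_rng(42)
z = rng.normal(size=(1000, 2))
# Hypothetical raw simulation: a temperature-like and a skewed precipitation-like series
tas = z[:, 0]
pr = np.exp(0.6 * z[:, 0] + 0.8 * z[:, 1])

# Idealised univariate correction: a strictly increasing map applied per variable
tas_adj = 1.5 * tas + 2.0
pr_adj = np.sqrt(pr)

rank_raw, rank_adj = spearman(tas, pr), spearman(tas_adj, pr_adj)
pearson_raw = np.corrcoef(tas, pr)[0, 1]
pearson_adj = np.corrcoef(tas_adj, pr_adj)[0, 1]
print(rank_raw == rank_adj)       # rank correlation is untouched
print(pearson_raw, pearson_adj)   # Pearson changes through the marginals only
```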
L527 (« The univariate methods will adopt the dependence structure of the raw simulations »): I am not sure this holds for mQDM, which by construction has exactly the same rank correlation structure as the observations (at least for the calibration period).
Table 2: As requested, a table has been introduced to summarize the characteristics of the MBC methods. This table can be very useful for readers, but the current version contains some formulations that are unclear or misleading. Some examples: for the row « Temporal properties » and column « dOTC », the entry « Future, adjusted » is misleading. dOTC is not designed to adjust temporal properties, and this must be clearly indicated. A more nuanced formulation should be used instead, specifying that dOTC can produce unexpected behaviors in temporal properties. For the column « R2D2 », the entry « Shuffle based on observations » is not clear enough: the temporal properties of the bias-corrected data depend on the conditioning information used (see my second comment). For the column « MBCn », the entry « Shuffle based on observations » is wrong: the temporal properties of the model are modified in an uncontrolled manner by the decorrelation/recorrelation procedure and the univariate correction. However, empirical findings in François et al. (2020) indicate that MBCn (and hence the decorrelation/recorrelation procedure) tends to partially preserve the rank sequences of the model, in particular when bias-correcting a small number of statistical dimensions. Moreover, it might be worth reordering the rows by importance: I find it odd to have the row « Temporal properties » at the beginning of the table and « Statistical technique » almost at the end.
Bibliography
Vrac, M. and Thao, S.: R2D2 v2.0: Accounting for temporal dependences in multivariate bias correction via analogue ranks resampling, Geosci. Model Dev., 2020, 1–29, https://doi.org/10.5194/gmd-2020-132, 2020.
François, B., Vrac, M., Cannon, A. J., Robin, Y., and Allard, D.: Multivariate bias corrections of climate simulations: Which benefits for which losses?, Earth Syst. Dyn., 2020, 1–41, https://doi.org/10.5194/esd-2020-10, 2020.