Sources of skill in lake temperature, discharge and ice-off seasonal forecasting tools
Leah Jackson-Blake
Daniel Mercado-Bettín
Muhammed Shikhani
Andrew French
Tadhg Moore
James Sample
Magnus Norling
Maria-Dolores Frias
Sixto Herrera
Elvira de Eyto
Eleanor Jennings
Karsten Rinke
Leon van der Linden
Rafael Marcé
Interactive discussion
Status: closed
RC1: 'Comment on hess-2022-312', Anonymous Referee #1, 05 Oct 2022
This article treats the important and so far underdeveloped field of seasonal forecasts (here 4 months into the future) of lake and drainage-area properties, including water temperature, ice-off, and river discharge. The authors combined two lake models (simulating surface and bottom temperature and ice cover, two lakes each) with four hydrological models (simulating discharge, one drainage area each). The method was applied at four lake-river systems located in Norway, Spain, Australia, and Germany. The modelled systems include lakes spanning 19 to 60 meters in depth and with retention times from 0.2 to 1.1 years.
The coupled model setup was calibrated against measurements (lake temperature and river discharge) and forced with reanalysis data from ERA5, which I would classify as output from a general circulation model (GCM). Hydrological-lake model performance was evaluated with KGE, NSE and RMSE. Thereafter, the calibrated models were spun up for one year and used to forecast discharge and lake surface and bottom temperature over four months (one month of initialization), spanning the 1993-2016 period. Future forcing comes from the 25 ensemble members of the global forecasting system SEAS5. SEAS5 was bias-corrected and downscaled (grid adjustment) towards ERA5 to enable comparison.
The correctness of the forecasts (Lake_F) was evaluated through a sensitivity analysis and through comparison of Lake_F against in-situ measurements and against daily pseudo-observations (Lake_PO, the daily output of the coupled hydrological-lake model setup forced with ERA5). The end product of this manuscript consists of an evaluation (sensitivity analysis) of forecasting correctness for each river-lake system and an evaluation of the influence of the forcing parameters on the forecasts.
The manuscript shows potential but is lacking in some areas, which I list below.
Chosen drainage areas and lakes
The authors put forward that seasonal predictions work best near the equator and worsen with increasing latitude (lines 50 to 58). Yet no system was chosen in this region, Spain being the closest. The manuscript could still benefit from an analysis of latitudinal effects on the forecasting method used, to improve forecasting towards the North/South poles.
Additionally, the river-lake systems chosen contain lakes with very short retention times, i.e., a big impact of the rivers on water constituents, including temperature. The modelling method used includes the effect of changing lake volume, but not the effect of heat being transferred into the lakes from the upstream drainage area (I could not find an input temperature). It is therefore reasonable to assume that the lake models (through calibration) had a stronger connection between surface and deep waters than is the case in situ. Could this show up in your analysis of forcing-parameter importance (“Tracing of forecasting skill”, section 3.4, Fig. 4)? This needs to be addressed/analyzed, since you link forcing to lake processes which could in fact be caused by upstream heat fluxes in the drainage area and not by the lakes themselves.
Data
This manuscript uses ERA5 reanalysis as a stand-in for in-situ measurements. Why is this; is it due to the large spatial extent of the drainage areas? If possible, show how this influences your modelling locally, or refer to documents where the reader can find this comparison between ERA5 and in-situ measurements, ideally for the regions being analyzed.
Clarity
The manuscript could benefit greatly from an index defining the many acronyms used, as well as improved descriptions of tables and figures; e.g., Tables 4 and 5 are hard to understand.
Furthermore, I could not find/understand whether the drainage-area and lake models are coupled in time (run simultaneously), or whether the drainage-area models were run in advance to provide discharge for the lake models.
The language
Certain words in the manuscript cause some confusion. Below I have listed some that might need to change.
Skill – is associated with people. A fast car (a tool) has no skill, it has performance; the driver, on the other hand, has skill. That said, I know that skill is used more commonly to describe models (tools) in meteorology than in hydrology. So I suggest that you define what you mean by skill if you want to keep this formulation.
Climate & climate prediction – studies involving the effects of climate focus on longer time periods (>30 years) than the focus of this study (<1.5 years). Both SEAS5 and ERA5 come from global GCMs, which could be used for climate studies, but in the context of this manuscript I do not think this is the right phrase to describe the models you used.
Hindcasts – is usually used in the setting of running models with data from past events, close to reanalysis, with the aim of improving said models. Here the word is used in combination with SEAS5 forecast simulations. The authors have adjusted these towards ERA5 (a real-data proxy), but the intention is still to use SEAS5 as forecast forcing. Therefore, consider other alternatives in the manuscript, or define this word in the context of your manuscript.
Water quality – for drinking water and the biosphere, temperature is considered an important water-quality parameter. We do not look at lakes and rivers in this sense here; water quality would be assumed to entail dissolved constituents (nutrients, oxygen, …). To avoid misunderstanding, consider using something else.
Line 19: “as previously presented”. Avoid the need for a reference in the abstract.
Line 67: Consider adding the following reference: https://doi.org/10.1016/j.watres.2020.115529
Lines 72 to 74: partly untrue; air2water can run perfectly well with seasonal forcing as you do here (only air temperature as forcing), and ice-off is currently available indirectly.
Section 2.1.1: The reader does not know where the lakes and drainage areas (rivers) under investigation are situated. Add a map showing the global location and regional extent of each drainage area-lake system (rivers and lakes). Additionally, add these system details (names, stations, etc.) where appropriate, e.g., in Table S1.
Line 111 or 112: add the reference Johnson, S. J., et al. (2019): SEAS5: the new ECMWF seasonal forecast system, https://doi.org/10.5194/gmd-12-1087-2019.
Lines 122 to 123: “Climate data were downloaded….” What do you mean in this sentence, ERA5 and/or SEAS5?
Lines 134 to 135: If it is true that the hydrological and lake models were chosen based on local lake/river conditions, as I understand from your text, state which conditions and why. I suspect that local infrastructure, i.e., the local people's (authors') knowledge of and experience with the chosen models, determined which were used (which is a valid reason, if that is the case). Using multiple models is a strong supporting point for this manuscript.
Line 139: “)” missing.
Lines 156 to 157: Add details (equations and, e.g., the RMSE) of this linear regression between in- and outflow; see the sketch below for the kind of detail meant.
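For concreteness, such a regression with an accompanying error statistic could look like the following sketch; the variable names and values are hypothetical, not taken from the manuscript:

```python
import numpy as np

# Hypothetical daily discharge pairs (m3/s), standing in for the paper's data.
inflow = np.array([3.1, 4.5, 2.8, 6.0, 5.2, 3.9, 4.1])
outflow = np.array([2.9, 4.6, 3.0, 5.7, 5.1, 3.8, 4.0])

# Ordinary least-squares fit: outflow ~ a * inflow + b
a, b = np.polyfit(inflow, outflow, deg=1)
residuals = outflow - (a * inflow + b)
rmse = np.sqrt(np.mean(residuals ** 2))
print(f"outflow = {a:.2f} * inflow + {b:.2f}  (RMSE = {rmse:.2f} m3/s)")
```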
Figure 2: consider showing the mean of the SEAS5 predictions and ERA5 at the same time (i.e., continue the black lines into the transition and target season).
Line 185: RPSS looks to be missing from Tables 2 and 3.
Lines 235 to 238: something is missing here; hysteresis should make a linear relationship between, e.g., air temperature and water temperature rather poor. Describe how good these linear fits were (in an appendix), and/or show this with a figure and improve the explanation.
Line 243: the reader is not familiar with the contributions of local heat fluxes at the chosen locations. Before disregarding, for example, cloud cover from the analysis, show the reader in numbers (or preferably in a figure as an appendix), for each lake, the seasonal heat-budget contributions, i.e., uptake and emission of infrared longwave radiation, evaporation + condensation, sensible heat flux, and uptake of surface downward solar radiation. For throughflow you only have the outflow (at some lakes?), since the inflow temperature is missing.
Line 250: RMSE is not consistent with RMSE/sd in Table S2. What is RMSE/sd? Use the same term in the text as in Table S2.
Table 5: move the description of the asterisk under the table and improve the site representation; currently one cannot see what belongs to which system. Also define the season duration.
Figures 3 and 4: Germany and Australia are missing; add or explain.
Figure 4: Something does not add up in your analysis. The top-row values for Spain (bottom temperature) and Norway (surface temperature) appear too large compared to the individual season values taken together; i.e., if the impact is small in most seasons, I do not see how it could be much larger on an annual basis.
Figure 5: Why so many data gaps? Consider showing seasons where the significance is worse (higher), but clearly state which significance level you trust.
Line 468: add author contributions. Who did what?
Citation: https://doi.org/10.5194/hess-2022-312-RC1
AC1: 'Reply on RC1', Francois Clayer, 31 Oct 2022
Thank you for this thorough and helpful review, and apologies for this rather late reply; I hope it can still trigger some further discussion upon interest. Below are some preliminary responses to the main points the reviewer raises, followed by some more specific answers.
Chosen drainage areas and lakes
Yes, we agree that the rationale behind our site selection is not well presented, especially regarding the skill of SEAS5 predictions. We selected these sites to "test out" the applicability of SEAS5 outside the tropics. This point will be emphasized in the revised introduction. Also, what do you mean by "an analysis of latitudinal effects for the used forecasting method to improve forecasting towards the North/South pole"? I am not sure I understand what you expect here.
Very good points on water temperatures; thank you for raising those. We indeed overlooked the impact of the water temperature of the inflows on the lake/reservoir heat budget, although it is mainly based on air temperature with some time lag. Allowing some time for additional analysis, we will be able to include that in section 3.4, as well as the connection between surface and deep waters.
Data
We used ERA5 data partly to cover all the catchment areas, but most importantly to avoid using observations with data gaps. Several case studies had issues gathering continuous time series of observed weather data, so in some cases, forcing the models with observations for a long enough period (over 20 years, to have at least 20 seasons) to allow for probabilistic outputs was not even possible.
A second reason is to keep our workflow transferable to other sites.
We will make sure these points are clearer in the manuscript.
Clarity
Yes, we can add an index for clarity.
The models were run one after the other; the catchment model provides inflows to the lake/reservoir model. We will make that point clearer in the manuscript.
The language
We agree and apologize for our lack of precision in the terminology and any impact it might have had on your understanding of this study. This point was also raised by the other reviewer, so we will make sure to use the common terminology of the forecasting community and define any uncommon terms where necessary.
Thank you for your detailed comments on each line. We will add a map and make sure we respond to all your concerns. Below are some preliminary answers to a few specific points.
Regarding l. 235-238, you were concerned that “hysteresis should make linear relationship between ex. air temperature and water temperature rather bad” (…)
Pearson partial correlation coefficients (PPCC; first described at l. 235-238 and displayed in Fig. 5) are calculated from seasonal means, not daily values, which likely yielded much cleaner correlations than expected from daily values. This point will be made clear in the revised MS.
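For reference, a partial correlation of this kind can be sketched with a minimal residual-based implementation on synthetic seasonal means; this is illustrative only, not the code used in the study:

```python
import numpy as np

def partial_corr(x, y, z):
    """Pearson partial correlation of x and y, controlling for covariates z,
    computed by regressing both on z and correlating the residuals."""
    Z = np.column_stack([np.ones(len(x)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic seasonal means for 23 seasons: lake surface temperature driven by
# air temperature and shortwave radiation (all values illustrative).
rng = np.random.default_rng(42)
air_t = rng.normal(10.0, 3.0, 23)
radiation = rng.normal(150.0, 30.0, 23)
surface_t = 0.8 * air_t + 0.01 * radiation + rng.normal(0.0, 0.5, 23)

# Correlation of surface and air temperature, controlling for radiation.
print(partial_corr(surface_t, air_t, radiation[:, None]))
```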
“Line 243: the reader is not familiar with the contributions of local heat fluxes at the chosen locations.”
Agreed; we will include additional figures showing each lake's heat budget to back up our assumptions here.
“Figures 3 and 4: Germany and Australia are missing; add or explain.”
We apologize for this oversight. The German and Australian case studies were left out of this analysis because of the resource- and time-intensive process of running these experiments, and because both sites showed only one or two skillful season-tercile-variable combinations. This point will be included in the methods section of the MS.
“Figure 4: Something does not add up in your analysis. The top-row values for Spain (bottom temperature) and Norway (surface temperature) appear too large compared to the individual season values taken together; i.e., if the impact is small in most seasons, I do not see how it could be much larger on an annual basis.”
We agree that it can appear quite surprising that the sensitivity calculated on an annual basis is much larger than that over each single season. However, Norway's climate is subject to strong seasonal cyclicity, which can be well captured at annual scales while still showing low correlation for a given season. We will explain this point better in the MS.
“Figure 5: Why so many data gaps? Consider showing seasons where the significance is worse (higher), but clearly state which significance level you trust.”
Many were not significant, with p-values > 0.1, and so are not shown on the plot. We can still add these to the figure.
Citation: https://doi.org/10.5194/hess-2022-312-AC1
AC2: 'Reply on AC1', Francois Clayer, 31 Oct 2022
Just a short addition to my comment:
Note that at l. 214-215 we mentioned that the time-consuming sensitivity experiments, whose outputs are shown in Figures 3 and 4, were only performed for the Spanish and Norwegian sites:
"Sensitivity analyses were only carried out for Spain and Norway because of the low number of windows of opportunity at the two other sites and considering the resources needed to execute these hindcast experiments."
Citation: https://doi.org/10.5194/hess-2022-312-AC2
AC5: 'Reply on RC1', Francois Clayer, 04 Nov 2022
Just a final addition to our responses to the RC1 comment below:
"Additionally, the river-lake systems chosen contain lakes with very short retention times, i.e., a big impact of the rivers on water constituents, including temperature. The modelling method used includes the effect of changing lake volume, but not the effect of heat being transferred into the lakes from the upstream drainage area (I could not find an input temperature). It is therefore reasonable to assume that the lake models (through calibration) had a stronger connection between surface and deep waters than is the case in situ. Could this show up in your analysis of forcing-parameter importance (“Tracing of forecasting skill”, section 3.4, Fig. 4)? This needs to be addressed/analyzed, since you link forcing to lake processes which could in fact be caused by upstream heat fluxes in the drainage area and not by the lakes themselves."
At least for the Norwegian case study, the water temperature of the inflows was simulated using a simple temperature model based on a time lag behind air temperature (hence highly correlated with air temperature). We will therefore be able to show with a simple analysis, e.g., with partial correlation coefficients, whether in-lake surface or bottom temperature is sensitive to it. We will clarify how the water temperature of the inflows was dealt with for each case study and provide an analysis of the impact of these setups on the lake heat budget.
Citation: https://doi.org/10.5194/hess-2022-312-AC5
RC2: 'Comment on hess-2022-312', Anonymous Referee #2, 20 Oct 2022
Review of “Inertia and seasonal climate prediction as sources of skill in lake temperature, discharge and ice-off forecasting tool” by Clayer et al.
The study assesses whether seasonal (meteorological) predictions can be used within a hydrometeorological forecasting tool to produce skillful lake temperature, discharge and ice-off forecasts. In general, the topic can be of interest to the community and gives a nice example of how seasonal (meteorological) predictions can be combined with hydrological models to produce application-relevant and user-specific outputs that provide a baseline for decision makers.
However, I struggle with the manuscript in its current form. Hydrometeorological prediction systems are by definition interdisciplinary, as they combine meteorological and hydrological models, and matters are further complicated when the output is used by decision makers. When reading the manuscript I got the impression that the authors do not have a strong background in hydrometeorological forecasting systems and use terminology that is very uncommon in this field. These imprecise formulations make the manuscript very difficult to understand. I suggest the authors (further) familiarize themselves with the literature on hydrometeorological forecasting systems and follow more closely the terminology used in the standard literature.
In general, the language should be much more precise and the community's terminology should be used. Overall, the manuscript reads a bit like a composite of multiple different texts/styles. I suggest unifying the manuscript to enhance its readability: use common terminology throughout (abbreviations, model descriptions), and delete obsolete/imprecise sentences or specifically mention the related numbers and where they can be found (e.g., lines 158-159: “…scores for hydrological and lake modelling were calculated.”, but there is no indication of where in the manuscript the scores are provided).
Necessary clarifications:
Windows of opportunity: From the manuscript I do not understand how “windows of opportunity” are defined. As this is a crucial baseline for the analysis, the authors should make an effort to properly describe how these windows are selected and what exactly the temporal resolution (or aggregation) of the forecast data used to calculate the scores is.
Hindcasts: For each forecast that is produced, the prediction system also calculates the hindcasts for the same date in the past XX years. It is not clear to me which hindcasts have been used by the authors, as they mention hindcasts from the years 1994-2016; I assume they thus use all hindcasts from the forecasts produced during 2017. In addition, on the same line it is written that the hindcasts from 1993-2016 are used, which might just be a typo.
Another point is the source of predictability. Sources of predictability are, from my point of view, physical processes and/or connections within the atmospheric/hydrological system. For example, sea surface temperatures that influence large-scale dynamical patterns, such as ENSO or the NAO, are a source of predictability; as a hydrological example, initial conditions of snow or soil moisture can be a source of predictability for river discharge thanks to the memory of the system. I struggle with the terminology used in this manuscript, which designates a seasonal forecasting system as a source of predictability. It is rather the boundary conditions, provided by the seasonal forecast system as input to the hydrological model, that can be seen as the source of predictability. This is already a problem in the title. I suggest the authors carefully revise the manuscript and the title. The analysis rather aims at determining whether seasonal predictions can be used to produce skillful lake temperature, discharge and ice-off forecasts.
The SEAS5 system has a resolution of 1°, whereas ERA5 has a resolution of 0.25°. The authors mention that ERA5 data was used to bias-correct the SEAS5 hindcasts. Due to the mismatch in resolution, this correction inherently includes a downscaling step. Please elaborate on what the actual hindcasts are that are used to run the hydrological and lake models.
Furthermore, how is the bias correction implemented? Do you use a leave-one-year-out methodology? There is not enough information about this pre-processing step in the manuscript.
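For orientation, one common way to implement what is being asked about here, a leave-one-year-out empirical quantile mapping, can be sketched as below. This is a generic additive toy variant, not necessarily the scheme used in the manuscript:

```python
import numpy as np

def eqm_leave_one_year_out(fcst, ref, years, n_q=99):
    """Generic additive empirical quantile mapping with leave-one-year-out
    training. fcst: raw forecast values; ref: reanalysis values at the same
    grid point; years: the year label of each sample."""
    probs = np.linspace(1, 99, n_q)
    corrected = np.empty_like(fcst, dtype=float)
    for yr in np.unique(years):
        held_out = years == yr
        # Train the quantile correction on all years except the held-out one.
        fq = np.percentile(fcst[~held_out], probs)
        rq = np.percentile(ref[~held_out], probs)
        # Interpolate the correction at each held-out forecast value; values
        # beyond the outer quantiles receive the edge correction.
        corr = np.interp(fcst[held_out], fq, rq - fq)
        corrected[held_out] = fcst[held_out] + corr
    return corrected

# Example with synthetic daily data over 24 years:
rng = np.random.default_rng(0)
years = np.repeat(np.arange(1993, 2017), 90)     # 90 days per year
ref = rng.normal(10, 3, years.size)              # "ERA5-like" values
fcst = ref + 1.5 + rng.normal(0, 1, years.size)  # biased "SEAS5-like" values
print(eqm_leave_one_year_out(fcst, ref, years)[:5])
```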
More specific comments
Introduction
Lines 24-26: two model outputs are compared: seasonal lake hindcasts (forced with SEAS5) and pseudo-observations (forced with ERA5). In the next sentence it says “the seasonal lake hindcasts was generally low but higher than SEAS5 climate hindcasts”. These sentences are confusing; what exactly is analyzed in the SEAS5 climate hindcasts? Do you mean that the skill of the meteorological predictions (SEAS5 outputs) is worse than the skill of the hydrological predictions?
Line 53-55 “Hence, seasonal climate forecasts are usually not the main source of predictability outside the tropics, at least for stream flow (Greuell et al., 2019; Harrigan et al. 2018; Wood et al. 2016)”
I struggle again with this formulation: sources of skill can be the initial conditions (e.g., snow, soil moisture) or the forcing variables (e.g., temperature forecasts that determine the skill of evapotranspiration). But the seasonal forecast itself cannot, at least from my point of view, be seen as a source of predictability. I suggest a careful reformulation of this (and similar) sentences throughout the manuscript. In particular, the publications referred to in this sentence can be used as a starting point, and the formulations and terminology used there should be adopted in the current manuscript.
Line 70 “…temperature predictions and forecasts”: from a meteorological point of view, forecasts and predictions are basically synonyms. Thus, I suggest not using both terms, to make the manuscript more readable.
Line 72: “…it doesn’t take seasonal climate forecast ensembles i.e. climate data products specifically designed for seasonal forecasting…” This formulation is again confusing for me. The output of seasonal forecasts are seasonal forecasts, not a climate data product. I suggest avoiding such sentences to make the manuscript clearer.
Lines 76 & 77: The authors mention “When forecasting river flow, for example, predictability can originate from two main sources: (i) catchment water stores of initial soil moisture, groundwater, and snowpack, which are directly linked to the water residence time; and (ii) seasonal climate prediction (Greuell et al., 2019).” I agree that there are two main sources: (1) the initial conditions and (2) the boundary conditions, i.e., the relevant variables from the driving meteorological forecasts. Again, the formulation that “seasonal climate predictions” are a source of predictability is misleading and, to my knowledge, not used in the literature, as it is a very vague formulation. This makes me feel, without wanting to be rude, that the authors should invest more time familiarizing themselves with the commonly used terminology in hydrometeorological forecasting and carefully revise the manuscript.
Line 80-84: “When dealing with standing water bodies, antecedent conditions are also likely to provide significant predictability, given that the water storage in lakes and reservoirs is large compared to river channels, providing higher inertia. Water residence time is thus expected to exert a strong influence on water flow predictability. Water temperature, on the other hand, is influenced by multiple meteorological variables, e.g., wind, and radiation, in addition to water stores which can affect the source of its predictability.”
I do not understand what the authors are trying to argue. What is meant by “water flow predictability”: do you mean the discharge of the rivers? Lake level heights? Can you make this sentence clearer?
Line 90: I suggest briefly introducing what ice-off is.
Lines 90-91: You mention that you quantified the forecasting skill of each meteorological variable. This is later provided in Table 4. However, this table is hard to understand and does not provide quantified values of the fair RPSS and the ROCSS. By definition, a skill score ranges from one (perfect forecast) down to zero (no skill relative to the reference), with negative values indicating that the forecast is less skillful than the reference. Here the authors just indicate whether a given skill score is significant for the given variable and tercile. Why don’t you provide the actual skill scores and the numbers? And what is your definition of significant skill? Is everything above 0 considered significant? Furthermore, the abbreviations of the variables are non-intuitive and seem to come directly from a model output; I suggest using more common/readable abbreviations (e.g., rsds → solar radiation). In addition, the variable rlds is given only in this table and is described nowhere in the manuscript; what is the meaning of this variable?
Methods:
Line 101: Introduce the definition of ice-off already in the introduction section; this helps the reader understand from the beginning what you are aiming at.
Table 1: Maybe it helps to add a map showing where these reservoirs are located. Are they in complex terrain? In arid or humid climate regions?
Line 109: Climate data
I suggest changing the name of this paragraph to, e.g., “Meteorological input data”. Seasonal forecast data is not really a climate dataset; it is a forecast dataset.
Line 111: what do you mean by “relatively homogeneous spatial and temporal coverage”? Both datasets, the reanalysis and the forecast data, are produced by global earth-system models and thus provide global coverage and are continuous in time, as defined by the model integration steps.
Lines 114-115: Please add a bit more detail about the bias correction. Did you use a leave-one-year-out approach? What quantile-mapping approach did you use? How do you deal with values above the 99th percentile? Do you use an additive or a multiplicative method? How do you account for inter-variable dependencies? Although there is some more information in the supplement of the cited publication, I think some more information within this manuscript would help the reader understand what you have done in this study.
Line 116: Here you use the terms “impact models” and “impact variables”. Are these the water-quality models and variables? In the rest of the manuscript the terms “catchment models”, “hydrologic models”, “lake models”… are used. It is hard to follow when you jump between the different model names. I suggest unifying these terms and using one definition throughout the manuscript.
Line 125:
Here you give more information about the EQM (see my comment on lines 114-115). I suggest referring to these lines already at line 114. Nevertheless, how the bias correction of the seasonal forecasts is done is crucial, so I suggest adding some more information about how it is actually done. In addition, did you look at the performance of the bias correction? How do the skill scores change before/after the correction? In forecast verification, a single score only shows part of the story; it could be fruitful to look at additional scores and measures, such as reliability diagrams, to get more information about the full ensemble.
Lines 127-128:
Here the variable rlds is missing, although it is shown later in Table 4. In addition, the variable abbreviations are non-intuitive; I suggest using standard or more readable abbreviations (e.g., temperature = T, pressure = p), or even the full names in the text. Also, do you use daily values or a lower temporal resolution? Is the daily air temperature a daily mean? Are these really the parameters you use from SEAS5 and ERA5? Is the air pressure the surface air pressure? You mention wind speed here but use the u and v wind components; that is completely fine, but then I suggest only mentioning the wind components and not wind speed, as wind speed is a different variable (i.e., a combination of u and v).
Lines 130-131: Please describe the data you used and where it was measured. Is the station directly at the lake/reservoir inlet? Is it an official observational station? Who is responsible for the measurements (a public agency? a scientific group?)? Why are there so many data gaps? How trustworthy are these observations? What method is used to estimate the discharge? What temperature do you use (daily mean, daily max, daily min)? There is no need to include all of this information, but at least a reference on how trustworthy the measurements are would be necessary.
Line 142: Here you use the variable abbreviations again; as mentioned before, I suggest using more intuitive abbreviations or writing out the full variable names to improve the readability of the manuscript.
Lines 143-145:
The authors mention that all hydrological models have been validated against observations. Do you have any publication you can refer to? What is the performance in terms of NSE for the calibration? Were the models calibrated for the same time period/locations? Was the same observational dataset used? Please give some more information.
Line 152: Maybe it is worth introducing these performance measures in a separate chapter, e.g., together with the skill scores.
Lines 158-159: “Most common statistical goodness-of-fit parameters, e.g., Kling-Gupta efficiency (KGE), NSE and RMSE, for hydrological and lake modeling were calculated.”
What are the results in terms of these scores? Specifically mention the numbers; otherwise this sentence is obsolete. In addition, I struggle with the formulation “goodness-of-fit parameters”: KGE, NSE and RMSE are performance criteria, or scores, used to evaluate model performance. I suggest revising this formulation and using more common formulations from the field of forecast verification.
Lines 167-168: What do you mean by “total prediction skill”, and what observations have been used to assess the performance? I assume you refer here to the hydrological part? I suggest a more careful formulation to better discriminate between the meteorological and the hydrological parts.
Lines 173-174: “Over the initialization month, the 25 members of SEAS5 progressively diverge from ERA5 to their respective SEAS5 member.”
How is this transition done? In the text you say initialization month, whereas in Figure 2 you mention a transition month; I suggest unifying the terminology to avoid confusion.
Line 174-175:
“Model outputs for the final 3 months, i.e., the target season, were selected and used to calculate the probabilistic forecasts of seasonal summary statistics.”
This sentence illustrates what I mean by imprecise language: “the model outputs for the final three months” already are the probabilistic forecasts, and it is for these forecasts that the scores are calculated. I suggest carefully revising this (and similar) formulations. In addition, does this mean that you only use 3-month lead times? The seasonal forecasts provide predictions up to 13 months ahead; why do you use only a 3-month lead time? I think this should be mentioned at the very beginning of the manuscript (and already in the abstract), as it is important to know what lead-time horizon you are focusing on.
In addition, it is not clear to me how you do your verification. Is it based on daily values? Weekly averages? Monthly averages? The full 3-month period? Do you do a lead-time-dependent verification as well?
Line 191:
“Both skill scores are expressed as relative to a reference forecast, i.e., climatology.”
It would be worth explaining the concept of skill scores more carefully. A skill score is a comparison of two scores, one calculated for the forecast of interest and one for a reference forecast; this sentence is therefore somewhat imprecise and should be revised. Also, what climatology did you use for calculating the skill scores? I assume you used the climatology based on the pseudo-observation experiments; is that correct? If so, please mention this in the manuscript.
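For reference, the generic definition alluded to here can be written as

```latex
\mathrm{SS} = \frac{S_{\mathrm{fc}} - S_{\mathrm{ref}}}{S_{\mathrm{perf}} - S_{\mathrm{ref}}},
```

where $S_{\mathrm{fc}}$ is the score of the forecast being evaluated, $S_{\mathrm{ref}}$ that of the reference forecast (here climatology) and $S_{\mathrm{perf}}$ that of a perfect forecast; for a negatively oriented score such as the RPS, $S_{\mathrm{perf}} = 0$ and the skill score reduces to $1 - S_{\mathrm{fc}}/S_{\mathrm{ref}}$.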
Lines 194-197: I do not understand the concept of “windows of opportunity”. Can you better explain how these windows of opportunity were selected? And why were only these windows taken into account for the evaluation?
Table 2 and the full paragraph:
I struggle to understand your procedure. Is the “Evaluation data” in Table 2 the data used to compute the reference forecast for the skill scores? If so, please change the name from evaluation data to reference forecast data or similar. For evaluating a forecast you in any case need explicit data at the same temporal resolution as your forecasts (be they real or pseudo-observations). If you use a climatology (based on real or pseudo-observations) as your reference forecast, how is the climatology constructed? How long is the time series that you take into account, and how many data gaps do you allow when constructing your climatology based on real observations? If you mention that there are large gaps, it might be difficult to construct a reliable climatology.
Lines 205-207: these two sentences are a repetition. I suggest reformulating them.
Lines 200-227: I struggle a lot to follow the explanation. First of all, what are ROCSSs, ROCSSw and ROCSSw+t?
Refer to Table 3 early in this paragraph; then it already becomes a bit clearer!
Lines 209-230: Again, the formulation makes it very difficult to follow, and the abbreviations further reduce the comprehensibility. I suggest reformulating this paragraph with an emphasis on understandability.
Maybe it would help to choose simpler or more descriptive titles for the paragraphs. E.g., 2.2.2 could be “Sensitivity to initial conditions and meteorological forcing (input periods)” and 2.2.3 “Sensitivity to individual input variables”.
Line 231: “..each Lake_PO variable”: do you mean each output variable of the Lake_PO experiment? State that explicitly; it will be much easier to understand!
Lines 240-244:
Is this now based on your analysis, or are these theoretical considerations? This is not entirely clear to me. In addition, the formulation is a bit misleading: does it mean that the retained variables are those taken from the seasonal meteorological predictions to run the hydrological models?
Results
Line 251: you refer to Table S2 in the supplement. I suggest adding part of the table to the main manuscript and referring to “additional information” in the supplement.
Line 262: Here you mention the fair RPSS, which accounts for a limited ensemble size. However, this is not introduced in the methodology section; there you should at least mention that the fair (or debiased) RPSS is used and what it accounts for.
Again, what do you mean by “significant fair RPSS”? When do you assume an RPSS to be significant? How do you test the significance of a skill score?
Lines 269-270:
“Only 0 to 10% of the SEAS5 climate hindcasts are skillful, on average”.
I struggle to understand this result. The RPSS (or any score) is usually determined from a large sample of forecasts (or hindcasts). Of course, an individual forecast might be poor, but the full picture of forecast performance is only revealed when the scores for many issued forecasts are investigated.
I think you should add a figure with the results where the reader can see what the hindcast performance actually is. From the text and the table alone, it is rather difficult to follow your argumentation. In addition, what is the definition of a skillful hindcast in your context? Is every hindcast with a ROCSS > 0.5 skillful, or do you use other thresholds? It would be crucial to mention the numbers at least once in the results section as well.
Lines 278-280: How many seasons are discarded due to missing observations? Please indicate the exact numbers so that the reader knows how many samples (hindcasts) are actually used for the analysis, or at least refer to the table where the numbers are listed.
Lines 283-286: You mention that the ROCSS does not capture the same thing as the “goodness of fit statistics” do. This is the case by definition, as they do not look at the same properties of a forecast; that is why, in forecast verification, multiple scores should be taken into account to properly assess forecast performance. Can you rephrase the sentence to make clearer what you actually mean?
Table 4
Again, general comments: the abbreviations are non-intuitive, and FRPSS is not introduced before. I do not get the message of this table. I would prefer to see the skill scores (e.g., as boxplots) over the different seasons for the variables. It is also not clear to me what temporal aggregation is the baseline of this analysis.
Table 5
Here you use abbreviations for the seasons (SP, WI, AU, SU); although they can be inferred, I suggest avoiding the introduction of additional abbreviations here. Again, how do you determine the significance of the ROCSS?
Line 305:
Fig. 31 should be Fig. 3.
Can you elaborate on how the ROCSS is determined and what exact values are taken into account? Do you use daily values to calculate the scores, or weekly/seasonally aggregated values? This is still unclear after reading the manuscript.
Fig. 3: it is confusing that in the figure caption and the text you mention ROCSSs etc., but on the x-axis S-SA, W-SA etc. are displayed. I suggest unifying all of these and making the plot more readable.
Lines 320-322:
Maybe because I do not understand what the windows of opportunity are, I do not get the message here. I suggest formulating more explicitly what the impact of changing the initial conditions and the forecast input is. The result indicates that it is not worth using seasonal forecasts at all, which is hard to believe. Please elaborate so that the message becomes clearer.
Line 323: again, the title does not reveal to me what will be considered in this paragraph. I suggest using a more appropriate title for this paragraph.
Figure 4: It is hard to follow what is shown here. What is the relative sensitivity? Maybe it helps if you refer to the exact paragraph number in the figure caption. In addition, is there a reason why you use both a color coding and a size coding? I suggest using either color or size; otherwise it seems that multiple aspects are encoded.
Lines 347-348: “Hence, a significant fraction of predictability is originating from the SEAS5 dataset although the largest source remains ERA5 data over the warm-up”. This sentence illustrates what I think makes the manuscript complicated to follow: the reader must make the connection himself as to what this means. If I am correct, it shows that the initial conditions are more important than the driving meteorological predictions. Is this correct? It would make the manuscript much more readable if you directly stated what a result actually means instead of giving such “cryptic” explanations.
Lines 366-368: “…sources of seasonal water quality skill…” You mean water-quality forecast skill; water quality itself does not really have skill, does it?
Lines 394-395:
“Literature on streamflow hindcasts broadly shows that beyond the transition month, climatology-driven hindcasts are typically more skillful than hindcasts driven by seasonal climate predictions (Arnal et al., 2018; Bazile et al., 2017; Greuell et al., 2019).”
This is a misleading interpretation when you say “climatology-driven hindcasts”. In all three publications a well-established ESP (ensemble streamflow prediction) approach is used. This can be seen as a climatology-driven hindcast, but for a scientific publication I would expect a clearer formulation. In addition, no transition month is mentioned in these papers, a concept I do not understand; all papers mention the first lead-time month. What exactly is the transition month in your analysis?
Citation: https://doi.org/10.5194/hess-2022-312-RC2
AC3: 'Reply on RC2', Francois Clayer, 31 Oct 2022
Thank you for this thorough and very complete review. I apologize for the lack of precision in our terminology and any impact it might have had on your understanding of this study. We still hope there is some time for discussion if there is additional interest.
Yes, it is true that hydrometeorological prediction was not our main research interest at the start of our project, although some of our co-authors and project partners are highly experienced scientists in weather forecasting.
We will make sure to use the standard terminology of this field. Thank you for picking up those multiple loose threads that needed further attention. We will make sure the terminology and writing style are unified throughout the manuscript.
Below are some preliminary responses to specific points the reviewer raised (excerpts from the reviewer's comments are quoted for clarity).
Windows of opportunity: these windows of opportunity are indeed not properly defined in the text; they are only mentioned in the introduction (l. 89-93). A window of opportunity will be defined as a skillful forecast for a given variable's tercile during a given season, in other words, a trustworthy forecast that a stakeholder can use as decision-support information. For example, the lower tercile of surface temperature in spring at our Norwegian site is a window of opportunity (Table 4); that is, our workflow is able to forecast when surface temperature will be lower than normal in spring at the Norwegian site. We focus on these because the whole project, and this study, is to be seen from a stakeholder point of view, and we would like to focus only on the forecasts that we are confident are the most trustworthy. This point will be made clearer from the start, which also responds to your comment on l. 194-197.
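Schematically, the selection could be sketched as below; the numbers are toy values, and in the actual workflow the significance thresholds come from VisualizeR's built-in functions:

```python
# Toy ROCSS table: (season, variable, tercile) -> (ROCSS, 95% significance
# threshold). Values are illustrative, not results from the study.
scores = {
    ("spring", "surface_temp", "lower"): (0.62, 0.50),
    ("spring", "surface_temp", "upper"): (0.31, 0.50),
    ("summer", "discharge", "upper"): (0.55, 0.52),
    ("winter", "bottom_temp", "middle"): (0.12, 0.48),
}

# A window of opportunity is any combination whose ROCSS clears its threshold.
windows = [key for key, (rocss, thr) in scores.items() if rocss > thr]
print(windows)
```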
Hindcasts: sorry for the confusion here. Yes, we used hindcasts for the 92 three-month seasons (11/1993 to 11/2016) (see l. 170). We refer to 1994-2016 for simplicity, but we will make sure that everything is clear and consistent throughout the manuscript, including when the hindcasts were produced.
Another point is the source of predictability: we agree that the SEAS5 forecasting system cannot be seen as a source of predictability; rather, it transfers predictability from, e.g., ENSO and the NAO, as you mentioned, to our catchment and lake models. We will consider this point very carefully, as it affects the title as well. Thank you for picking this up. We will make sure to avoid these confusions throughout the manuscript (e.g., l. 53-55; l. 76-77 …) and any other confusion caused by imprecise terminology related to forecasting (e.g., l. 167-168; l. 173-174; l. 174-175; l. 191; l. 347-348; l. 366-368). We will avoid any cryptic formulations.
SEAS5 and ERA5 resolution and downscaling issues: we have used standard tools, and colleagues in the group are experienced with this type of issue, so I know we have bias-corrected and downscaled the ERA5 and SEAS5 data in an acceptable way. We will provide further detail on these pre-processing steps in the revised version of the manuscript. For now, you can have a look at Mercado-Bettín et al. (2021; https://doi.org/10.1016/j.watres.2021.117286).
I am really grateful to the reviewer for the quality and completeness of this review; thank you for these specific and relevant comments. I will provide preliminary answers to some of the comments below:
More specific comments
Introduction
Lines 24-26: Do you mean that the skill of the meteorological predictions (SEAS5 outputs) is worse than the skill of the hydrological predictions?
Yes, this is what we mean here. The meteorological predictions were worse than the discharge predictions (though only to some extent) and the lake water temperature predictions.
Line 70 “…temperature predictions and forecasts” and L. 72
Yes, thank you; we will make sure to use consistent terminology.
Lines 80-84: I do not understand what the authors are trying to argue; what is meant by “water flow predictability”: do you mean the discharge of the rivers? Lake level heights? Can you make this sentence clearer?
Yes, we meant river discharge. We will make that clearer.
Line 90: We will introduce ice-off.
Lines 90-91, Table 4 (showing actual values of ROCSS and FRPSS) and Table 5:
The aim of Table 4 was rather to show quickly where and when the forecasts were skillful, and to compare the skill of the SEAS5 forecasts (climate predictions) with that of the lake forecasts. We can provide all values in the supplementary material to avoid overloading the table. Note that we have not considered all forecasts with ROCSS and FRPSS above 0 as “skillful”; we described this at l. 194-197:
“Threshold RPSS and ROCSS values above which RPSS and ROCSS are significant at 95% confidence are calculated by built-in VisualizeR functions and were used to identify windows of opportunity (i.e., combinations of seasons, variables and terciles for which forecast performance was significantly better than the reference). In our case, these thresholds typically range between 0.47 and 0.55.”
Admittedly, our formulation lacks clarity. We will provide more details in the text and table caption of the revised MS. This also links to your comment on l. 262 about how we determine that a fair RPSS is significant, and to Table 5.
Methods:
Lines 101 to 111: Thank you for your nice suggestions; we will incorporate them.
Lines 114-115: As stated above, we will provide more detail on the bias-correction method used (also for l. 125).
Line 116: the terms “impact models” and “impact variables”.
Yes, these are the water-quality models and variables. We will unify these terms and use them consistently throughout the manuscript.
Line 125:
Unfortunately, we did not do any forecast verification with different bias corrections; given the frame of the project and the resources this takes, it will unfortunately not be possible. On the other hand, we can look at additional scores and measures.
Line 127-128:
We apologize for the messy variable abbreviations here; we will make sure variable names are spelled out, clearly defined and consistent throughout the manuscript. Given that we applied different models at the four sites, the forcing weather variables sometimes differed slightly. We will make sure to describe this and harmonize everything.
Lines 130-131: Yes, we can provide further detail on the observations; these were collected by our institutes or taken from published datasets.
Line 143-145:
All hydrological models were calibrated against local observations; this is described in Mercado-Bettín et al. (2021; https://doi.org/10.1016/j.watres.2021.117286) but will be briefly described here again for clarity. Note that the models were calibrated and validated against two different time periods, following best practice.
Line 152: Thank you, we will.
158-159: “Most common statistical goodness-of-fit parameters, e.g., Kling-Gupta efficiency (KGE), NSE and RMSE, for hydrological and lake modeling were calculated.”
We will explicitly report the values of these performance criteria.
Line 174-175: Aggregation prior to forecast verification
We do not do any lead-time-dependent verification; we only aggregated the forecasts to seasonal means (one value for the whole 3-month period for each year between 1993 and 2016) and used those in our verification and calculation of skill scores. We will make sure to clarify this point in the manuscript.
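Schematically (array shapes and values below are illustrative only, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Placeholder daily hindcast output: 23 years x 25 members x 92 target-season days.
daily = rng.normal(size=(23, 25, 92))

# One seasonal mean per year and member; all verification uses these values.
seasonal = daily.mean(axis=2)

# Tercile boundaries from a reference climatology (in the study these would
# come from the pseudo-observations, not from the ensemble itself).
lo, hi = np.quantile(seasonal, [1 / 3, 2 / 3])

# Forecast probability of each tercile = fraction of members falling in it.
p_lower = (seasonal < lo).mean(axis=1)
p_upper = (seasonal > hi).mean(axis=1)
p_middle = 1.0 - p_lower - p_upper
```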
Table 2 and full paragraph:
Thank you for your suggestions here. Yes, pseudo-observations were used as the reference forecast. In addition, we also used observations as the reference forecast when and where possible (when data gaps were below a given threshold). We apologize again; this threshold was not described properly. Note, however, that the “Obs coverage” column in Table 5 highlights the cases for which there are enough observations. This point will be clarified.
Lines 200-227: I struggle a lot to follow the explanation. First of all, what are ROCSSs, ROCSSw and ROCSSw+t?
Thank you for your suggestions on rephrasing the paragraph, the confusing formulations, and updating the paragraph titles. These ROCSS are the skill scores calculated from the forecasts of the various sensitivity analyses (SA) described just before, where the forcing data over the target season (S), the warm-up period (W), or the warm-up plus transition periods (W+T) was replaced by random data. As described at l. 216-217: “The outputs of S-SA, W-SA, and W+T-SA were used to produce tercile plots and calculate ROCSS. The comparison of the ROCSS values (ROCSS_i) obtained for the various SAs…” We will clarify this formulation to make sure this point is not confusing.
Line 240-244:
This will be clarified. In fact, we will provide lake heat budgets for each site to support this, as this was also raised by the other reviewer. It is supported by our data analysis and also shown in the literature (e.g., Blottiere, 2015).
Results
Line 251: We will add part of Table S2 to the main text.
Line 262: Sure, we will provide more background on the fair RPSS and what it accounts for.
Lines 269-270:
“Only 0 to 10% of the SEAS5 climate hindcasts are skillful, on average”.
I struggle to understand this result. The RPSS (or any score) is usually determined from a large sample of forecasts (or hindcasts). Of course, an individual forecast might be poor, but the full picture of forecast performance is only revealed when the scores for many issued forecasts are investigated.
Here, the RPSS and ROCSS are determined from the 23 seasonal means (spring, summer, autumn, winter) from 1993 to 2016, as described at l. 189-191. Briefly, the RPSS provides a relative measure of how well the probabilistic ensemble is distributed over the lower, middle and upper terciles, while the ROCSS provides a relative measure of discriminative skill for each category.
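For the record, the ROCSS for a single tercile can be obtained from the area under the ROC curve; a minimal sketch on synthetic data, using scikit-learn for the AUC (not necessarily the implementation used in the study), is:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Synthetic example for one variable and one tercile: 23 seasons, the forecast
# probability that the seasonal mean falls in the tercile, and the binary
# (pseudo-)observed outcome.
rng = np.random.default_rng(1)
obs = np.array([0] * 11 + [1] * 12)
p_fcst = np.clip(0.3 + 0.4 * obs + rng.normal(0.0, 0.15, 23), 0.0, 1.0)

auc = roc_auc_score(obs, p_fcst)
rocss = 2.0 * auc - 1.0  # 0: no better than climatology; 1: perfect discrimination
print(f"ROCSS = {rocss:.2f}")
```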
I think you should add a figure with the results where the reader can see what the hindcast performance actually is. From the text and the table alone, it is rather difficult to follow your argumentation. In addition, what is the definition of a skillful hindcast in your context? Is every hindcast with a ROCSS > 0.5 skillful, or do you use other thresholds? It would be crucial to mention the numbers at least once in the results section as well.
We were struggling to find a clear and concise way to create a figure showing hindcast performance, though we were looking for one. We would therefore be very happy if you have a suggestion, perhaps a figure from a published paper that we could take inspiration from. How would you like to see this information plotted?
Regarding skillful forecasts, we will clarify this. We basically consider all hindcasts with a significant ROCSS to be skillful forecasts, where the threshold lies between 0.47 and 0.55, as described above.
Lines 278-280: How many seasons are discarded due to missing observations? Please indicate the exact numbers so that the reader knows how many samples (hindcasts) are actually used for the analysis, or at least refer to the table where the numbers are listed.
We calculated ROCSS_obs only when >50% of the seasons (i.e., 12 seasons) were represented by some data. In practice, however, most of the variables for which we calculated ROCSS_obs had 100% (i.e., 23 seasons) covered by observations; see also “Obs coverage” in Table 5. We will introduce this notion of observation coverage earlier in the manuscript and make sure it is clear how we calculated ROCSS_obs, including the number of seasons.
Lines 283-286: Yes, we will rephrase this for clarity.
Table 4
Again, general comments: the abbreviations are non-intuitive, and FRPSS is not introduced before. I do not get the message of this table. I would prefer to see the skill scores (e.g., as boxplots) over the different seasons for the variables. It is not clear to me what temporal aggregation is the baseline of this analysis.
The main message here is to show that there is limited skill in the SEAS5 forecasts (climate predictions) but still some skill in the lake forecasts. In many cases these are not synchronous: e.g., in summer in Spain, 5 variable terciles are associated with significant ROCSS, what we call “windows of opportunity”, whereas only 2 climate variable terciles have significant ROCSS. Looking at this table, we were hoping the reader would say, “Oh, there is forecasting skill coming from somewhere other than SEAS5, e.g., inertia”.
Regarding the temporal aggregation, this is again based on seasonal means. We apologize for the oversight and will include this in the caption.
Line 305:
“Fig. 31 should be Fig. 3”: this is in fact Fig. 3l, with “l” as in “Luke Skywalker”, referring to panel (l). We will use capital letters to avoid misunderstanding.
Can you elaborate on how the ROCSS is determined and what exact values are taken into account? Do you use daily values to calculate the scores, or weekly/seasonally aggregated values? This is still unclear after reading the manuscript.
The ROCSS are determined from seasonal means. We will make that point clear throughout the manuscript.
Fig. 3: it is confusing that in the plot description and the text you mention ROCSSs etc., but on the x-axis S-SA, W-SA etc. are displayed. I suggest unifying all of these and making the plot more readable.
We will replace “S-SA”, “W-SA” and “W+T-SA” with their respective ROCSS_i expressions.
Line 320-322:
Maybe because I do not understand what the windows of opportunity are, I do not get the message here. I suggest formulating more explicitly what the impact of changing the initial conditions and the forecast input is. The result indicates that it is not worth using seasonal forecasts at all, which is hard to believe. Please elaborate so that the message becomes clearer.
We will better describe the impact of changing the initial and boundary conditions. We suggest that a lot of the skill originates from legacy effects in the catchment model and from the inertia of the lake/reservoir systems themselves. Only at the Norwegian site, for specific variables and seasons, are there signs that using seasonal weather predictions provides further skill; at the other sites we cannot see any sign of this. So yes, for most of the forecasts that were significantly skillful, it was not worth using seasonal forecasts to force the models. We will make this point clearer.
Line 323: again the title does not reveal to me what will be considered in this paragraph. I suggest to use a more appropriate title for this paragraph.
Thank you, we will provide a more relevant title.
Figure 4: It is hard to follow what is shown here. What is the relative sensitivity? Maybe it helps if you refer to the exact paragraph number in the figure caption. In addition, is there a reason why you use both a color coding and a size coding? I suggest using either color or size; otherwise it seems that multiple aspects are encoded.
Yes, we will refer to the exact paragraph (l. 229-234, section 2.1.1) for the description of how the relative sensitivity is estimated. And you are right, color and size show the same information; we will keep only the size coding.
Lines 347-348: “Hence, a significant fraction of predictability is originating from the SEAS5 dataset although the largest source remains ERA5 data over the warm-up”. This sentence illustrates what I think makes the manuscript complicated to follow: the reader must make the connection himself as to what this means. If I am correct, it shows that the initial conditions are more important than the driving meteorological predictions. Is this correct? It would make the manuscript much more readable if you directly stated what a result actually means instead of giving such “cryptic” explanations.
Yes, you understood this sentence correctly, and again we apologize for the cryptic formulations; we are learning the forecasting terminology as we go. We will improve this.
Line 394-395:
“Literature on streamflow hindcasts broadly shows that beyond the transition month, climatology-driven hindcasts are typically more skillful than hindcasts driven by seasonal climate predictions (Arnal et al., 2018; Bazile et al., 2017; Greuell et al., 2019).”
This is a misleading interpretation when you say “climatology-driven hindcasts”. In all three publications a well-established ESP (ensemble streamflow prediction) approach is used. This can be seen as a climatology-driven hindcast, but for a scientific publication I would expect a clearer formulation. In addition, no transition month is mentioned in these papers, a concept I do not understand; all papers mention the first lead-time month. What exactly is the transition month in your analysis?
We will be more precise when referring to these studies. The main point here is that hindcasts driven by seasonal climate predictions are not necessarily more skillful.
We will replace the “transition month” with lead month 0, in agreement with Greuell et al. (2019), i.e., the month following the date on which the forecast would have been issued. If the forecast was issued on February 1st, 2016, everything before that date is part of the warm-up period, February 2016 is what we used to call the “transition month” (lead month 0), and the target season covers lead months 1 to 3: spring 2016, March to May 2016. This clarification will be included in the methods section.
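Schematically (a toy helper illustrating the split, not part of the study's workflow):

```python
from datetime import date

def forecast_periods(issue: date):
    """Split a 4-month seasonal forecast window into lead month 0 (the former
    'transition month') and the 3-month target season (lead months 1-3)."""
    months = [(issue.year + (issue.month - 1 + k) // 12,
               (issue.month - 1 + k) % 12 + 1) for k in range(4)]
    return {"warm_up_ends": issue,
            "lead_month_0": months[0],
            "target_season": months[1:]}

# Example from the reply: a forecast issued on 1 February 2016 has February
# as lead month 0 and March-May (spring) 2016 as the target season.
print(forecast_periods(date(2016, 2, 1)))
```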
Again, thank you for this thorough review and for giving us the opportunity to improve by kindly pointing out the lack of precision in our terminology. Most reviewers would not bother to provide such constructive feedback.
Citation: https://doi.org/10.5194/hess-2022-312-AC3