Comment on hess-2021-604

I would like to thank the editor for the opportunity to review this manuscript.
This study investigates the performance of three continental- or global-scale streamflow forecasting systems (HTESSEL, EFAS and E-Hype) in predicting inflows to Lake Ijssel, a major surface water reservoir in the Netherlands. All three forecasting systems are driven with ECMWF SEAS5 seasonal climate forecasts but differ in their underlying hydrological modelling approach. The authors applied bias correction to streamflow forecasts using Quantile Mapping and subsequently assessed the skill of raw and post-processed forecasts for predicting inflows. A particular focus was placed on dry conditions, i.e. the predictability of low-flow events.
In my view, the overall focus of the study, comparing three leading continental- or global-scale streamflow forecasting systems for predicting streamflow at the local scale, is interesting, and I believe the study has applications and implications that may be relevant to a wider audience, e.g. how best to translate outputs from continental- or global-scale streamflow forecasts into local-scale applications. However, my main concern relates to the novelty of the study. In its current form, the manuscript reads like a technical report that describes methods and results of forecast post-processing and verification applied to a single study location, without sufficiently linking them to the research context, and it may therefore be of interest only to a very local audience. I am missing an attempt to generalise the findings, place them into a broader context and emphasise implications that also apply in other regions (using Lake Ijssel inflows as a case study). If the authors addressed this, I believe the study would be of interest to a wider audience.
The study design and methodology (including post-processing and forecast verification) are robust. While more advanced streamflow post-processing methods exist and could be investigated, the authors clearly show an improvement in skill for the study location of interest. The application of multiple verification metrics (continuous ranked probability skill score, mean error, Brier skill score and reliability diagrams) to 23 years of hindcast data, in a cross-validation approach, is appropriate and thorough.
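For readers less familiar with this processing chain, the sketch below illustrates one way it could look. It is a minimal illustration under my own assumptions (empirical quantile mapping, leave-one-year-out cross-validation, and the standard ensemble CRPS estimator), not the authors' implementation; all function and variable names are hypothetical.

```python
import numpy as np

def quantile_map(forecast, hindcast_train, obs_train):
    """Empirical quantile mapping (assumed variant, not necessarily the authors')."""
    hind_sorted = np.sort(np.asarray(hindcast_train, dtype=float))
    # Non-exceedance probability of each forecast value in the hindcast climatology;
    # values outside the training range are clamped to 0 or 1 by np.interp.
    probs = np.linspace(0.0, 1.0, len(hind_sorted))
    p = np.interp(forecast, hind_sorted, probs)
    # Read off the same quantile from the observed climatology.
    return np.quantile(np.asarray(obs_train, dtype=float), p)

def leave_one_year_out(hindcasts, obs):
    """Bias-correct each year using only the other years (assumed CV scheme).

    hindcasts: array (n_years, n_members); obs: array (n_years,).
    """
    hindcasts = np.asarray(hindcasts, dtype=float)
    obs = np.asarray(obs, dtype=float)
    corrected = np.empty_like(hindcasts)
    for k in range(len(obs)):
        train = np.arange(len(obs)) != k  # exclude the target year
        corrected[k] = quantile_map(hindcasts[k], hindcasts[train].ravel(), obs[train])
    return corrected

def crps_ensemble(members, obs):
    """CRPS of one ensemble forecast: E|X - y| - 0.5 * E|X - X'|."""
    members = np.asarray(members, dtype=float)
    spread = np.abs(members[:, None] - members[None, :]).mean()
    return np.abs(members - obs).mean() - 0.5 * spread

def skill_score(score_forecast, score_reference):
    """Generic skill score: 1 is perfect, 0 matches the reference, < 0 is worse."""
    return 1.0 - score_forecast / score_reference
```

In such a setup, the reference score would typically come from a climatological ensemble built from the observations of the training years, so that raw and bias-corrected forecasts are judged against the same benchmark.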

Major comments:
Novelty and relevance to a wider audience: As outlined above, the novelty of the study and its relevance to a wider audience are not very clear. I would ask the authors to place the work into a broader research context and interpret the results more broadly, highlighting implications for researchers and practitioners in other regions. While the introduction places the research into a wider context to some extent, there is no clear link between the introduction and the rest of the manuscript, which describes the results in great detail for one individual location. It would be great if the authors could come back to the broader research gap in their discussion and conclusions. For example, what are the key messages for members of the hydrological community who aim to apply regional- or global-scale seasonal streamflow forecasts to individual catchments? How does this study compare to similar systems implemented in other parts of the world? What are the advantages and disadvantages of the described approach?
Results and Discussion: The results section is very comprehensive and describes the results of the forecasting approach in great detail, focusing on a range of forecast performance metrics. However, I am missing an interpretation that goes beyond simply describing the results. The discussion section itself is very short and immediately starts with limitations of the study and further research, without an actual discussion and interpretation of the study findings. I would ask the authors to add more discussion of their results: linking them back to the research aims or questions outlined in the introduction, placing them into the context of the existing literature, and highlighting implications or applications of the findings (followed by limitations and further research, as is already included). Potential points of discussion: What can we learn from these results that is relevant in other regions? Can some of the differences in results be traced back to differences in the underlying specifications of the hydrological models (e.g. consideration of routing), and what does this mean for other locations? Could a multi-model ensemble approach be useful?
Readability: I had difficulties following some sections. The manuscript would benefit from a revision aimed at re-wording sections and sentences to be clearer and more concise. Additionally, there are some typos throughout the text (I have included a few examples under "minor comments", but this applies to the manuscript overall).

Other comments / suggestions:
Some of the results and figures are very detailed and could be presented in a more concise or synthesised way. A few suggestions and observations in relation to the figures:
Figures 4-6: Could you present the results for each forecasting system (raw and bias-corrected outputs) and each lead time, aggregated over all months (April to September), so that the systems can be compared more generally? One possibility would be to add a subplot that presents aggregated results across all months.
Figure 10: As above, it would be great to see the overall RPSS scores aggregated over all years and months, to compare the three systems directly with each other. Would it be possible to add another subplot that presents the results aggregated across all years and months?
Figure 11: I found it a little unusual to see the years on the x-axis, but in order of dryness / wetness. Would it be possible to use annual streamflow as the x-axis (i.e. presenting forecast skill as a function of average streamflow)?
Figure 12: It is not clear to me what the right column of Figure 12 shows. It would be good to provide more explanation in the text and/or caption: how could this be interpreted and used?
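To illustrate the aggregation I have in mind for Figures 4-6 and 10, the sketch below pools the underlying scores before forming the skill score, rather than averaging monthly skill scores (which can be dominated by months with a very small reference score). This is only one possible choice, and the array layout (months by years) and function names are my own assumptions.

```python
import numpy as np

def aggregated_skill(score_forecast, score_reference):
    """Skill score from pooled scores: 1 - mean(forecast) / mean(reference).

    Both inputs are arrays of a negatively oriented score (e.g. CRPS or RPS),
    here assumed to have shape (n_months, n_years).
    """
    score_forecast = np.asarray(score_forecast, dtype=float)
    score_reference = np.asarray(score_reference, dtype=float)
    return 1.0 - score_forecast.mean() / score_reference.mean()

def mean_of_monthly_skill(score_forecast, score_reference):
    """Alternative: compute a skill score per month, then average over months."""
    score_forecast = np.asarray(score_forecast, dtype=float)
    score_reference = np.asarray(score_reference, dtype=float)
    monthly = 1.0 - score_forecast.mean(axis=1) / score_reference.mean(axis=1)
    return monthly.mean()
```

The two functions generally give different numbers, so it would help if the caption of any aggregated subplot stated which of them was used.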