Changes in the simulation of instability indices over the Iberian Peninsula due to the use of 3DVAR data assimilation

Santos J. González-Rojí (1,2), Sheila Carreno-Madinabeitia (3,4), Jon Sáenz (4,5), and Gabriel Ibarra-Berastegi (6,5)
(1) Oeschger Centre for Climate Change Research, University of Bern, Bern, Switzerland.
(2) Climate and Environmental Physics, University of Bern, Bern, Switzerland.
(3) TECNALIA, Basque Research and Technology Alliance (BRTA), Parque Tecnológico de Álava, Vitoria-Gasteiz, Spain.
(4) Department of Applied Physics II, University of the Basque Country (UPV/EHU), Leioa, Spain.
(5) Plentzia Itsas Estazioa, PIE, University of the Basque Country (UPV/EHU), Plentzia, Spain.
(6) Department of NE and Fluid Mechanics, University of the Basque Country (UPV/EHU), Bilbao Engineering School, Bilbao, Spain.
Correspondence: Santos J. González-Rojí (santos.gonzalez@climate.unibe.ch)

In their study, González-Rojí et al. investigate three different convective parameters obtained from two dynamically downscaled WRF model runs over the Iberian Peninsula. Over a 5-year period, the convective parameters from the WRF runs are quantitatively evaluated against sounding data and spatially investigated for different seasons. In addition, the spatial distribution and variability of the convective parameters are investigated and related to certain precipitation characteristics from the literature. The authors found that the WRF run with 3DVAR assimilation best reflects the convective situation.
Overall, the work is well structured and written, with a good balance of text and figures. My main concern is that large parts of the paper are rather descriptive, in the sense that the figures are mainly described and not interpreted. Reasons for the discrepancies found between the data sets are not given, although that would be most interesting and would increase the scientific value of the paper. In the current version, the benefit of the work for a larger community remains unclear. Below you will find a list of major and minor points as well as some suggestions for editing.

Major revision points:
1.) After reading the paper, more questions arise than answers or new scientific insights are given. This is because the paper mainly describes the figures but does not provide explanations. Open questions are: Why do the assimilation runs perform better than the simple WRF downscaling? Since the convective parameters considered depend on both the temperature gradient and moisture, what is better reproduced? On which levels/layers? Does this depend on the location (sounding station) and the season? Why are the differences between the models greater at some stations than at others (depending on the parameter)? What is the relation between CAPE and the TT index?
2.) The main conclusion of the paper is that the assimilation run performs better than the run without assimilation. But is this not to be expected if the soundings that are assimilated are the same ones used for the comparison afterwards? What would be the result if you left out some of the soundings from the assimilation and made the comparison for these locations?
3.) Are you sure that ERA-Interim did not originally assimilate the eight soundings you considered? It does not make sense to assimilate any data set twice.
4.) Either there is a general misunderstanding of convection triggering or the formulations are clumsy. Convective instability and sufficient moisture at lower levels are necessary but not sufficient conditions for the development of convective storms. Convection initiation additionally requires a lifting mechanism that either reduces CIN or lifts a parcel to the level of free convection (LFC). High CAPE/TT values neither trigger convection nor can they be directly related to precipitation, as is written several times throughout the manuscript.
5.) CIN works only in conjunction with CAPE. In the case of zero CAPE, CIN does not matter for convective initiation or development. Analyses of the mean values or the spatial distribution of CIN are useful only when considering days with a certain amount of CAPE (or instability in general).
6.) Using only the grid point nearest to a sounding station neglects the horizontal drift of the radiosondes. A better choice would be to consider the average value of several grid points.
7.) No reference is made to the original ERA-Interim fields. Thus it is not possible to assess the added value of the downscaled model runs and the need for higher resolution of the data.
8.) The last section "Conclusions" is only a summary without any (general) conclusions. Tell us what other scientists may learn from your study.
9.) A thorough language check is necessary (e.g., "…observations in the stations…" or "obtained in stations" or similar formulations used throughout the manuscript are incorrect/weird).
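
Major point 5 can be illustrated with a minimal Python sketch that uses the two-day case described later under minor point 52 (one day with zero CAPE and zero CIN, one day with CAPE = 3000 J/kg and CIN = 300 J/kg). The function name, the choice to store CIN as a positive magnitude, and the 50 J/kg default threshold are assumptions made for illustration only; this is not the authors' processing chain.

```python
from statistics import mean


def conditional_mean_cin(cape, cin, cape_threshold=50.0):
    """Mean CIN over the days whose CAPE exceeds cape_threshold (J/kg).

    Hypothetical helper: returns None when no day passes the threshold.
    """
    selected = [c_in for c_ape, c_in in zip(cape, cin) if c_ape > cape_threshold]
    return mean(selected) if selected else None


# Two-day example: one day without CAPE or CIN, one day with large CAPE
# under a strong cap. CIN is stored here as a positive magnitude (assumption).
cape = [0.0, 3000.0]  # J/kg
cin = [0.0, 300.0]    # J/kg

naive_cape = mean(cape)  # unconditional mean CAPE
naive_cin = mean(cin)    # unconditional mean CIN
filtered_cin = conditional_mean_cin(cape, cin, cape_threshold=50.0)

print(naive_cape, naive_cin, filtered_cin)
```

The unconditional means (1500 J/kg CAPE with 150 J/kg CIN) suggest a favourable environment that occurred on neither day, whereas the CAPE-conditioned mean CIN (300 J/kg) reflects the strong cap on the only day with instability.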

Minor revision points:
1. Explain why you have selected CAPE, CIN (note my comment above), and TT and not others, in particular indices that estimate either potential or conditional instability or dynamical properties (deep layer shear, storm-relative helicity; or an index combining thermodynamical and dynamical properties). Is there any cross-correlation between those parameters (CAPE vs. TT)? Also explain why you have only considered a 5-year period, which is far from being representative of the general climate.
2. It is very difficult to compare the different sub-figures because of the different axis ranges. I suggest using the same scaling within one figure.
3. When describing the general convective situation over the IP / over Europe, you should also consider more recent literature.
4. Why have you created your virtual WRF soundings from only one grid point? As correctly stated in the text, the soundings may drift over some distance during the ascent. Using an array of 3 x 3 grid points or so would have been a better choice. Please add a comment on that.
5. L1 (see major point above): Instability does not trigger convection.
6. L2 (also L29-30): CAPE/CIN are measures of energy and not instability indices.
7. Shorten the abstract and focus on the essentials.
8. L14: "the ingredients for the development of convective precipitation": As alluded to previously, you investigated only the convective environment, and thus only one part of the ingredients.
9. L22-23: Do you mean warm fronts? Note that cold fronts, especially during summer, frequently trigger convective storms by cross-circulations. Thus, classifying precipitation into frontal and convective does not make sense. Convective precipitation is not triggered by convective instability (see major point 4).
10. L24: "The latter is usually associated with extreme events due to their intensity and short duration". Convection is per se not extreme. You may add "high intensity" here, but the short duration is not the reason why convection (or rather the related precipitation and wind) may become extreme.
11. L24-26: The limited skill of NWP models in reliably simulating convective precipitation is not because of their low resolution (note that several European weather services already run their models at 1 km resolution), but is partly caused by forecast errors on the synoptic scale, which drive the predictability of convection initiation, and by various sources of uncertainty on small scales, such as limitations in the assimilated observables or microphysical schemes. There is a large body of literature on that.
12. L37: It is impossible to estimate the life cycle or the intensity of convective storms from thermodynamic quantities alone. For organized convective storms, which represent the most intense storms, you require sufficient vertical wind shear: speed shear (crosswise vorticity) for multicells, and directional shear (streamwise vorticity) for supercells.
13. L41-42: You state that "a (high; include) spatial and temporal resolution is important" for resolving vertical lifting, and thus regional simulations are needed. But in your study, you investigate only the convective environment and not the mechanisms relevant for convective initiation. So I do not see why you need higher-resolution meteorological fields.
14. L43-44: The reason why convection peaks in the afternoon is related to solar irradiation. This is a fact and not "suggested by previous studies".
15. L44-45: Van Delden used only synoptic stations with a 6-hourly resolution for the statistics. He found that "most thunderstorms occur at 18 and 24 UTC". 18 means the period from 12 to 18. Thus thunderstorms are most frequent between 12 and 18 UTC! But: It would be better to cite more recent studies based on lightning detections such as, for example, Piper and Kunz, 2017 (Nat. Haz. Earth Syst. Sci.; Fig. 4
Consider moving it to the introduction.
26. Section 2.2: Why haven't you considered IGRA sounding data?
27. L135: For readers outside of Europe it would be helpful to also include local times (approximately) here.
28. L146: What is meant by "…the analysis increments are stronger at 12 UTC…"? And by "Strong increments are observed during summer…" in L148? The relation to the cold bias in L149 is also unclear.
29. L150: "…the effect of the assimilation is not restricted only to the station location". This is a very crucial point. Unfortunately, you did not show that (see major point 2).
30. Sect. 2.3.1: Please explain briefly how you compute the lifting curve from the surface/mixed level to the LCL, to the LFC, and to the LNB (including quantification of e).
31. L158: Do you have any reference for the statement that soundings "take many minutes to measure the profile of the atmosphere"? The many soundings I performed in the past took about half an hour to reach the LNB.
32. L169: What is "an isobaric precooling" and why was it applied?
33. L172: As TT relies on temperature differences, the unit (°C, K) does not matter.
"…small differences in initial conditions…": can you be more specific here (also with regard to TT, as already alluded to above)?
44. Figure 3 (CAPE) shows very large differences in the standard deviation between the different models for some of the stations. Any idea why?
45. L250: Could you be more specific?
46. L254: "N tends to overestimate the variability in every season and for most of the stations…" Why?
47. L255: "..presents the largest values during winter, which agrees with the fact that the northern and northwestern IP receives greats amount of rain during that season". Is winter rainfall really dominated by convective precipitation? I cannot find any statements to that effect in the cited literature.
52. L281 and following: As already mentioned above (see major comment 5), CIN is relevant for convection only in combination with CAPE. An example: imagine one day with zero CAPE and zero CIN, and another day with CAPE = 3000 J/kg and CIN = 300 J/kg. Neither day would have the right conditions for deep moist convection (DMC) to occur. Yet the average of the two days would give CAPE = 1500 J/kg and CIN = 150 J/kg, which would appear to be fair values for DMC. You could simply fix that by considering CIN only on days with CAPE in excess of 50 or 100 J/kg.
53. L305: "…lowest values are observed near the coastal valleys…" Why?
54. Figure 8/9: The spatial distributions of TT and CAPE are opposite in most cases, i.e. regions with higher CAPE have lower TT values and vice versa. Is there any explanation for this apparent contradiction?
55. L326 and following: See major comment 5 and minor comment 52.
56. L336-345: The relation to "dynamics" does not fit here, as the paper has a solely thermodynamical perspective. Be careful with the relation between convective conditions and precipitation.

Edits: