Hydrometeorological evaluation of two nowcasting systems for Mediterranean heavy precipitation events with operational considerations

Heavy precipitation events and subsequent flash floods regularly affect the Mediterranean coastal regions. In these situations, forecasting rainfall and river discharges is crucial, especially up to six hours ahead, which is a relevant lead time for emergency services in times of crisis. The present study investigates the hydrometeorological skill of two new nowcasting systems: AROME-NWC, a numerical weather prediction model, and PIAF, a nowcasting system blending numerical weather prediction and extrapolation of radar estimates. Their performance is assessed for 10 past heavy precipitation events that occurred in southeastern France. Precipitation forecasts are evaluated at a 15 min time resolution, and the availability times of the forecasts, based on the operational Météo-France suites, are taken into account in the evaluation. Rainfall observations and forecasts were first compared using a point-to-point approach. The evaluation was then conducted from a hydrological point of view, by comparing observed and forecast precipitation over watersheds affected by floods. In general, both evaluations led to the same conclusions. At the very first lead times, up to 1h15/1h30 of forecast, PIAF performs better than AROME-NWC. For longer lead times (up to 3h), their performance is generally equivalent. An assessment of river discharges simulated with the ISBA-TOP coupled system, which is dedicated to Mediterranean flash-flood simulations, forced by AROME-NWC and PIAF rainfall forecasts, was also performed for two exceptional past flash-flood events. The results obtained for these two events show that using AROME-NWC or PIAF rainfall forecasts is promising for flash-flood forecasting in terms of peak intensity, timing, and first rise of discharge, with an anticipation of these phenomena that can reach several hours.

The occurrence of such intense rainfall over small areas and catchments of up to a few hundred square kilometres often triggers devastating flash floods. These events threaten people as well as property (??) and result in direct economic losses valued at hundreds of millions of euros each year. Although significant progress has been made in recent decades, very localised, high-intensity rainfall events such as those generated by Mediterranean precipitating systems remain difficult to forecast (???).

It is difficult to forecast heavy precipitation events with accurate intensity, chronology and location. Among the difficulties encountered are the complex features and variability of deep convection and the associated small space-time scales, which are hardly predictable. Nowcasting systems suit these scales with short-term forecasts (usually up to a few hours) at high spatial and temporal resolution. They can be based on extrapolation of observations, rely on mesoscale numerical weather prediction, or combine the two approaches. The concept of extrapolating radar echoes for short-term precipitation forecasting was first developed in 1953 (?). This approach has since been extended through the search for fast-computing algorithms or methods giving the best forecasts of the evolution of precipitating cells (in terms of displacement, size and intensity) from radar imagery. Among them are those based on cross-correlation (e.g. ?) and those based on individual radar echo tracking (??). These methods were later improved to ensure the consistency of the velocity field reconstructed after extrapolation (???). Although widely used, extrapolation methods are limited in accuracy because they cannot forecast convective storm initiation, growth and decay (?). Beyond one or two hours, numerical weather prediction is necessary to depict rapidly changing conditions. The latest generation of mesoscale models reaches kilometre-scale horizontal resolutions and is able to reproduce fine-scale boundary layer and convection processes. Numerical weather prediction systems have been configured to meet the requirements of nowcasting, i.e. to deliver very short-term forecasts, updated with the latest observations, in the fastest possible time.
These systems are based on the same kilometre-scale horizontal resolution models as those used for forecasting the weather over the next 24-48 hours, but their assimilation frequency and windows are adapted so that the forecasts can be refreshed frequently with new observations while ensuring short forecast delivery times (?). AROME-NWC (?), which is used operationally at the French meteorological service Météo-France, is one of them. Thus, several types of nowcasting system with various skills coexist at lead times between 1 and 3-4 hours. Methods have been developed to combine extrapolation methods and numerical prediction systems. Seamless forecasts can be obtained by weighting precipitation fields from radar extrapolation and numerical weather prediction forecasts. The first blending approach in nowcasting was introduced by ?, with a heavy weight for the extrapolation nowcast during the first hour and the heavier weights transitioning to the numerical forecasts with increasing lead time. A new nowcasting system called PIAF (?), blending numerical weather prediction and extrapolation of radar estimates, has recently been developed by the nowcasting department at Météo-France. Its skill for rainfall forecasting still needs to be assessed.
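The weighting idea behind blended nowcasts can be illustrated with a minimal Python sketch. The linear ramp, the 60 min and 180 min bounds and the function names are assumptions chosen for illustration only; they are not the weights of any operational system, where the weights may also depend on recent forecast performance.

```python
import numpy as np

def extrapolation_weight(lead_min, full_until=60.0, zero_from=180.0):
    """Hypothetical weight of the radar extrapolation: 1 up to `full_until`,
    then decreasing linearly to 0 at `zero_from` (both in minutes)."""
    w = (zero_from - lead_min) / (zero_from - full_until)
    return float(np.clip(w, 0.0, 1.0))

def blend(extrapolated, nwp, lead_min):
    """Linear combination of two rainfall fields defined on the same grid."""
    w = extrapolation_weight(lead_min)
    return w * np.asarray(extrapolated, dtype=float) + (1.0 - w) * np.asarray(nwp, dtype=float)

# Halfway along the assumed ramp, both sources contribute equally.
blended = blend([2.0, 0.0], [4.0, 1.0], lead_min=120)
```

With such a ramp the blended field equals the extrapolation at short lead times and the numerical forecast at the end of the range, which is the seamless behaviour described above.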

The short hydrological response times, ranging from a few minutes to a few hours after heavy downpours, are a major issue for notifying at-risk populations and planning the intervention of emergency services in times of crisis. Rainfall nowcasting makes it possible to extend the lead time of hydrological forecasts by a few hours compared to the mere use of observed precipitation data, which limits the forecast lead time to the catchment response time. ?, ? and ? have demonstrated the benefit of using deterministic rainfall forecasts obtained from radar-based extrapolation as input to hydrological models. Other studies explored the potential of probabilistic rainfall nowcasting for flash-flood forecasting (??). Nowcasting flash floods provides enough anticipation time to boost the preparedness of people and civil protection, and sometimes valuable time that prevents the authorities from being completely unprepared for the occurring or upcoming event (?). The availability time of the rainfall forecast is thus crucial for real-time streamflow forecasting. Whereas operational hydrological models are often fast-running (i.e. finishing in seconds to minutes), weather forecasts require more time to be delivered. Therefore, this delay is taken into account in this study to reflect real-time operational constraints.
The present study investigates the hydrometeorological skill of AROME-NWC (?) and PIAF (?), two new nowcasting systems operationally used at the French meteorological service Météo-France. The main objectives are to compare and evaluate their performance for 10 past Mediterranean heavy precipitation events and to suggest some "best practices" for real-time forecasting. The effect of spatial resolution, lead time and precipitation intensity on forecast skill is studied at a 15 min time resolution. Precipitation forecasts are first evaluated from a meteorological perspective with a synoptic-scale verification using a point-to-point approach. Rainfall observations and forecasts are compared at each grid point of an area covering southeastern France. Then an evaluation on scales relevant to hydrology is performed, by comparing observed and forecast rainfall averaged over watersheds affected by past floods. An assessment of river discharges simulated with ISBA-TOP, a model dedicated to Mediterranean flash-flood simulations (??), forced by AROME-NWC and PIAF rainfall forecasts was also conducted on two exceptional recent French flash-flood events. The performance of the forecasts is assessed in terms of intensity and timing of the flood peak.
This paper is organized as follows. Section 2 describes the case studies, the nowcasting systems, the hydrological system, and the verification methods. The results of the different verifications are presented and discussed in Section 3. The conclusions are reported in Section 4.

2 Materials and methods

2.1 Case studies
The selected case studies concern recent heavy precipitation events which occurred in southeastern France between October 2015 and November 2018. A rectangular verification zone was defined to investigate the performance of the nowcasting systems during these events. It encompasses the regions along the Mediterranean coast most prone to intense events (black rectangle in Figure 1), such as the eastern Pyrenees, the southern Alps, and the Cévennes-Vivarais region. This area is characterized by a pronounced topography with steep slopes and narrow valleys. Within this large zone (110 000 km²), 19 catchments with areas ranging from 19 to 1100 km² and short response times were selected for the hydrological evaluation (Table 1). Watersheds numbered 1 to 8 in Figure 1 are all tributaries of the Aude river in the north-east of the Pyrenees. Watersheds numbered 9 to 12 are located in the Cévennes region. The other watersheds, numbered 13 to 19, are located in the French Riviera.

To assess the performance of the nowcasting systems, 10 recent heavy precipitation events were considered (first and second columns of Table 2). These rainy episodes are representative of the variety of rainfall intensities and durations and of the hydrological responses of the rivers encountered in the French Mediterranean coastal regions. In particular, the two events that occurred on 3 October 2015 in the French Riviera and on 14-15 October 2018 in the Aude region are among the latest major tragic flash-flood events to have affected metropolitan France. Together they caused 34 deaths and more than 800 million euros of damage. More details about the October 2015 event and the October 2018 event are given by ? and ?, respectively.
2.2 The nowcasting systems

2.2.1 AROME-NWC

AROME-NWC (?) is a configuration of the French numerical weather prediction system AROME-France (??) especially designed for nowcasting purposes. AROME-NWC is a mesoscale, non-hydrostatic model. Its horizontal resolution is 1.3 km × 1.3 km and its vertical grid has 90 levels ranging from 10 m to around 30 000 m above the ground. Deep convection is explicitly resolved and the microphysical processes are governed by the ICE3 one-moment bulk microphysical scheme (?). AROME-NWC is thus able to forecast the mesoscale convective systems that cause heavy rain in the Mediterranean area. Boundary conditions are provided by the analysis of the ARPEGE global operational numerical weather prediction model. The AROME-NWC initial conditions are provided each hour by a three-dimensional variational (3D-Var) data assimilation of the observations available within a 20-minute window centered on the analysis time. The observations are primarily radar data (reflectivity and radial velocity) and screen-level measurements and, to a lesser extent, aircraft, sounding and satellite data. AROME-NWC is run every hour and provides short-range forecasts up to 6 hours with a time step of 15 minutes on a domain covering France and adjacent areas. Forecasts were available within 35 minutes at the time of the study.

2.2.2 PIAF

Very recently the nowcasting department of Météo-France has developed a new nowcasting system, PIAF (for "Prévision Immédiate Agrégée Fusionnée" in French, ?), which is a data fusion product between radar extrapolation and numerical prediction (the rainfall forecast by AROME-NWC). For the extrapolation of the radar quantitative precipitation estimation, radar data are processed as follows: rainy cells are identified by windows surrounding areas of connected pixels above a given threshold; the displacement of each cell is determined using the previous image (highest correlation); a gridded motion field is computed from the movement vectors of the cells obtained with different threshold values and applied to the cells to extrapolate them into the future.
The ISBA-TOP configuration used in this study is the one suggested by ? for flash-flood simulations, based on SRTM data for orography, Land Use/Cover Area frame statistical Survey topsoil data (?) for soil texture, ECOCLIMAP-II (??) data for land cover, and a spatial resolution of 300 m for ISBA.
The ISBA-TOP coupled system was run during the HyMeX special observing periods for real-time prediction of discharges for watersheds in the Cévennes-Vivarais region and along the Mediterranean coastline of southeastern France. Its performance was also assessed for Italian watersheds (?). ISBA-TOP is used operationally at the National Institute of Meteorology and Hydrology of Bulgaria for flood forecasting on the Arda river (?). In addition to simulating river discharges during flash floods, ISBA-TOP is also able to simulate intense runoff phenomena (??).
2.4 Verification methods

To evaluate the quality of the precipitation forecasts provided by AROME-NWC and PIAF, the 1 km² ANTILOPE quantitative precipitation estimates (?) at a 15 min time resolution, which merge observations from the Météo-France radar and rain gauge networks, were used as reference data, called observed data or observations hereafter. Two verification methods were applied to evaluate the rainfall nowcasts. The first method, commonly used in the meteorological community, was based on point-to-point comparisons of the forecasts and observations. Comparisons were performed at each grid point of a common one-kilometre resolution grid over a large area of 110 000 km² covering southeastern France (see the black rectangle in Figure 1). The rainfall fields were downscaled onto the common grid using a nearest-grid-point interpolation method. The second evaluation was carried out from a hydrological point of view, by comparing observed and forecast rainfall averaged over the surface of watersheds affected by floods. For both evaluations, the available forecasts covering 10 recent heavy precipitation events occurring in southeastern France between 2015 and 2018 were considered (Table 2).
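As an illustration of the second method, the catchment-averaged rainfall can be sketched as follows in Python. The boolean `mask` marking the grid cells of a watershed and the function name are hypothetical; a plain mean is used on the assumption that all cells of the common one-kilometre grid have the same area.

```python
import numpy as np

def catchment_mean(rain, mask):
    """Mean rainfall (mm) over the grid cells flagged True in `mask`."""
    rain = np.asarray(rain, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("empty catchment mask")
    return float(rain[mask].mean())

# Toy 3 x 3 fields (mm per 15 min) and a catchment covering three cells.
observed = np.array([[0.0, 2.0, 4.0],
                     [0.0, 6.0, 2.0],
                     [0.0, 0.0, 0.0]])
forecast = np.array([[0.0, 1.0, 2.0],
                     [0.0, 3.0, 4.0],
                     [1.0, 0.0, 0.0]])
mask = np.array([[False, True, True],
                 [False, True, False],
                 [False, False, False]])

# Forecast error at the catchment scale (forecast minus observation).
error = catchment_mean(forecast, mask) - catchment_mean(observed, mask)
```

Averaging over the watershed makes the comparison tolerant of location errors that stay inside the catchment, which is why it complements the point-to-point comparison.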
The observations and forecasts used in this study have different time resolutions (15 min for ANTILOPE and AROME-NWC, and 5 min for PIAF). The comparisons have been carried out by using a common accumulation time step of 15 min.

This time step makes it possible to characterize the high temporal variability of rainfall, notably in convective situations.
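A minimal Python sketch of the aggregation to the common 15 min time step is given below. It assumes that the 5 min data are accumulations in mm (not rain rates) stored along the first axis and starting on a 15 min boundary; the function name is illustrative.

```python
import numpy as np

def accumulate_15min(rain_5min):
    """Sum consecutive groups of three 5-min accumulations along axis 0."""
    rain = np.asarray(rain_5min, dtype=float)
    n_steps = rain.shape[0] - rain.shape[0] % 3  # drop an incomplete last window
    grouped = rain[:n_steps].reshape(n_steps // 3, 3, *rain.shape[1:])
    return grouped.sum(axis=1)

# Seven 5-min values give two complete 15-min accumulations.
rain_15min = accumulate_15min([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
```

The same function works for gridded fields of shape (time, y, x), since only the time axis is regrouped.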

3.1 Point-to-point evaluation of the rainfall nowcasting
The results of the evaluation are presented for AROME-NWC and PIAF separately before comparing both systems. Scores used here are described in Appendix A.
The mean and root mean square errors are shown as a function of lead time in Figure 4 for 15-min rain accumulation forecasts. In general the AROME-NWC root mean square error increases slightly with lead time. The mean error, i.e. the mean difference between forecasts and observations, is negative, indicating that precipitation is underestimated by the model on average. Four standard categorical verification scores are presented in Figure 5 to summarize the performance of the 15-min rain accumulation forecasts from AROME-NWC and PIAF. These are the hit rate, the false alarm rate, the Heidke Skill Score (HSS) and the frequency bias, as functions of lead time, for two thresholds characterising precipitation occurrence and more intense precipitation: 0.5 mm/15 min and 3 mm/15 min. The hit rate and HSS decrease slightly with increasing lead time. The false alarm rate of AROME-NWC depends little on the forecast lead time, regardless of the threshold. For the higher rainfall intensities (3 mm/15 min) the frequency bias is greater than 1, indicating a tendency to predict these rainfall accumulations too frequently at all lead times.
The PIAF root mean square error increases with lead time up to 2 hours and decreases very slightly beyond (Figure 4b). During the very early forecast period it increases very quickly. This can be explained by the quality of the extrapolation of radar data, which deteriorates quickly with lead time. Indeed, the effects of advection errors accumulate and increase with successive time steps. The mean error for PIAF is also negative, indicating an underestimation of the precipitation amount on average (Figure 4a). A quick loss of PIAF accuracy is observed in Figure 5 (decrease in hit rate and HSS) in the first hour of lead time. The accuracy keeps decreasing up to 1h40 and becomes stable thereafter. As for AROME-NWC, too many high rainfall accumulations (3 mm/15 min) are forecast by PIAF (Figure 5d).
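For reference, the scores discussed above can be sketched in Python from paired forecast and observed accumulations. The definitions below are the standard contingency-table forms and are an assumption here, since Appendix A is the authoritative description; in particular, the false alarm rate is taken as false alarms divided by all observed non-events, and a threshold is counted as reached with `>=`.

```python
import numpy as np

def continuous_scores(forecast, observed):
    """Mean error (forecast minus observation) and root mean square error."""
    err = np.asarray(forecast, dtype=float) - np.asarray(observed, dtype=float)
    return {"mean_error": float(err.mean()),
            "rmse": float(np.sqrt((err ** 2).mean()))}

def categorical_scores(forecast, observed, threshold):
    """Scores from the 2 x 2 contingency table for one rainfall threshold."""
    f = np.asarray(forecast, dtype=float) >= threshold
    o = np.asarray(observed, dtype=float) >= threshold
    a = np.sum(f & o)     # hits
    b = np.sum(f & ~o)    # false alarms
    c = np.sum(~f & o)    # misses
    d = np.sum(~f & ~o)   # correct negatives
    n = a + b + c + d
    # Number of correct forecasts expected by chance
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    return {"hit_rate": a / (a + c),
            "false_alarm_rate": b / (b + d),
            "frequency_bias": (a + b) / (a + c),
            "hss": (a + d - expected) / (n - expected)}

# Toy sample of 15-min accumulations (mm) and the 3 mm/15 min threshold.
fc = [0.0, 1.0, 4.0, 5.0, 0.0, 3.5]
ob = [0.0, 0.6, 3.0, 0.0, 4.0, 0.2]
scores = categorical_scores(fc, ob, threshold=3.0)
```

In this toy sample the frequency bias is 1.5, i.e. the threshold is forecast more often than it is observed, which is the situation described for the 3 mm/15 min threshold. The scores are undefined when a denominator is zero (for example when the threshold is never observed), which the sketch does not handle.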

Comparison of the skills of the two nowcasting systems reveals that for lead times in the range T+2 hours to T+3 hours, AROME-NWC obtains on average better results than PIAF. For lead times in the range T+1 hour to T+2 hours, the performance of the two forecasting systems is often close. At lead times less than 90 minutes or 75 minutes, depending on the intensity of the rainfall forecast, the performance of PIAF generally exceeds that of AROME-NWC.
PIAF results from the linear combination of AROME-NWC prediction fields and radar extrapolation. The weights given to each predictor are adjusted according to their recent performance against observations, as described in subsubsection 2.2.2. In general a large weight is given to the extrapolation in the first time steps, and for longer forecast times the numerical prediction gains importance, to the point that the rainfall field is entirely provided by AROME-NWC at the end of the PIAF forecast. Note for example that for the latest PIAF lead times, beyond 2.5 hours, the root mean square errors converge to the AROME-NWC errors (Figure 4b). However, the quality of PIAF and AROME-NWC may not be equivalent for the same forecast horizon. Indeed, PIAF forecasts are systematically based on the latest available AROME-NWC run, and therefore the forecast lead time of the AROME-NWC run used in PIAF may be longer than that indicated in the PIAF assessment. Since the AROME-NWC availability time is 35 minutes, for an AROME-NWC run initiated at round hour H, only PIAF forecasts initiated between H+40 minutes and H+95 minutes will use this run, and those between H and H+35 minutes will use the AROME-NWC run launched at H-1. Thus the same lead time of two PIAF forecasts does not necessarily rely on the same lead time of the two associated AROME-NWC runs. Differences between PIAF and AROME-NWC skills at the last PIAF forecast lead times may also be explained in several cases by the fact that PIAF does not switch completely to AROME-NWC (e.g. cases where AROME-NWC is too far from the observations over the last six hours).
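The timing rule stated above can be made concrete with a short Python sketch. The function names are illustrative, and the strict inequality is an assumption inferred from the statement that a PIAF forecast initiated at H+35 minutes still uses the run launched at H-1.

```python
from datetime import datetime, timedelta

AVAILABILITY = timedelta(minutes=35)  # AROME-NWC delivery delay quoted in the text

def arome_run_used(piaf_start):
    """Start time of the latest hourly AROME-NWC run available strictly
    before `piaf_start`."""
    run = piaf_start.replace(minute=0, second=0, microsecond=0)
    # A run launched at H is assumed usable only strictly after H + 35 min.
    while run + AVAILABILITY >= piaf_start:
        run -= timedelta(hours=1)
    return run

def arome_lead_time(piaf_start, piaf_lead):
    """Lead time of the underlying AROME-NWC run at a given PIAF lead time."""
    return piaf_start + piaf_lead - arome_run_used(piaf_start)

# Two PIAF forecasts at the same 60-min lead time but different start times.
early = arome_lead_time(datetime(2018, 10, 15, 12, 20), timedelta(minutes=60))
late = arome_lead_time(datetime(2018, 10, 15, 12, 40), timedelta(minutes=60))
```

Here `early` relies on the 11:00 run at a lead time of 2 h 20 min, whereas `late` relies on the 12:00 run at 1 h 40 min, which illustrates why the same PIAF lead time can hide different AROME-NWC lead times.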
The main drawback of the point-to-point verification, especially in convective situations, is that it gives significant weight to even small location errors (the so-called "double penalty", ??). To give more credit to "close" forecasts, a fuzzy method was used to measure the similarity between forecasts and observations in local neighborhoods of the observations. Fraction Skill Scores (FSS, ?) were applied to compare the forecast and observed coverage of rain exceeding certain thresholds in spatial windows of increasing size. The FSS were also used to compare AROME-NWC and PIAF and to analyse their performance as a function of forecast range. The five neighborhood scales used are 1, 5, 10, 20 and 40 km, and six 15-min precipitation thresholds of 0.5, 1, 2, 3, 5 and 10 mm were selected. Figure 6 and Figure 7 show, for AROME-NWC and PIAF respectively, the FSS averaged over the 10 events for the various thresholds and window sizes at each forecast lead time. As might be expected, the greatest skill (highest FSS values) is associated with the largest window and the smallest threshold, while the lowest skill (FSS values near 0) is associated with the smallest spatial window and the largest threshold. For AROME-NWC, at all lead times the FSS increases monotonically with spatial scale. For PIAF, there is a rapid and significant decrease in FSS values with forecast range, up to 1h45 lead time. This is mainly due to the decrease in the quality of the extrapolation forecast with time.
The FSS computed from the PIAF precipitation forecast are generally better than those from AROME-NWC, at least for the first 90 minutes of forecast lead time.
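For readers unfamiliar with the score, a minimal Python sketch of the FSS for one threshold and one square window is given below. It follows the usual definition (one minus the mean squared difference of the neighborhood fractions, normalised by its largest possible value); the zero padding at the domain edges, the `>=` threshold test, the odd window size in grid points and the function names are assumptions of this sketch, not details taken from the study.

```python
import numpy as np

def neighbourhood_fraction(binary, n):
    """Fraction of True cells in the n x n window centred on each grid point
    (n odd, zeros assumed outside the domain)."""
    pad = n // 2
    padded = np.pad(binary.astype(float), pad, mode="constant")
    # Summed-area table with a leading row and column of zeros
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    h, w = binary.shape
    total = s[n:n + h, n:n + w] - s[:h, n:n + w] - s[n:n + h, :w] + s[:h, :w]
    return total / (n * n)

def fss(forecast, observed, threshold, n):
    """Fraction Skill Score for one threshold and one window size."""
    pf = neighbourhood_fraction(np.asarray(forecast) >= threshold, n)
    po = neighbourhood_fraction(np.asarray(observed) >= threshold, n)
    mse = np.mean((pf - po) ** 2)
    reference = np.mean(pf ** 2) + np.mean(po ** 2)
    return float(1.0 - mse / reference) if reference > 0 else float("nan")

# A rain cell forecast 4 grid points away from where it is observed.
observed = np.zeros((20, 20))
observed[5:8, 5:8] = 5.0
forecast = np.roll(observed, 4, axis=1)
fss_small = fss(forecast, observed, threshold=3.0, n=1)
fss_large = fss(forecast, observed, threshold=3.0, n=9)
```

In this toy case the displaced cell gets no credit at the grid scale, while the larger window rewards the near miss, which is the behaviour the neighborhood approach is designed to capture.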
3.2 Evaluation of the rainfall nowcasting at the catchment scale

To complete and verify the conclusions of the point-to-point evaluation, the rainfall forecasts averaged over the catchments were also studied. Indeed, the amount and the location of forecast rainfall at the catchment scale are essential for forecasting the hydrological response (??). The studied watersheds are those specified in Table 2.
Just as before, the mean and root mean square errors over watersheds are shown as a function of lead time in Figure 8 for 15-min cumulated rainfall. The root mean square errors for AROME-NWC increase with lead time. This increase is not strictly monotonic and is noisy due to the size of the sample: for example, a slight improvement in scores for AROME-NWC can be seen for lead times between 165 minutes and 210 minutes (Figure 8b). At lead times up to 4 hours and 15 minutes, AROME-NWC underestimates the rainfall accumulations on average (Figure 8a). The largest overestimates are observed for the events of 2015 and mid-October 2018, as shown by the inter-quartile ranges in Figure 9, which represents the forecast error distributions (forecast values minus observed values) of AROME-NWC and PIAF. The hit rate, the false alarm rate and the HSS vary little with the forecast lead time for the lowest threshold (0.5 mm/15 min), whereas the signal is noisier for the 3 mm/15 min threshold, with the hit rate and the HSS deteriorating with lead time (blue markers for AROME-NWC in Figure 10). For the higher rainfall intensities, the frequency bias increases with lead time and is significantly greater than 1, indicating that AROME-NWC over-predicts these rainfall intensities. There is a loss of PIAF forecast accuracy with increasing forecast lead time (Figure 8b). As seen previously for the point-to-point evaluation, the root mean square errors increase very rapidly over the first hour of the forecast.
On average, PIAF rain accumulation forecasts are lower than those observed. The most significant forecast errors are due to the event of 3 October 2015, and more particularly to the Brague river at Biot, where the extreme errors in 15-minute rainfall forecasts are approximately +/-20 mm. In terms of categorical scores (green markers for PIAF in Figure 10), there is a visible decrease in the hit rate and HSS as a function of the forecast lead time for the two rainfall thresholds studied. The frequency bias is close to 1 for the 0.5 mm/15 min threshold and greater than 1 for the higher threshold.
Finally, the results of both verification methods (point-to-point and catchment-scale comparisons of observed and forecast rainfall) generally lead to the same conclusions. The performance of PIAF is very good to good over the first hour of forecasting, but it deteriorates very quickly, reaching about the same or even a lower skill than AROME-NWC beyond about 1h15/1h30 of forecasting. Between 2 and 3 hours of forecasting, AROME-NWC performs better than or at the same level as PIAF. It is worth mentioning that the values of the scores as a function of lead time show more variability from one lead time to the next than those of the point-to-point evaluation. This might be due to the smaller size of the evaluation sample.

3.3 Hydrological evaluation for two case studies
The potential of AROME-NWC and PIAF for flash-flood nowcasting is introduced through the running of hydrological simu-  (Table 2), the best anticipation of three phenomena was studied per watershed.
These are:
- the start of increased discharge, defined here as an increase of at least 5 m³ s⁻¹ in one hour;
- the right order of magnitude of the peak flow value (meaning an error of less than 30% with respect to the reference peak discharge);
- the peak time, which is also the start of the recession limb.
For a given discharge forecast, the anticipation of one of these phenomena is calculated as the duration between two moments in time: the starting time of the rainfall forecast and the time of the phenomenon in the reference hydrograph (obtained with rainfall estimates as input to ISBA-TOP). Results are presented in Tables 4 and 5. The objective of these tables is to represent the behaviour of the best discharge forecasts for each watershed. They also provide a concise and comprehensive view of the results. For the "peak time" phenomenon, the anticipation period is only considered if the forecast and reference phenomena occur at the same time. For the phenomena "start of increased discharge" and "flood peak of the right intensity", a one-hour delay between forecast and reference is accepted. A different colour is assigned depending on the error between the reference flood peak and the forecast flood peak: green if the difference between forecast and reference intensity is less than 10%, orange if the error is between 10% and 20%, and red if it is between 20% and 30%.
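The two calculations described above (anticipation time and colour class of the peak error) can be sketched as follows in Python. The function names are illustrative, and the treatment of errors falling exactly on a class boundary (10%, 20%, 30%) is an assumption, as the text does not specify it.

```python
from datetime import datetime

def anticipation_hours(forecast_start, reference_event_time):
    """Duration (h) between the start of the rainfall forecast and the time
    of the phenomenon in the reference hydrograph."""
    return (reference_event_time - forecast_start).total_seconds() / 3600.0

def peak_colour(forecast_peak, reference_peak):
    """Colour class of the relative error on the flood peak intensity."""
    rel_err = abs(forecast_peak - reference_peak) / reference_peak
    if rel_err < 0.10:
        return "green"
    if rel_err < 0.20:
        return "orange"
    if rel_err < 0.30:
        return "red"
    return None  # error of 30% or more: not the right order of magnitude

# A run starting at 18:00 UTC that forecasts a rise observed at 23:00 UTC.
lead = anticipation_hours(datetime(2018, 10, 14, 18, 0),
                          datetime(2018, 10, 14, 23, 0))
```

As noted later in the text, the actual availability time of the rainfall forecast would still have to be subtracted from this duration to obtain the anticipation usable in real time.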

To facilitate the reader's understanding of the tables, the line summarizing the simulations for the Fresquel river at Pezens obtained by ISBA-TOP driven by AROME-NWC forecasts (Table 3, extracted from Table 4) is taken as an example and detailed below. The associated discharge time series are shown in Figure 12.

• In the reference simulation (in black in Figure 12), the flow starts increasing at 23:00 UTC on October 14. To calculate the anticipation of this rise, we look for the oldest run which forecasts a rise at 22:00 UTC, 23:00 UTC or 00:00 UTC. In this case, it is the run starting at 18:00 UTC, so the anticipation of the rise is 5 hours. In the column "Anticipation of the rising flow" of Table 3, the 5 hours before the hour of the phenomenon are therefore greyed out. To give an idea of the number of cases for which the flow rises are not forecast at the right time, the same column of the table indicates the number of runs launched before the flood start time on the reference hydrograph (before 23:00 UTC here) which forecast a rise and, among them, the number which forecast a rise within +/-1 hour of the reference time. Here, five runs starting before 23:00 UTC forecast a rise (those from 18:00 to 22:00 UTC), four of them between 22:00 and 00:00 UTC (all except the run starting at 20:00 UTC, which forecasts a rise 2 hours later). In the column "Anticipation of the rising flow" of Table 3, "5/4" is thus written.

For this forecast starting at 04:00 UTC, the difference between the forecast peak and the reference peak is less than 10%, so the 2 hours before the hour H are coloured green in the column "Anticipation of the peak value" in Table 3. To give an idea of the number of cases where a flood peak is forecast with an intensity close (or not) to the reference one, a triplet of values separated by slashes is indicated in the same column. These values correspond, in order, to the total number of forecasts which forecast a flood peak, to the number of forecasts which forecast a flood peak within +/-1h of the reference peak, and to the number of forecasts which forecast a flood peak within +/-1h of the reference peak with an intensity error of less than 30%. In this case, for the event of 15 October 2018, among the seventeen AROME-NWC forecasts used, four simulate a flood peak (runs starting between 02:00 and 05:00 UTC); all four have a peak timing error of at most one hour, but only two have an intensity error of less than 30% (forecasts starting at 04:00 and 05:00 UTC).
In the column "Anticipation of the peak value" of Table 3, "4/4/2" is written.

• The peak in the reference hydrograph occurs at 06:00 UTC. The first forecast that triggers a recession at the right time is that of 02:00 UTC; the anticipation of the peak time is therefore 4 hours, and 4 hours are greyed out in the column "Anticipation of the peak timing" of Table 3. Note that the flood peak simulated with this run starting at 02:00 UTC is strongly overestimated. The anticipation of the right intensity of the flood peak does not always coincide with the anticipation of the time of the flood peak. If, as in this example, the anticipation of the right intensity of the flood peak is shorter than that of the peak timing, it means that older runs anticipate flood peaks but with a strong over- or underestimation. Conversely, an anticipation of the right intensity of the flood peak longer than that of the peak timing indicates that the predicted intensity of the flood peak was right within +/-1h but that the exact time of the flood peak was predicted only later.

Note that to take into account the actual availability time of the rainfall forecasts, this time should be subtracted from the anticipation time indicated in the tables.
Among the best hydrological scenarios simulated with the AROME-NWC rainfall forecasts, the start of rising flows is anticipated at least two hours ahead, regardless of the event (Table 4). In most cases, the anticipation is more than four hours,  Table 4), the time of the first rise of discharge is most often simulated too early. This is the case of the Ognon river at Pépieux where the start of the rising limb is anticipated fourteen hours in advance. Although hydrological information can be extracted from these simulations, considering them as true flood signals could yield hits but could also entail false alarms.
Among the hydrological simulations based on PIAF rainfall forecasts, the best anticipation times of the onset of increasing discharge are always greater than one hour and fifteen minutes and reach three hours and forty-five minutes on the Lauquet river at Saint-Hilaire (Table 5).

The colored cells in the column "Anticipation of the peak value" in Tables 4 and 5 indicate the best anticipation time of flood peak intensities. For the simulations based on AROME-NWC, the maximum anticipation of the flood peak with an intensity error lower than 10% (green cells in Table 4) ranges from one hour to four hours depending on the catchment.
Considering the forecast peaks for which the error on the peak intensity is higher (error between 10 and 30%) gains up to one or two hours of anticipation, as in the case of the Orbiel basin. With a tolerance of 20% error on the intensity of the flood peak, the anticipation ranges from one hour to four hours. In the simulations based on PIAF forecasts, the anticipation of the correct intensity of the flood peak (error less than 10%, green cells in Table 5) is between thirty-five minutes and two hours and fifteen minutes. Considering the peaks with a larger intensity error gives an additional anticipation of one hour and twenty-five minutes for the Loup river at Villeneuve-Loubet. For the simulated peaks with an intensity error of less than 30%, the possible anticipation of flood peaks with ISBA-TOP can be estimated at approximately one and a half to two hours, which is an order of magnitude consistent with that found for a satisfactory performance of the nowcasting system during the rainfall forecasts evaluation.
For all the catchments, the peak times based on the AROME-NWC rainfall forecasts are anticipated by at most one to five hours (column "Anticipation of the peak timing" of Table 4). Flood recession in the catchments affected by the 03/10/2015 case is systematically anticipated four or five hours in advance; this anticipation time is more variable for the and a half, two hours ahead at best, except for the Orbiel river at Bouilhonnac, where the advance is only twenty minutes (Table 5).
These results obtained for two major flash-flood events are therefore indicative of the promising use of AROME-NWC or PIAF rainfall forecasts for hydrological forecasting with an anticipation of peak intensity, timing, and first rise of discharge that can reach several hours.

Conclusions
The Mediterranean regions are regularly exposed to heavy rainfall events, which can trigger devastating flash floods. In these situations, hydrometeorological forecasts up to a few hours ahead are crucial for increasing the preparedness of the authorities and planning the intervention of emergency services. In the current study, the potential of two nowcasting systems, recently developed at Météo-France, has been assessed for forecasting Mediterranean intense rainfall events and floods. Precipitation forecasts were evaluated in southeastern France for 10 past heavy precipitation events using a point-to-point approach and an areal verification over watersheds affected by floods. The availability time of the rainfall forecast, which is non-negligible at nowcasting ranges (a few minutes to 6 h), was taken into account to reflect the constraints of forecasters during real-time operations. These assessments, for a large area of southeastern France and for the catchments affected by the floods, lead to the following conclusions:
-PIAF is of very good quality over the first hour of forecasting, thanks in particular to the quality of the radar extrapolation. AROME-NWC is of good quality throughout its forecast, even if its skill tends to decrease slightly with the lead time.
-Heavy rain is predicted too often with AROME-NWC and PIAF.
-PIAF is of higher quality than AROME-NWC on the very first lead times. This quality deteriorates very quickly, to reach a quality comparable to (or even lower than) that of AROME-NWC beyond about 1h15/1h30 of forecast. For lead times between 2h and 3h, the performance of AROME-NWC is in general higher than or equivalent to that of PIAF.
This evaluation points out the strengths and weaknesses of the two types of nowcasting system during extreme events, which should be considered before selecting the best method or combination for future studies or operational purposes. Depending on the accuracy of the initial radar rainfall estimation and the spatial distribution and intensity of the precipitation, radar extrapolation can provide valuable nowcasting information in the very short-term forecast. However, due to the dynamical evolution of precipitation, especially in convective situations, accuracy decreases rapidly with forecasting lead time. The above results show that blending the extrapolation of radar data with numerical prediction forecasts improves the nowcasting accuracy and extends the lead time beyond the characteristic one-hour lead time of extrapolation methods. Indeed, the use of numerical forecasts helps to overcome the limitations of extrapolation at the initiation and decaying stages of convective cells and to better take into account the impact of the relief on the evolution of precipitation. The use of numerical forecasts is particularly appropriate for lead times greater than 2h if the numerical prediction system has been designed for nowcasting purposes. The resulting forecasts are thus frequently refreshed with new observations and are produced with a higher time resolution and a reduced calculation time compared to the time needed to run other numerical systems.
The mean error (ME) measures the average error as follows:
$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)$$
where $o_i$ and $f_i$ are the observed and forecast values respectively and $N$ is the number of forecast-observation pairs.
It ranges from −∞ to +∞ and is zero if the forecast is perfect. A positive value indicates overestimation, a negative value indicates underestimation. Note that with this score, strong errors of opposite sign can compensate each other.
The root mean square error (RMSE) is defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(f_i - o_i\right)^2}$$
It ranges from 0 to +∞ and is zero if the forecast is perfect.
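As a minimal sketch of these two continuous scores, assuming the forecast-minus-observation sign convention implied by the text (a positive ME means overestimation), the following Python functions compute ME and RMSE. The function names and the sample values are illustrative only.

```python
import math

def mean_error(forecasts, observations):
    """Average of (forecast - observation); positive means overestimation."""
    n = len(forecasts)
    return sum(f - o for f, o in zip(forecasts, observations)) / n

def root_mean_square_error(forecasts, observations):
    """Square root of the mean squared (forecast - observation) difference."""
    n = len(forecasts)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / n)

# Made-up rainfall amounts (mm): errors of +2, -2, +2, -2 cancel in the ME.
observations = [0.0, 2.0, 4.0, 6.0]
forecasts = [2.0, 0.0, 6.0, 4.0]

me = mean_error(forecasts, observations)
rmse = root_mean_square_error(forecasts, observations)
```

In this example the ME is zero although every forecast is off by 2 mm, which illustrates the compensation of errors mentioned above; the RMSE does not suffer from this effect.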
For binary events a categorical contingency table can be built such as Table A1. It is often used to compute categorical verification scores. It ranges from 0 to 1 (1 for a perfect forecast).
-The false alarm rate or probability of false detection is defined as follows: $F = \mathrm{POFD} = \frac{b}{b+d}$. It ranges from 0 to 1 (0 for a perfect forecast).

-The Heidke skill score (HSS) measures the fraction of correct forecasts after eliminating the correct random forecasts: $\mathrm{HSS} = \frac{2(ad - bc)}{(a+c)(c+d) + (a+b)(b+d)}$. It ranges from −∞ to 1 (1 for a perfect forecast).
-Frequency bias is defined as follows: $B = \frac{a+b}{a+c}$. It ranges from 0 to ∞.
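The categorical scores above can be sketched in Python as follows. The sketch assumes the usual 2×2 convention (a = hits, b = false alarms, c = misses, d = correct rejections), which is consistent with the POFD and frequency bias definitions given here, and it uses the common closed form of the HSS. The function names and counts are illustrative.

```python
def pofd(a, b, c, d):
    """Probability of false detection: false alarms over observed non-events."""
    return b / (b + d)

def frequency_bias(a, b, c, d):
    """Ratio of forecast events to observed events."""
    return (a + b) / (a + c)

def heidke_skill_score(a, b, c, d):
    """HSS in its usual closed form for a 2x2 contingency table."""
    return 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))

# Made-up counts: 30 hits, 20 false alarms, 10 misses, 140 correct rejections.
a, b, c, d = 30, 20, 10, 140
score_pofd = pofd(a, b, c, d)
score_bias = frequency_bias(a, b, c, d)
score_hss = heidke_skill_score(a, b, c, d)
```

With these counts the frequency bias is above 1, meaning the event is forecast more often than it is observed.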

Gerrity score
The Gerrity score (?) is based on a multi-category contingency table such as Table A2. In the PIAF algorithm, six rain thresholds (0.05, 0.1, 0.2, 0.4, 0.8 and 1.6 mm/5 min) are considered in the multi-category contingency table.
where s ij are the elements of a matrix characterized by s ii = 1 K−1 ( It ranges from -1 to 1 (1 for a perfect forecast).
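A minimal Python sketch of a Gerrity-type score is given below. It assumes the usual formulation, in which the scoring matrix is built from the cumulative observed frequencies of the categories, and it assumes that rows of the table are forecast categories and columns are observed categories. The function name and the example tables are illustrative, and every category is assumed to be observed at least once.

```python
def gerrity_score(table):
    """Gerrity score for a K x K table, table[i][j] = number of cases
    forecast in category i and observed in category j (assumed layout)."""
    k = len(table)
    total = sum(sum(row) for row in table)
    # Observed (column) relative frequencies of each category.
    p_obs = [sum(table[i][j] for i in range(k)) / total for j in range(k)]
    # Odds ratios a_r built from cumulative observed frequencies, r = 1..K-1.
    # Assumes the cumulative frequency stays strictly between 0 and 1.
    a = []
    cumulative = 0.0
    for r in range(k - 1):
        cumulative += p_obs[r]
        a.append((1.0 - cumulative) / cumulative)
    # Symmetric scoring matrix s.
    s = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            value = (sum(1.0 / a[r] for r in range(i))
                     - (j - i)
                     + sum(a[r] for r in range(j, k - 1)))
            s[i][j] = s[j][i] = value / (k - 1)
    return sum(table[i][j] * s[i][j] for i in range(k) for j in range(k)) / total

# Made-up three-category examples.
perfect = [[50, 0, 0], [0, 30, 0], [0, 0, 20]]
always_lowest = [[50, 30, 20], [0, 0, 0], [0, 0, 0]]
gs_perfect = gerrity_score(perfect)
gs_constant = gerrity_score(always_lowest)
```

Under these assumptions a perfect forecast scores 1, and a forecast that always predicts the same category scores 0, so constant forecasts are not rewarded.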

FSS
The Fractions Skill Score (FSS, ?) evaluates the quality of a forecast for several intensity thresholds with a certain spatial tolerance. The idea is to compare the observed and forecast probabilities of an event in spatial windows of increasing size. The FSS is defined by:
$$\mathrm{FSS} = 1 - \frac{\frac{1}{N}\sum_{i=1}^{N}\left(p_{f,i} - p_{o,i}\right)^2}{\frac{1}{N}\sum_{i=1}^{N} p_{f,i}^2 + \frac{1}{N}\sum_{i=1}^{N} p_{o,i}^2}$$
with $N$ the number of points in the considered spatial window, $p_{f,i}$ the forecast fraction of grid points that exceed the threshold in this window and $p_{o,i}$ the observed one.
In concrete terms, the observation and forecast fields are made binary by assigning the value 1 to pixels associated with an exceedance of the studied threshold and 0 in the opposite case. A neighborhood size is set. Then, within the neighborhood of each pixel in the area of study, the fraction of observed and forecast points above the intensity threshold is calculated.
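The procedure described above can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions: square neighborhoods defined by a hypothetical `half_width` parameter, windows simply truncated at the domain border, strict exceedance of the threshold, and the usual FSS form (one minus the mean squared difference of fractions divided by its largest possible value). Function names and fields are made up.

```python
def fractions(field, threshold, half_width):
    """Fraction of pixels exceeding the threshold in each pixel's neighborhood."""
    rows, cols = len(field), len(field[0])
    binary = [[1 if value > threshold else 0 for value in row] for row in field]
    result = []
    for i in range(rows):
        result_row = []
        for j in range(cols):
            # Square window, truncated at the domain border (assumption).
            i0, i1 = max(0, i - half_width), min(rows, i + half_width + 1)
            j0, j1 = max(0, j - half_width), min(cols, j + half_width + 1)
            window = [binary[r][c] for r in range(i0, i1) for c in range(j0, j1)]
            result_row.append(sum(window) / len(window))
        result.append(result_row)
    return result

def fss(forecast, observed, threshold, half_width):
    """Fractions Skill Score for one threshold and one neighborhood size."""
    pf = [p for row in fractions(forecast, threshold, half_width) for p in row]
    po = [p for row in fractions(observed, threshold, half_width) for p in row]
    n = len(pf)
    mse = sum((f - o) ** 2 for f, o in zip(pf, po)) / n
    reference = (sum(f * f for f in pf) + sum(o * o for o in po)) / n
    if reference == 0:
        return float("nan")  # threshold exceeded nowhere in either field
    return 1.0 - mse / reference

# Made-up 4 x 4 fields (mm): one rainy pixel, displaced by one column.
observed = [[0, 0, 0, 0], [0, 5, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
forecast = [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

fss_pixel = fss(forecast, observed, threshold=1.0, half_width=0)
fss_neighborhood = fss(forecast, observed, threshold=1.0, half_width=1)
fss_identical = fss(observed, observed, threshold=1.0, half_width=1)
```

In this example the displaced rain cell gets no credit at the pixel scale, while the score becomes positive once a 3×3 neighborhood is used, which is the spatial tolerance the FSS is designed to provide.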
Table 4. Best anticipation of the rise in flow, peak value and peak timing simulated with ISBA-TOP driven by the AROME-NWC rainfall forecast for the watersheds affected during the events of October 2015 and 2018. The green colour (respectively orange, red) indicates an intensity error on the forecast peak lower than 10% (respectively 20%, 30%) compared to the reference peak. Streamflow observations, which were provided by the French HYDRO data bank (http://www.hydro.eaufrance.fr/, last access: 11 March 2020) or by HyMeX post-event surveys, are for information purposes only. For more explanation on how to read the table, see subsection 3.3.
Table 5. Best anticipation of the rise in flow, peak value and peak timing simulated with ISBA-TOP driven by the PIAF rainfall forecast for the watersheds affected during the events of October 2015 and 2018. The green colour (respectively orange, red) indicates an intensity error on the forecast peak lower than 10% (respectively 20%, 30%) compared to the reference peak. Streamflow observations, which were provided by the French HYDRO data bank (http://www.hydro.eaufrance.fr/, last access: 11 March 2020) or by HyMeX post-event surveys, are for information purposes only. For more explanation on how to read the table, see subsection 3.3.