Interactive comment on “Weather model performance on extreme rainfall events simulation's over Western Iberian Peninsula”

Abstract. This study evaluates the performance of the WRF-ARW numerical weather model in simulating the spatial and temporal patterns of an extreme rainfall period over a complex orographic region in north-central Portugal. The analysis was performed for December 2009, during the mainland Portugal rainy season. The heavy to extremely heavy rainfall periods were due to several low surface pressure systems associated with frontal surfaces. The total amount of precipitation for December exceeded, on average, the climatological mean for the 1971–2000 period by 89 mm, varying from 190 mm (southern part of the country) to 1175 mm (northern part of the country). Three model runs were conducted to assess possible improvements in model performance: (1) the WRF-ARW is forced with the initial fields from a global domain model (RunRef); (2) data assimilation for a specific location (RunObsN) is included; (3) nudging is used to adjust the analysis field (RunGridN). Model performance was evaluated against an observed hourly precipitation dataset of 15 rainfall stations using several statistical parameters. The WRF-ARW model reproduced the temporal rainfall patterns well but tended to overestimate precipitation amounts. The RunGridN simulation provided the best results, but the performance of the other two runs was also good, so that the selected extreme rainfall episode was successfully reproduced.


Introduction
Both short, high-intensity and prolonged, low-intensity rainfall events can play a key role in catchment-scale runoff generation and associated phenomena such as flooding risk. Flood generation processes have been described by numerous authors (e.g. Chow et al., 1998). Infiltration-excess runoff generation, when rainfall intensity exceeds the infiltration capacity of soils, can be linked with flash floods in small headwater catchments. Saturation-excess runoff generation, when large amounts of rainfall cause soils to become saturated and prevent further infiltration, can be associated with prolonged floods at larger spatial scales. The characteristic spatio-temporal scale of infiltration-excess runoff is small, ranging from minutes to hours and 1–100 km², whilst the scale of saturation-excess runoff is typically related to that of storm systems and weather fronts, ranging from hours to days and regional scales exceeding 100 km² (Skøien and Blöschl, 2003). An analysis of the rainfall events leading to flooding must therefore take these spatial and temporal scales into account. In the case of Mediterranean-type catchments, as studied here, the maximum rainfall intensity during 30 min (I30) has been indicated by several authors (e.g. Castillo et al., 2003; Kirkby et al., 2005) as critical for surface runoff generation.
Runoff studies generally use precipitation measurement data as input for analysis and modeling (e.g. Singh and Frevert, 2002). Although point rainfall measurements by ground stations are considered to be reliable, they tend to be sparse and highly variable in space as well as over time (AghaKouchak et al., 2010). In Portugal, ground rainfall stations have a spatial coverage ranging from 7 stations per 1000 km² in the south to 10 stations per 1000 km² in the north (http://snirh.pt/snirh/download/relatorios/redes texto sul.pdf). Geostatistical interpolation techniques have become widely used in hydrological applications to produce precipitation maps with high spatial resolution from much sparser rain gauge data (e.g. Dirks et al., 1998). However, advances in computer technology now allow numerical weather prediction (NWP) models to be employed for simulating precipitation processes with a spatial and temporal resolution that is adequate for many hydrological applications. The current NWP resolutions of around 1 km² and 15–30 min match the above-mentioned critical space and time scales identified for Mediterranean-type catchments. Due to their physical basis, NWP models also allow the current understanding of the key meteorological processes to be tested explicitly, and provide a more solid foundation for explaining the meteorological measurements.
In the study region, the hydrological-erosion response of six experimental catchments is being monitored (Fernandes et al., 2010; Rial-Rivas et al., 2011; Campos et al., 2012; Machado et al., 2012), using the Pousadas meteorological station as the reference station for high-quality, local rainfall records. Nonetheless, data gaps can hardly be avoided altogether, as was the case for the extreme rainfall event selected for this study (due to battery failure as a result of a prolonged period without significant recharge by the solar panel). In the present case, however, the usefulness of the two existing radar stations is somewhat doubtful. Firstly, the study area lies at a considerable distance from the nearest radar station (ca. 250 km), whereas the agreement between radar-based estimates and point measurements has been found to decrease with increasing distance (Sebastianelli et al., 2010). Secondly, the study area is mountainous, and mountains may introduce errors in radar-based precipitation estimates by physically obstructing the radar's effective coverage (Pellarin et al., 2002).
The motivation for this paper emerged from the need for precipitation fields that could later be used in runoff applications. For this particular region, the precipitation pattern over the mountains is not well known, partly due to the sparseness of the rain gauge network, the lack of radar-based information and the possible precipitation gradients induced by the orography. This study evaluated the model performance in simulating an extreme rainfall episode over a complex orographic region, in an attempt to check the model's suitability for providing such estimates of precipitation fields and time series. To that end, three experiments were performed to test whether data assimilation for a defined location, or the grid nudging technique, would yield better results.

Study area and case study
The study area spanned a mountainous region in north-central Portugal (Fig. 1). The climate is wet Mediterranean, with a mean annual rainfall ranging from 800 mm in the littoral zone to 2300 mm in the inland mountains due to the marked influence of topography on spatial rainfall patterns. Within the study area, the Águeda catchment in particular is well known for its flooding risk to the old city centre of Águeda (Figueiredo et al., 2009).

The present analysis focused on the month of December 2009, combining an exceptional amount of rainfall with the occurrence of various gaps in the records of the Pousadas meteorological station. The existing rainfall stations in the region recorded monthly totals for December that were, on average, about 88 % above their long-term median values (INAG, 2011) and, as such, corresponded to the stations' 54th to 95th percentiles for December (Table 1). A more detailed comparison at the (sub-)daily scale was only possible for the Santa Comba Dão station (code S18SCDC2) (Table 2), but it indicated that the December 2009 values corresponded to return periods of less than 2 yr for short-term rainfall durations (< 3 h) and just over 2 yr for longer durations. Thus, the high rainfall of December 2009 could be attributed to a greater number of rainfall days rather than to more intense precipitation events.
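The percentile ranks reported above can be obtained from a station's long-term December record in a straightforward way. The sketch below computes an empirical percentile rank; the monthly totals used here are hypothetical illustrations, not the values of Table 1.

```python
# Sketch: empirical percentile rank of a December monthly total within a
# station's long-term December record (all values below are hypothetical).

def percentile_rank(record, value):
    """Percentage of historical values not exceeding `value`."""
    below = sum(1 for v in record if v <= value)
    return 100.0 * below / len(record)

# Hypothetical long-term December totals (mm) for one station, 1971-2000
december_totals = [120, 95, 210, 180, 160, 250, 140, 300, 175, 198,
                   110, 220, 260, 130, 190, 205, 240, 150, 170, 185,
                   125, 215, 280, 145, 165, 230, 195, 135, 200, 155]

# A December 2009 total of 290 mm would rank near the top of this record
print(round(percentile_rank(december_totals, 290), 1))
```

With a longer record, the same function yields the 54th to 95th percentile range reported for the regional stations.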

Model setup
The regional meteorological model used in this study is the Weather Research and Forecasting (WRF) Model with the Advanced Research WRF (ARW) dynamic core, version 2.2 (Skamarock et al., 2008). WRF is a next-generation, limited-area, non-hydrostatic mesoscale modeling system, with a terrain-following vertical eta coordinate, designed to serve both operational forecasting and atmospheric research needs. The WRF-ARW model has been widely used for simulating precipitation processes, both in forecast mode (Deb et al., 2010; Weisman et al., 2008) and in diagnostic mode (Liu, 2012; Lou and Breed, 2011; Bukovsky and Karoly, 2009). It has also been used successfully in Portugal, in a test of the sensitivity to parameterizations of two different operational model configurations (Ferreira et al., 2010).
The WRF-ARW model was forced with the analyzed meteorological fields of the Global Forecast System (GFS) from the United States National Center for Environmental Prediction (NCEP), using the 6-hourly fields for the entire month of December 2009. The GFS model has an approximate horizontal resolution of 0.5° × 0.5°, and its vertical domain extends from a surface pressure of 1000 hPa to 0.27 hPa, discretized into 64 unequally spaced sigma levels, of which 15 are below 800 hPa and 24 are above 100 hPa.
The WRF-ARW model was configured with three nested domains, with resolutions of 25 km, 5 km and 1 km for the parent, middle and inner domains, respectively. The finest grid domain is centered over Pousadas (40.63° N, 8.31° W) and is represented by 161 × 106 cells in the west-east and north-south directions, respectively (see Fig. 1).
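A telescoping nest of this kind is typically declared in the `&domains` section of WRF's `namelist.input`. The fragment below is only a sketch of the 25/5/1 km configuration described above: the outer- and middle-domain sizes are illustrative assumptions, and only the innermost 161 × 106 grid is taken from the text.

```
&domains
 max_dom           = 3,
 dx                = 25000, 5000, 1000,
 dy                = 25000, 5000, 1000,
 parent_grid_ratio = 1,     5,    5,
 e_we              = 100,   121,  161,   ! outer sizes illustrative
 e_sn              = 100,   121,  106,   ! only the 1 km grid is from the text
/
```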

Experimental design
Three numerical experiments, each consisting of a one-month integration, were made for December 2009, running from 00:00 UTC 1 December 2009 to 00:00 UTC 1 January 2010. They were forced by the GFS analysis fields, comprising a continuous integration with 6-hourly analysis fields and a single initialization from the 00:00 UTC 1 December analysis field, as is common practice in numerical weather prediction experiments (Lo et al., 2008). In order to test for improvements in the model simulations, the nudging technique was applied (Skamarock et al., 2008).
Nudging is a method that keeps simulations close to the analyses and/or observations (input fields) over the course of the integration. In the WRF-ARW, there are two types of nudging, which can be used separately or combined. One is observational, or single-location, nudging, which forces the simulation towards observational data. The other is grid-point, or analysis, nudging, which forces the model simulation towards a series of analyses, grid point by grid point.
In this study, nudging was applied to individual observations at the Pousadas location (RunObsN), in order to evaluate the impact of the local circulation on the computed model precipitation, as well as to the grid points (RunGridN). In doing so, the impact of 3-D analysis nudging in constraining the large-scale circulation within the mesoscale model was investigated. The nudging was applied to the wind, temperature and humidity variables over the entire atmospheric column, except within the planetary boundary layer.
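Conceptually, both forms of nudging add a Newtonian relaxation term that pulls a model variable towards the reference field at each time step, with the tendency switched off where nudging is excluded (here, the planetary boundary layer). The sketch below is a schematic illustration of that idea, not WRF's actual implementation; the relaxation coefficient `g` and the PBL mask are purely illustrative.

```python
import numpy as np

def nudge_step(state, analysis, dt, g=3e-4, pbl_mask=None):
    """One schematic Newtonian-relaxation update: relax `state` towards
    `analysis` with coefficient g (s^-1). Points where pbl_mask is True
    (e.g. levels inside the planetary boundary layer) are left un-nudged."""
    tendency = g * (analysis - state)
    if pbl_mask is not None:
        tendency = np.where(pbl_mask, 0.0, tendency)
    return state + dt * tendency

state = np.array([280.0, 285.0, 290.0])     # model temperatures (K)
analysis = np.array([281.0, 284.0, 290.0])  # analysis temperatures (K)
mask = np.array([True, False, False])       # first level flagged as inside the PBL
print(nudge_step(state, analysis, dt=60.0, g=3e-4, pbl_mask=mask))
```

Over many time steps this relaxation keeps the simulation from drifting away from the driving analyses while still letting the model develop its own fine-scale structure.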

Rainfall measurements
To assess the model performance, a set of 27 existing rainfall stations from the SNIRH (INAG, 2011) was selected for this study (Fig. 1, Table 3). The data were checked for gross errors, such as mistyped rainfall amounts, and then compared with buddy stations to ensure that the rainfall amounts were consistent between stations with similar characteristics.
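A buddy check of the kind described can be sketched as follows: each hourly value is compared with the median of neighbouring ("buddy") stations and flagged when the difference is implausibly large. The 10 mm tolerance and the series below are illustrative assumptions, not the actual quality-control criteria applied to the SNIRH data.

```python
import statistics

def buddy_check(values, buddy_series, max_diff=10.0):
    """Return indices of hourly rainfall values that differ from the
    median of the buddy stations by more than max_diff (mm)."""
    flagged = []
    for i, v in enumerate(values):
        buddy_median = statistics.median(s[i] for s in buddy_series)
        if abs(v - buddy_median) > max_diff:
            flagged.append(i)
    return flagged

station = [0.0, 2.5, 99.9, 1.0]  # 99.9 mimics a mistyped rainfall amount
buddies = [[0.0, 2.0, 3.0, 1.2],
           [0.1, 3.0, 2.5, 0.8]]
print(buddy_check(station, buddies))
```

Flagged hours would then be inspected manually rather than discarded automatically, since genuine convective cells can also produce large local differences.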

Assessment of model performance
The WRF-ARW 1 km gridded hourly precipitation can be verified either directly against the observations, at their locations, or against a gridded analysis of the observations. In this study, the observations and the model precipitation are represented on non-matching grids. To overcome this difficulty, the precipitation at the grid point nearest to each station location was used for the comparison, ignoring the corresponding location error.
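The nearest-grid-point selection can be sketched as follows, assuming 2-D latitude/longitude arrays for the model grid. The toy grid below is centred near Pousadas, and the cosine-weighted degree distance is a simplifying assumption that is adequate only for small domains such as the 1 km nest.

```python
import numpy as np

def nearest_grid_index(lat_grid, lon_grid, lat_s, lon_s):
    """Index of the model grid point closest to a station (lat_s, lon_s).
    Uses a squared-degree distance with a cosine weighting of longitude;
    a great-circle distance would be more exact but changes little here."""
    d2 = ((lat_grid - lat_s) ** 2
          + ((lon_grid - lon_s) * np.cos(np.radians(lat_s))) ** 2)
    return np.unravel_index(np.argmin(d2), d2.shape)

# Toy 2-D coordinate arrays around Pousadas (40.63 N, 8.31 W), 0.01 deg spacing
lats, lons = np.meshgrid(np.linspace(40.5, 40.8, 31),
                         np.linspace(-8.5, -8.1, 41), indexing="ij")
print(nearest_grid_index(lats, lons, 40.63, -8.31))
```

On a 1 km grid the distance between a station and its nearest grid point is at most about 0.7 km, which is the location error accepted above.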
There is no consensual strategy concerning direct verification, i.e. comparing "truth" observations with model precipitation. Rossa et al. (2008) presented a survey of several different strategies for using rain gauge data, including some unreported studies which had shown that verification using the nearest grid point gives very similar overall results. In this study, direct verification was carried out point to point. This procedure was performed for each of the precipitation grids from the three model experiments: RunRef, RunObsN and RunGridN.
Analysing the observed time evolution of the December precipitation for the entire set of stations, five precipitation episodes were identified. Figure 2 depicts the December time series for the observed precipitation as well as for RunRef, RunGridN and RunObsN, with the wet episodes highlighted by red boxes. The initial 12 h of simulation were considered model spin-up time and excluded from the data analysis.
Model performance was evaluated by comparing simulated with measured hourly rainfall above a minimum threshold of 0.1 mm h⁻¹. The procedure consisted of searching the observation series for values above 0.1 mm h⁻¹ and then matching the respective pairs in the model outputs, ensuring that the same number of hours was analysed. Following Murphy and Winkler (1987) and Jolliffe and Stephenson (2003), the following basic statistics were computed: mean error (ME), mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE) and multiplicative bias (Mbias). Although similar to the traditional measure of bias, the multiplicative bias was chosen because it is better suited for quantities that have zero as a lower and/or upper bound.
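The matching step can be sketched as follows, assuming aligned hourly observation and model lists; the values are illustrative. Only hours whose observed value reaches the threshold are kept, so both series contribute the same number of hours to the statistics.

```python
def matched_pairs(obs, model, threshold=0.1):
    """Pair each observed hourly value at or above `threshold` (mm/h)
    with the model value for the same hour."""
    pairs = [(o, m) for o, m in zip(obs, model) if o >= threshold]
    obs_sel = [p[0] for p in pairs]
    model_sel = [p[1] for p in pairs]
    return obs_sel, model_sel

obs = [0.0, 0.3, 1.2, 0.05, 2.0]   # hourly observed rainfall (mm/h)
mod = [0.1, 0.5, 0.9, 0.30, 1.5]   # hourly modelled rainfall (mm/h)
print(matched_pairs(obs, mod))
```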
The multiplicative bias (Mbias) is defined as the ratio of the mean estimated value to the mean observed value. In addition, the MSE skill score (Jolliffe and Stephenson, 2003; Wilks, 2006) was calculated to relate the model errors to the persistence in the hourly rainfall records of the stations. It is worth stressing that the model results were neither rescaled nor transformed.
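The statistics above can be sketched as follows for pre-matched observation/model pairs; the reference MSE and the sample values are illustrative only. Note how the toy series yields ME = 0 and Mbias = 1 despite a nonzero MAE, the compensating-error behaviour discussed for some stations in the results.

```python
def verification_stats(obs, model, ref_mse):
    """ME, MAE, MSE, RMSE, multiplicative bias, and the MSE skill score
    relative to a reference forecast (e.g. persistence or climatology)."""
    n = len(obs)
    errors = [m - o for o, m in zip(obs, model)]
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = mse ** 0.5
    mbias = (sum(model) / n) / (sum(obs) / n)  # mean model / mean observed
    skill = 1.0 - mse / ref_mse                # 1 = perfect, < 0 = worse than ref
    return me, mae, mse, rmse, mbias, skill

obs = [0.3, 1.2, 2.0, 0.5]
mod = [0.5, 0.9, 1.5, 1.1]
me, mae, mse, rmse, mbias, skill = verification_stats(obs, mod, ref_mse=1.0)
```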

Results and discussion

Observed and modelled precipitation characteristics
Figure 2 presents the total rainfall amounts over December 2009 for all stations. At most stations the model agreed well with the observations, but stations S02, S25 and S27 clearly departed from them.

The observed and modelled hourly rainfall amounts during December 2009 are shown in Fig. 2. Five rainfall periods were identified in the observed data, encompassing days 1–2, 4–6, 14–17, 19–25 and 27–31. Each rainfall episode was preceded and followed by a 12-h dry period. These five periods were reproduced well by all three model runs, but the maximum observed intensity (30 mm h⁻¹ at station S08QUEC4) was not. In the RunGridN experiment, the majority of the series simulated a weak wet event, ranging from 0 mm h⁻¹ to 5 mm h⁻¹, on day 27 that none of the other runs reproduced. In general, the three model runs tended to underestimate the highest rainfall intensities. The frequency distributions of the observed and modelled hourly rainfall amounts are shown in Fig. 3, with the whiskers of the box plots corresponding to three times the interquartile range (IQR) in order to emphasize extreme values. The frequency distributions were strongly asymmetric. The model runs reproduced the observed asymmetry, with median values in the range of 0.3 mm h⁻¹ to 1.7 mm h⁻¹ and third-quartile values in the range of 0.3 mm h⁻¹ to 5.7 mm h⁻¹. Also, the bulk of the stations revealed quite a few extreme values, corresponding to the points lying beyond three times the IQR (Fig. 3). The pronounced variability in the observed rainfall data reinforced the atypical nature of December 2009, as mentioned earlier, and this was well represented by the three model runs. For the observations, this variability is supported by the standard deviation values presented in Table 3, with the majority of the individual standard deviations lying between 1.0 mm h⁻¹ and 2.0 mm h⁻¹.

The correlations (not shown) between the observed and modelled hourly rainfall amounts were calculated for the individual stations as well as for the combined stations. The correlation coefficients were low for all three model runs, ranging from 0.1 to 0.2 for the individual stations and around 0.2 for the aggregated stations. The weak association between the observed and modelled data could be due to small misplacements in space and time (Rossa et al., 2008). The strength of the relationship among the data pairs is shown in the scatterplot (Fig. 4). The degree of scatter is considerable; the data are widely spread, pointing to a weak relationship between the variables.

Model assessment
The model assessment statistics for the individual rainfall stations are shown in Fig. 5, with the average values for all stations combined given in the respective legend boxes. The mean error for all stations together (ME, Fig. 5a) was negligible in the case of the RunRef and RunObsN model runs (−0.01 and −0.02 mm h⁻¹, respectively), as opposed to the RunGridN model run (−0.20 mm h⁻¹). In general, there was good agreement between the MEs of the individual stations for the different runs, meaning that all pairs of observed and modelled point precipitation showed similar behaviour across the different experiments. Individually, the average station error exceeds the respective experiment reference value and the one given by the control run. The RunGridN station mean errors show better agreement between the individual mean error values and the respective reference value (ME RunGridN = −0.20 mm h⁻¹). A few stations show a positive mean error, some with magnitudes exceeding 1.5 mm h⁻¹. These results could indicate that the reference mean error for each experiment was achieved through compensating errors. This evidence is supported by the Mbias (Fig. 5b), in which a strong average error contribution from the stations mentioned earlier is clear; for those stations the modelled rain is almost twice the observed. This behaviour was closely followed by some other stations. Figure 5c displays the average magnitude of the errors, represented by the mean absolute error. The RunGridN experiment was the most accurate of the three experimental simulations, with a MAE of 1.55 mm h⁻¹ compared with 1.62 mm h⁻¹ for the reference run (RunRef). Three stations (S02, S25 and S27) show larger discrepancies than the remaining ones. Still, a few stations (S04, S12, S15, S16, S17 and S23) show an average error magnitude smaller than that of the reference run and of the remaining stations. The average magnitude of the error departs considerably from the perfect value of zero.
For the control run (RunRef) the average difference between the forecast and the observed precipitation was 1.62 mm h⁻¹. The results for the MSE (Fig. 5d) agree with those for the MAE (Fig. 5c), with the same three stations as the least accurate and with RunGridN (6.18 mm² h⁻²) scoring better than RunRef (6.89 mm² h⁻²).
Since the MSE is more sensitive to large errors than the MAE, it is reasonable to assume that the discrepancies between model and observations are particularly large for S02, S25 and S27. Moreover, looking at the MSE results for the remaining individual stations, it can be noted that they are more accurate than the aggregated statistic value. The RMSE (Fig. 5e) and the MAE (Fig. 5c) may be interpreted together to explain the variation of the model errors. Figure 5f shows the model skill in simulating the precipitation. The MSE was chosen to construct the skill score, which measures the percentage of improvement of the model over a reference system, in this case the climatology. The model shows a good percentage of improvement over the reference system, but in the case of stations S01 and S02 the negative skill reveals that the model is less accurate than persistence (Murphy, 1988). The poor skill score obtained for the aggregated statistics of the three experiments indicates that, in aggregate, the model has lower accuracy than the reference system. Stations S01, S02, S03 and S25 score poorly, which can also be seen in the ME results. Clearly, the combination of ME and skill for station S01 is the product of cancelling errors. For station S27 the model performs better than persistence, with a good percentage of improvement, but the mean error is too high; these results resemble those obtained for S02. The difference could lie in the persistence or in the orography, and may be explained by a transition zone in the relief. As Brooks and Doswell (1996) pointed out, of two simulations with the same MAE, the RMSE rewards the more consistent one, owing to the sensitivity of the RMSE to large errors.
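The Brooks and Doswell (1996) point, that of two simulations with equal MAE the RMSE favours the more consistent one, can be illustrated with a toy example (the error values are invented):

```python
def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    return (sum(e * e for e in errors) / len(errors)) ** 0.5

consistent = [1.0, 1.0, 1.0, 1.0]   # steady 1 mm/h errors
erratic = [0.0, 0.0, 0.0, 4.0]      # same MAE, but one large miss

print(mae(consistent), mae(erratic))    # both 1.0
print(rmse(consistent), rmse(erratic))  # 1.0 vs 2.0
```

Both series have MAE = 1.0, but squaring the single 4 mm/h miss doubles the RMSE of the erratic series, which is why RMSE ≥ MAE always holds.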

Spatial representation of the error
The model performance statistics of the individual stations were plotted as isoline maps in Fig. 6. The mean error pattern (ME) and the multiplicative bias pattern (Mbias) are similar (Fig. 6a and b). The RunRef and RunObsN simulations, represented by the red and blue contour lines, respectively, show a resembling pattern, whereas the nudging experiment (RunGridN), represented by the green contour lines, differs markedly from the other two. However, the variation of the average magnitude of the error is the same: it is on average larger in areas of steep slopes and smaller in the lowlands. Figure 6d shows the spatial distribution of the mean squared difference between the forecasts and the observations, as indicated by the MSE. The pattern shows a higher error magnitude at the majority of the stations and a lower one in the areas of rugged terrain (top right corner). The spatial distribution of the RMSE (Fig. 6e) is similar to that depicted by the ME and MAE.

Pousadas
Extending the analysis of Fig. 6 to the Pousadas location shows a small mean error and multiplicative bias, indicating good agreement between the model results and the observations (Fig. 6a and b). The MAE lies between 0.5 and 1 mm h⁻¹ (Fig. 6c), the MSE (Fig. 6d) is 5 mm² h⁻², and the RMSE contour error (Fig. 6e) for that location is 2.5 mm h⁻¹.
Looking at the maps displayed in Fig. 6, it is possible to select a few surrounding locations best suited for inferring the precipitation at the Pousadas location. The observed and predicted cumulative rainfall amounts are shown in Fig. 7 for the period without missing data, i.e. until 25 December.
Table 4 presents the statistical measures and errors, as well as the model skill, for the Pousadas station. The predictions of the RunRef model run agreed best with the observed data in terms of both ME and Mbias, while the RunObsN and especially the RunGridN model runs produced better skill values than the RunRef run. RunGridN revealed a 78 % improvement over the reference system, based on the meteorological data during the study period.

Conclusions
The performance of the WRF-ARW model in simulating precipitation over steep terrain was investigated. A simple, subjective set of measures was used to judge the performance of the model for a specific event and location. Since rainfall depends strongly on atmospheric motion, moisture content and physical processes, the quality of the model in reproducing the rainfall can be used as a measure of the overall quality of the model.
The overall bias showed a good correspondence between the mean forecast and the mean observation, so the model can be considered reliable and consistent for the rainfall that occurred during December. However, the degree of association between the modelled rain and the observations was weak. This result can be explained by small errors in the location or timing of the rain episodes. Rossa et al. (2008) stated that, for shorter accumulation periods, which is the case here, small position errors can lead to the "double-penalty" problem. The problem arises when the verification measure tends to penalize, instead of reward, the ability of the model to provide information on small scales. Further, these results support the choice of the simulation domain, which performed well in simulating the monthly precipitation, although attention must be paid to discontinuities due to orographic complexity. Citing Ballester and Moré (2007), "On the other hand, if the areas are large enough, weather forecast uncertainties derived from spatial precipitation discontinuities can more easily be avoided, especially in areas with complex orography and a high frequency of convection phenomena".
For the majority of the indices, the 3-D nudging experiment (RunGridN) scored better than the local nudging (RunObsN) and the no-nudging (RunRef) runs. Also, the individual stations scored better than the all-station average statistics. The results obtained in the three simulations show a wet bias, indicating that on average the model overestimates the observed precipitation. This feature was also present at the majority of the stations. The accuracy was best in the lowlands and highlands, whereas the areas with rough terrain and deep valleys tended to be less accurate.
The single initialization of the atmospheric fields, with the regular update of the lateral boundary conditions, proved to be appropriate for the study event and area. This result is also consistent with that presented by Lo et al. (2008).
The model showed skill in reproducing the majority of the individual precipitation series, thus demonstrating its value.
The Pousadas scores and model skill values are encouraging with regard to replacing the missing data with those produced by the model, in this case by the RunGridN experiment. However, caution is necessary.

Spatially, the errors showed markedly higher values in areas of steep terrain than in flat terrain. The skill pattern is represented in Fig. 6f, and all three experiments show similar spatial features. The model clearly separated the different precipitation events, indicating adequate resolution.

Both the MAE and the RMSE are measures of accuracy, and both aggregate into a single measure the individual differences between what is simulated and what is observed. Both can be interpreted similarly to the standard deviation. Since the RMSE squares the errors prior to averaging, this statistic gives greater weight to large errors and is bounded below by the MAE value.

Table 1 .
Long-term monthly precipitation data for some of the meteorological stations used in this work.

Table 3 .
Rainfall stations used for assessing model performance, including a summary of their hourly rainfall record statistics above 0.1 mm h⁻¹ for the entire month of December 2009.

Table 4 .
Statistical measures and model verification with model skill for the Pousadas location for the hourly precipitation.