A statistical approach for rain intensity differentiation using Meteosat Second Generation–Spinning Enhanced Visible and InfraRed Imager observations

This study exploits the Meteosat Second Generation (MSG)–Spinning Enhanced Visible and Infrared Imager (SEVIRI) observations to evaluate the rain class at high spatial and temporal resolutions and, to this aim, proposes the Rain Class Evaluation from Infrared and Visible observations (RainCEIV) technique. RainCEIV is composed of two modules: a cloud classification algorithm which identifies and characterizes the cloudy pixels, and a supervised classifier that delineates the rainy areas according to three rainfall intensity classes: the non-rainy class (rain rate value < 0.5 mm h−1), the light-to-moderate-rainy class (0.5 mm h−1 ≤ rain rate value < 4 mm h−1), and the heavy-to-very-heavy-rainy class (rain rate value ≥ 4 mm h−1). The second module takes as input the spectral and textural features of the infrared and visible SEVIRI observations for the cloudy pixels detected by the first module. It also takes the temporal differences of the brightness temperatures linked to the SEVIRI water vapour channels as indicative of the atmospheric instability strongly related to the occurrence of rainfall events. The rainfall rates used in the training phase are obtained through the Precipitation Estimation at Microwave frequencies, PEMW (an algorithm for rain rate retrievals based on Advanced Microwave Sounding Unit (AMSU)-B observations). RainCEIV's principal aim is to supply preliminary qualitative information on the rainy areas within the Mediterranean Basin where there is no radar network coverage. The results of RainCEIV have been validated against radar-derived rainfall measurements from the Italian Operational Weather Radar Network for some case studies limited to the Mediterranean area.
The dichotomous assessment related to daytime (nighttime) validation shows that RainCEIV is able to detect rainy/non-rainy areas with an accuracy of about 97 % (96 %), and when all the rainy classes are considered, it shows a Heidke skill score of 67 % (62 %), a bias score of 1.36 (1.58), and a probability of detection of rainy areas of 81 % (81 %).


Introduction
A wealth of techniques based on geostationary satellite IR/VIS observations have been developed in order to estimate rain rate (RR) values or confidences. A recent overview is given by Kidd and Levizzani (2011). The geostationary satellite techniques perform better over areas where rainfall originates from deep convection than over areas where it originates from stratiform systems. In particular, Negri and Adler (1981) examined the relation between cloud top temperature and RR by analysing Geostationary Operational Environmental Satellite (GOES) and radar data associated with a series of thunderstorms. Adler et al. (1985) proposed a thunderstorm index (TI) to give the probability of observing heavy precipitation. Subsequently, Adler et al. (1988) extended their interest to stratiform precipitation (produced under the anvils of mature and decaying convective systems) from GOES satellite infrared data. Wu et al. (1985) used GOES data in order to estimate rainfall by means of a pattern recognition algorithm trained and tested on different sets of RR measurements obtained from NOAA operational radars. They classified rain into three classes (non-rainy, light rainy, and heavy rainy). Adler et al. (1993) were the first to successfully combine the advantages of both types of instrument by using matched MW and IR data. Vicente et al. (1998) introduced the auto-estimator in order to estimate rainfall from GOES measurements, focusing on heavy precipitation. The auto-estimator differs from the previous IR methods for rainfall estimation because it considers other factors in addition to the IR window cloud top temperature. In particular, information about environmental moisture is used to obtain a more accurate estimation of rainfall as well as for the screening of the non-rainy pixels. Ba and Gruber (2001) used the GOES visible (0.65 µm), near-infrared (3.9 µm), water vapour (6.7 µm) and window channels (10.7 and 12.0 µm) to estimate rainfall rate, distinguishing raining from non-raining clouds by taking into account the cloud top temperature, the effective radius of cloud particles and the temperature gradient. Moreover, in an attempt to give more reliable values of rain rates, Ba and Gruber (2001) used the moisture factor correction developed by Scofield (1987) and modified by Vicente et al. (1998).

E. Ricciardelli et al.: A statistical approach for rain intensity differentiation
Published by Copernicus Publications on behalf of the European Geosciences Union.

Other authors used artificial neural networks to derive precipitation estimates using satellite IR images (Hsu et al., 1997; Behrangi et al., 2009; Capacci and Porcù, 2009). Many authors developed techniques, both physical and statistical, to determine RR from Meteosat data. Physical techniques consist of brightness temperature difference threshold tests or consider effective radius as well as cloud top height/temperature in order to determine rainfall rate and/or probability by the use of look-up tables. The look-up tables are usually built by considering rainfall measurements obtained through rain-gauge instruments or radar as well as RR values determined by MW data. An example of an IR method that uses RR values determined by MW observations is the RAin and Cloud Classification (RACC) method developed by Jobard and Desbois (1994), which used SSM/I and Meteosat data in order to classify the Meteosat images into several categories of rain. Turk et al. (2000) proposed a blended geostationary-microwave technique for the retrieval of RR measurements. This technique has been taken as a model by several investigators (Kidd et al., 2003; Marzano et al., 2004), including Heinemann et al. (2002), who developed the Multisensor Precipitation Estimate (MPE) technique operating at the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). The MPE product consists of the near-real-time RR maps for each Meteosat Second Generation (MSG)-Spinning Enhanced Visible and Infrared Imager (SEVIRI) image at the original pixel resolution. Moreover, Mugnai et al. (2013) recently implemented the blended technique by Turk et al. (2000) among the precipitation products of the Satellite Application Facility on Support to Operational Hydrology and Water Management (H-SAF).

Roebeling and Holleman (2009) proposed an algorithm for RR estimation from the cloud physical properties (such as cloud condensed water path and cloud top height) retrieved from SEVIRI observations. Kühnlein et al. (2011) also investigated the SEVIRI potential to determine RR, assuming a relationship between RR and optical thickness as well as effective radius. In particular, they established a relation between the reflectance observations acquired at 0.6 and 1.6 µm, which give information about cloud optical thickness and effective radius, and the ground-based rainfall rate. Recently, Feidas and Giannakos (2012) proposed an algorithm that works with SEVIRI observations by combining physical and statistical methods to characterize convective and stratiform precipitation areas. They calibrated the algorithm using RR measurements derived from a substantial number of rain gauge stations in Greece. Other techniques are based on cloud motion and exploit IR observations to provide an estimate of cloud movement to be used for transporting the more direct MW rainfall observations (Joyce et al., 2004). Di Paola et al. (2012) proposed the Precipitation Evolving Technique (PET) for the continuous monitoring of convective rain cells. PET propagates forward in space and time the latest RR map inferred from AMSU and MHS MW observations by using SEVIRI IR brightness temperature maps. This technique is able to propagate the latest available rain field for 2-3 h.

The aim of this study is to propose a technique based on a statistical classification algorithm that uses the spectral and textural features of SEVIRI IR/VIS observations to classify the cloudy pixels as non-rainy, light-to-moderate-rainy, or heavy-to-very-heavy-rainy.
The proposed technique, the Rain Class Evaluation from Infrared and Visible observations (RainCEIV), operates in a fixed area, the Mediterranean Basin, approximately between 35 and 50° N, and 20° W and 20° E. RainCEIV first discriminates cloudy from non-cloudy pixels, then determines the rain class only for the pixels classified as cloudy. It deploys the k-nearest neighbour mean (k-NNM) classifier, which takes as input the spectral and textural features derived from the SEVIRI VIS/IR images and the brightness temperature differences of the SEVIRI water vapour channels acquired 15, 30, and 45 min before the time of interest. RainCEIV has been validated against the radar-derived RR values obtained from the Italian Operational Weather Radar Network observations managed by the Italian Department of Civil Protection (DPC). RainCEIV is proposed as a useful tool for the real-time monitoring of rainfall events, both the intense convective and the moderate stratiform ones.
Section 2 provides a description of the satellite sensors whose observations and/or products have been used for the RainCEIV implementation; Sect. 3 describes the two modules of RainCEIV (the C_MACSP cloud classification algorithm and the RainCEIV k-NNM classifier); Sect. 4 shows the statistical scores obtained by comparing RainCEIV and radar-derived RR measurements.

Instruments and data description
The spectral and textural features of MSG-SEVIRI images are used as input for both the C_MACSP cloud classification algorithm and the RainCEIV k-NNM classifier. SEVIRI is the main payload on board the MSG series, composed of MSG-1 (Meteosat 8), MSG-2 (Meteosat 9), MSG-3 (Meteosat 10), and the future MSG-4 (Meteosat 11), planned for launch in 2015. SEVIRI is a line-by-line scanning radiometer with a 50 cm diameter aperture. It observes the earth-atmosphere system in 11 channels at full disk with a 3 km spatial sampling at the sub-satellite point. In addition, the High-Resolution Visible (HRV) channel covers half the full disk with a 1 km spatial sampling at the sub-satellite point. The actual instantaneous field of view is about 4.8 km at the sub-satellite point for all the channels except for the HRV channel, where it is 1.67 km. The major improvements with respect to previous sensors are its enhanced spectral characteristics, its higher temporal resolution (15 min), the improved signal-to-noise ratio, and the higher precision of data storage, which increased from 8 bits (256 levels) on Meteosat-7 to 10 bits (1024 levels) on Meteosat-8 (Schmetz et al., 2002).
The RainCEIV k-NNM classifier has been trained on the RR product from the Precipitation Estimation at Microwave Frequencies (PEMW). PEMW was developed by Di Tomaso et al. (2009) at the Institute of Methodologies for Environmental Analysis of the National Research Council of Italy (IMAA-CNR) to infer surface rain intensity from satellite MW LEO observations provided by the Advanced Microwave Sounding Unit-B (AMSU-B) and the Microwave Humidity Sounder (MHS) on board the National Oceanic and Atmospheric Administration (NOAA) satellites and the European Polar Satellite MetOp-A, respectively. AMSU-B and MHS are cross-track, line-scanning MW radiometers which measure radiances in five channels in the 89 to 190 GHz frequency range. The two window channels are centred at 89 and 150 GHz, while the three opaque (water vapour) channels are centred at 183 ± 1, 183 ± 3, and 183 ± 7 GHz. The AMSU-B and MHS fields of view (FOV) have a circular shape (with a diameter of about 16 km) at nadir, while their shape becomes ellipsoidal away from the nadir (the axes length is 51 km for the cross-track direction and 25 km for the along-track direction at the maximum scanning angle) (Bennartz, 2000). The purpose of these instruments is to measure the radiation from different layers of the atmosphere in order to obtain global data on humidity profiles.

The PEMW RR value is assigned to the SEVIRI pixel only when the latter is entirely enclosed in the corresponding AMSU-B/MHS FOV. PEMW RR values are resampled on the SEVIRI grid by calculating the area of each AMSU-B/MHS FOV on the basis of the orbital parameters described by Bennartz (2000). The temporal matching is carried out considering a maximum difference of 7.5 min between the acquisition time of the SEVIRI pixel and that of the AMSU-B/MHS FOV. For simplicity, the SEVIRI pixel to which the PEMW-RR value is assigned will be denominated PEMWinSEVIRI, while the corresponding PEMW-RR value will be denominated PEMWinSEVIRIv.
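The temporal matching criterion can be illustrated with a minimal sketch. The function names below are hypothetical and only the 7.5 min limit comes from the text; the spatial condition (SEVIRI pixel entirely enclosed in the FOV) is not modelled here.

```python
from datetime import datetime, timedelta

# Maximum SEVIRI / AMSU-B-MHS acquisition time difference stated in the text.
MAX_TIME_DIFF = timedelta(minutes=7.5)


def is_time_matched(seviri_time, mw_time, max_diff=MAX_TIME_DIFF):
    """Return True when the two acquisition times differ by at most max_diff."""
    return abs(seviri_time - mw_time) <= max_diff


def match_fovs(seviri_time, fov_times, max_diff=MAX_TIME_DIFF):
    """Return the indices of the MW FOVs that satisfy the temporal criterion
    for one SEVIRI pixel (hypothetical helper, spatial check not included)."""
    return [i for i, t in enumerate(fov_times)
            if is_time_matched(seviri_time, t, max_diff)]
```

For instance, with a SEVIRI acquisition at 14:15 UTC, a FOV acquired at 14:20 UTC would be retained, while one acquired at 14:35 UTC would be discarded.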
The RainCEIV results have been validated on the basis of the RR values derived from the Italian Weather Radar Network, which is coordinated by DPC (Vulpiani et al., 2008) in collaboration with the regional authorities, the research centres, the Air Traffic Control service (ENAV), and the Meteorological Service of the Italian Air Force (CNMCA). It consists of 20 microwave weather radars belonging to the regional authorities (10 C-band radars), ENAV (2 C-band radars) and DPC (6 C-band radars and 2 X-band polarimetric radars). The surface rainfall intensity (SRI, in mm h−1) and other products such as the Vertical Maximum Intensity (VMI), the constant altitude plan position indicator (CAPPI) and the 1-hour-accumulated surface rain total (SRT, in mm) are retrieved from measured reflectivity volumes. Procedures for mitigating ground clutter, anomalous propagation, and beam-blockage effects are applied (Vulpiani et al., 2008). The SRI product is derived by applying a reflectivity-rainfall (Z-R) relationship to the Lowest Beam Map (LBM), in other words the reflectivity values at the lowest level of the corrected radar volumes. The SRI product used here represents the best estimate from the radar network available for the period under analysis, and it has already been used to validate satellite rainfall estimates (Cimini et al., 2013), including EUMETSAT H-SAF products (Puca et al., 2014). Procedures to improve the quality of the SRI product, including attenuation compensation, polarimetric rainfall inversion techniques, and adaptive algorithms to retrieve the mean vertical profiles of reflectivity, have recently been developed at DPC (Vulpiani et al., 2012; Rinollo et al., 2013). All the products are available on a grid of 1400 × 1400 km², with a spatial resolution of about 1 km and a temporal resolution of 15 min.

For simplicity, the radar samples completely included in the SEVIRI pixels will be denominated RS samples. The collocation of the radar-derived RR measurements onto the SEVIRI grid consists in associating the RS samples with each SEVIRI pixel. If the percentage of rainy RS samples is higher than 80 %, the SEVIRI pixel is considered for the validation and classified as light-to-moderate-rainy or heavy-to-very-heavy-rainy on the basis of the RS-RR value average. In some cases, the RS-RR value average is strongly influenced by the lowest RR values of the light-to-moderate-rainy RS samples, even when the number of heavy-to-very-heavy-rainy RS samples is higher than that of the light-to-moderate-rainy ones. Because of this, when the percentage of the heavy-to-very-heavy-rainy RS samples is higher than 50 % and higher than that of the light-to-moderate-rainy RS samples, the SEVIRI pixel is flagged as heavy-to-very-heavy-rainy, regardless of the RS-RR value average. If the percentage of the non-rainy RS samples is 100 %, the SEVIRI pixel is considered for the training and validation. In the other cases, the SEVIRI pixel is flagged as "uncertain" and not considered for the training and validation purposes.
For simplicity, the SEVIRI pixel to which the radar-derived RR value is assigned will be denominated RADARinSEVIRI, while the corresponding RR value will be denominated RADARinSEVIRIv.
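The collocation rules above can be sketched as a small labelling function. This is only an illustration under stated assumptions: the function and constant names are hypothetical, the class limits (0.5 and 4 mm h−1) are those given in the abstract, the RS-RR average is assumed to be taken over the rainy RS samples only, and the 50 % test is assumed to refer to all the RS samples in the pixel.

```python
import numpy as np

NON_RAINY, LIGHT_MODERATE, HEAVY, UNCERTAIN = 0, 1, 2, -1


def rain_class(rr):
    """Map a rain rate (mm/h) to class 0, 1 or 2 using the 0.5 and 4 mm/h limits."""
    if rr < 0.5:
        return NON_RAINY
    return LIGHT_MODERATE if rr < 4.0 else HEAVY


def label_seviri_pixel(rs_rr):
    """Label one SEVIRI pixel from the rain rates of its radar (RS) samples."""
    rs_rr = np.asarray(rs_rr, dtype=float)
    rainy = rs_rr >= 0.5
    heavy = rs_rr >= 4.0
    light = rainy & ~heavy
    if not rainy.any():
        return NON_RAINY                      # 100 % non-rainy RS samples
    if rainy.mean() > 0.8:                    # more than 80 % rainy RS samples
        # Heavy samples dominate: flag as heavy regardless of the average.
        if heavy.mean() > 0.5 and heavy.sum() > light.sum():
            return HEAVY
        # Assumption: average computed over the rainy RS samples only.
        return rain_class(rs_rr[rainy].mean())
    return UNCERTAIN                          # excluded from training/validation
```

In this sketch a pixel with five RS samples at 5 mm h−1 and four at 0.6 mm h−1 is flagged as heavy-to-very-heavy-rainy even though the average (about 3 mm h−1) falls in the light-to-moderate range, which is the situation the 50 % rule is meant to handle.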

RainCEIV description
The RainCEIV technique consists of two modules:
- a cloud classification algorithm that discriminates clear from cloudy pixels and further classifies the cloudy pixels;
- a k-nearest neighbour mean (k-NNM) classifier that evaluates the rain class for each pixel classified as cloudy by the first module.

Cloud classification algorithm description
The threshold test involving the difference between the brightness temperature of the SEVIRI water vapour channel centred at 6.2 µm and that of the SEVIRI window channel centred at 10.8 µm, $TB_{6.2\,\mu\mathrm{m}-10.8\,\mu\mathrm{m}}$. This difference is very small for convective cloud, as asserted by Mosher (2001, 2002) in the global convective diagnostic approach. The C_MACSP statistical (temporal) algorithm considers as input the same spectral and textural features described and listed in Sect. 3.2.1 (Sect. 3.4) and Table 4 (Table 7). The clear and cloudy pixels were selected manually after observing the spectral characteristics in SEVIRI IR/VIS images as well as in their RGB composition, a useful practice for distinguishing cloudy classes (Lensky and Rosenfeld, 2008). […]

When both the RADARinSEVIRI pixel and the PEMWinSEVIRI pixel are available and the relations at points 2 and 3 are not satisfied, the SEVIRI pixel is not considered for the initial training data set. The SEVIRI images listed in Table 5 of Ricciardelli et al. (2008), and in particular the ones used for the training of the Mediterranean Basin (enclosed in the areas B, C, and G of Fig. 3 of Ricciardelli et al., 2008) […] 5 and 6).

k-nearest neighbour mean classifier description
The classifier used to evaluate the rainy class is the k-nearest neighbour mean (k-NNM) non-parametric supervised classifier proposed by Viswanath and Sarma (2011). This classifier has been chosen for its simplicity and good performance (Dasarathy, 1991, 2002; Babu and Viswanath, 2009) and because, unlike the Bayes classifier, it does not assume any a priori known probabilities, which are estimated directly from the design samples. It implements the decision rule locally. The k-NNM classifier has been shown to perform better than the k-NN classifier, and it is suitable for parallel implementation so as to reduce the classification time, as asserted by Viswanath and Sarma (2011).

Let $x$ be the vector of features related to the pixel to be classified and $C_i$ the rainy/non-rainy class with $i = 0, 1, 2$, defined as follows:
- $C_0$: non-rainy class (RR < 0.5 mm h−1);
- $C_1$: light-to-moderate-rainy class (0.5 mm h−1 ≤ RR < 4 mm h−1);
- $C_2$: heavy-to-very-heavy-rainy class (RR ≥ 4 mm h−1).

For each class $C_i$ the k-NNM classifier finds the $k$ (where $k \geq 1$) nearest neighbours of $x$ and determines the mean value $d_{\mathrm{mean}}(x, C_i)$ of their distances $d(x, x_{i,j})$ from $x$:

$$d_{\mathrm{mean}}(x, C_i) = \frac{1}{k} \sum_{j=1}^{k} d(x, x_{i,j}),$$

where $d(x, x_{i,j})$ is the Euclidean distance between $x$ and $x_{i,j}$, which is the $j$th nearest training sample for the class $C_i$. The pixel is labelled as the class characterized by the lowest mean distance $d_{\mathrm{mean}}(x, C_i)$:

$$x \in C_l \quad \text{if} \quad d_{\mathrm{mean}}(x, C_l) = \min_{i = 0, 1, 2} d_{\mathrm{mean}}(x, C_i).$$

Fig. 1 shows the scheme of the RainCEIV technique.
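The decision rule described in this section (mean Euclidean distance to the k nearest training samples of each class, then the class with the lowest mean distance) can be sketched as follows. This is a minimal illustration, not the operational implementation; the function name, the dictionary layout of the training set, and the default value of k are assumptions.

```python
import numpy as np


def knnm_classify(x, train_by_class, k=3):
    """Classify the feature vector x with the mean-distance rule.

    train_by_class maps a class label (e.g. 0, 1, 2) to an array of shape
    (n_samples, n_features). Returns the chosen label and the per-class
    mean distances.
    """
    x = np.asarray(x, dtype=float)
    mean_dist = {}
    for label, samples in train_by_class.items():
        samples = np.asarray(samples, dtype=float)
        d = np.linalg.norm(samples - x, axis=1)   # Euclidean distances to x
        kk = min(k, len(d))                       # guard for small classes
        mean_dist[label] = np.sort(d)[:kk].mean() # mean of the k smallest
    best = min(mean_dist, key=mean_dist.get)      # lowest mean distance wins
    return best, mean_dist
```

Because each class is processed independently, the per-class loop is the natural unit to distribute when a parallel implementation is wanted.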

Features selection and description
The k-NNM classifier uses textural and spectral features estimated in 3 × 3-pixel boxes in order to associate each SEVIRI pixel with a rainy/non-rainy class. The textural and spectral features used in this study and their different weights in the grid element, where both textural and tonal features have significant values, are described in Ricciardelli et al. (2008). In detail, the spectral features used are the maximum and minimum grey levels and the ratio between them. The textural features considered are the maximum and the minimum of the entropy (a measure of the spatial randomness of the image), the angular second moment (ASM, a measure of the homogeneity of the image), the contrast (a measure of the local variation of the grey-level differences) and the mean (a measure of the mean grey-level differences). The maximum and minimum values are taken among the values calculated for the four directions (0, 45, 90, 135°) in the 3 × 3-pixel box. All the spectral and textural features defined for the IR/VIS SEVIRI images acquired at 0.6, 0.8, 1.6, 3.9, 6.2, 7.3, 10.8, and 12.0 µm were initially considered as components of $x$.

Some of the above-listed spectral channels are usually utilized to infer information on cloud top microphysical properties. In particular, the observations acquired at 10.8 and 12.0 µm are used to provide information on cloud top temperature and cloud optical thickness, the observations at 0.6 µm are also used to get information about cloud optical thickness, while the 3.9 and 1.6 µm observations are used to infer information on the cloud thermodynamic phase and cloud effective radius. The precipitation processes are strongly related to the cloud top microphysical structure and, in particular, the rain rate confidence is high for cloud tops with large cloud droplets or in the presence of ice (Lensky and Rosenfeld, 1997). Consequently, in this study the use of features derived from spectral channels connected with cloud microphysical properties could allow for the identification of raining clouds.

The features have been normalized so as to prevent the features ($x_i$) characterized by the largest variance across the training data set from dominating the Euclidean distance. The normalization formula applied to each feature is

$$\hat{x}_i = \frac{x_i - \bar{x}_i}{\sigma_i},$$

where $x_i$ is the $i$th component of the feature vector $x$ to be normalized, $\hat{x}_i$ is the $i$th component of the normalized $x$, and $\bar{x}_i$ and $\sigma_i$ are, respectively, the mean and the standard deviation of the feature $x_i$ calculated considering all the training set samples. This equation is also applied to the feature vector related to the pixels to be classified.

Bearing in mind that the k-NNM classifier performance generally decreases with the dimension of the feature vector, the number of the feature vector components ($x_i$) has been reduced. For this purpose, the Fisher distance criterion (Ebert, 1987; Parikh, 1977), described in Appendix A, has been applied in order to evaluate the discriminatory power of the individual features. The Fisher distance has been determined for the following class pairs: ($C_0$, $C_1$), ($C_0$, $C_2$), and ($C_1$, $C_2$). The features have been sorted in descending order of the corresponding Fisher distance value, so that the features characterized by higher Fisher distances have been chosen as components of the feature vector. The definitive number of feature vector components, $d$, and the k parameter of the RainCEIV k-NNM classifier have been determined as described in the following sub-section.
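The normalization and the feature ranking can be sketched together. The z-score part follows the mean and standard deviation definitions given in the text. The Fisher distance is defined in Appendix A, which is not reproduced here, so the sketch assumes the common single-feature form |m_a − m_b| / (σ_a + σ_b); this form, like the function names, is an assumption and may differ from the one used by the authors.

```python
import numpy as np


def zscore_fit(train):
    """Per-feature mean and standard deviation over all training samples."""
    train = np.asarray(train, dtype=float)
    return train.mean(axis=0), train.std(axis=0)


def zscore_apply(x, mean, std):
    """Normalize a feature vector (or a matrix of vectors) with training
    statistics. Assumes every feature has a non-zero standard deviation."""
    return (np.asarray(x, dtype=float) - mean) / std


def fisher_distance(a, b):
    """Assumed form of the per-feature Fisher distance between two classes:
    |mean_a - mean_b| / (std_a + std_b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0))


def rank_features(a, b):
    """Feature indices sorted by decreasing Fisher distance for one class pair."""
    return np.argsort(fisher_distance(a, b))[::-1]
```

The ranking would be repeated for each of the three class pairs, and the same training statistics returned by `zscore_fit` would be reused for the pixels to be classified.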

Training procedure
The training data set has been built by collecting a set of SEVIRI images during the daytime and nighttime, with collocated RR values inferred from AMSU-B/MHS observations processed with the PEMW algorithm (Di Tomaso et al., 2009), both over land and sea. PEMW exploits the window and water vapour channel observations. PEMW estimates show a very good agreement with ground-based observations in the detection of rainfall and a reasonably good estimation of RR values. The probability of detection (POD) of precipitation is 75 and 90 % for RR greater than 1 and 5 mm h−1, respectively. […]

The bootstrap samples for each class have been determined as follows:
1. the sample $(y_k, C_j)$ was selected;
2. $r$ was chosen equal to $N_{c,j}/4$ and the $r$ nearest neighbours (NN) of the sample $(y_k, C_j)$ (indicated as $(y_{k,s}, C_j)_{s=1,r}$) were found (the NN decision rule is explained in Appendix A);
3. the $i$th component of the bootstrap sample was calculated by applying the equation to all the components of the $(y_{k,s}, C_j)_{s=1,r}$. For simplicity, the generic $i$th component of the $(y_{k,s}, C_j)_{s=1,r}$ is indicated as $y^{i}_{k,s}$ without indicating the class $C_j$ it belongs to; in the same way, ${}^{b}y^{i}_{k}$ is the $i$th component of the bootstrap sample $({}^{b}y_{k}, C_j)$ obtained by starting from the sample $(y_k, C_j)$;
4. points 2 and 3 were repeated for each of the following $r$ values: $r = N_{c,j}/5$, $N_{c,j}/10$, $N_{c,j}/2 - 8$, $N_{c,j}/2 - 6$, $N_{c,j}/2 - 4$, $N_{c,j}/2 - 2$;
5. the process restarted from point 1 with another sample, and points 2, 3 and 4 were applied until all the test samples were considered for each class.

[…] 5 and 6, respectively. The features used over land and over sea are the same, but in some cases they vary for different cloud classes. For example, the maximum value of the ASM is very useful in determining the confidence that a low/middle cloud is precipitating, but its discriminatory power is not high enough to identify the precipitating high thick clouds. On the contrary, the minimum and maximum values of entropy, mean and contrast give a useful contribution in detecting both the light-to-moderate-rainy class and the heavy-to-very-heavy-rainy class for all the cloudy classes.
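The bootstrap steps listed in this sub-section can be sketched as below. The equation that combines the r nearest neighbours is not available in the text, so the sketch assumes the component-wise mean of the r nearest same-class neighbours, a common choice in bootstrap designs for nearest-neighbour classifiers; it also assumes that the starting sample itself is excluded from its neighbours. Both choices, and the function name, are assumptions rather than the authors' stated method.

```python
import numpy as np


def bootstrap_class(samples, r):
    """Build one bootstrap sample per original sample of a single class.

    samples: array of shape (n_samples, n_features), all of the same class,
    with n_samples >= 2.
    r: number of nearest neighbours to combine (clipped to [1, n_samples - 1]).
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    r = max(1, min(int(r), n - 1))
    out = np.empty_like(samples)
    for k in range(n):
        d = np.linalg.norm(samples - samples[k], axis=1)
        d[k] = np.inf                      # assumption: exclude the sample itself
        nn = np.argsort(d)[:r]             # indices of the r nearest neighbours
        out[k] = samples[nn].mean(axis=0)  # assumed combining rule: local mean
    return out
```

Calling the function once for each r value in the list of point 4 (for example with integer divisions of the class size) would yield one bootstrap set per r value.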

C_MACSP validation results
The validity of the C_MACSP algorithm has been tested by applying it to an independent data set made up of 300 samples for each class, taken from the SEVIRI images acquired on the following dates: […] 7 shows the results obtained. On the basis of the samples examined, it is possible to assert that C_MACSP is able to classify high thick clouds as well as

RainCEIV validation results
The RainCEIV results have been validated against the RR values derived from the weather radar network operated by the DPC. Table 8 lists the case studies used for validation. Tables 9 and 10 sum up the contingency values for the RainCEIV dichotomous statistical assessment related to the daytime and nighttime measurements, respectively. The statistical scores (shown in Table 11) have been calculated for all the classes considered together as well as for the light-to-moderate-rainy (C1) and the heavy-to-very-heavy-rainy (C2) classes separately. The accuracy scores for all the rainy/non-rainy pixels are 97 and 96 % for daytime and nighttime, respectively, when all the rainy classes are considered. High accuracy scores are also obtained for the C1 and C2 classes considered separately, both for daytime and nighttime. These results are significantly influenced by the number of the correct negatives. The bias scores indicate the RainCEIV tendency to overestimate the rainy events for all the rainy classes (bias = 1.36 for daytime, bias = 1.58 for nighttime) as well as for the C1 (bias = 1.33 for daytime, bias = 1.55 for nighttime) and C2 (bias = 1.65 for daytime, bias = 1.89 for nighttime) classes considered separately. FAR for all the rainy classes, which gives the same information as the bias score without considering the misses, is 39 and 48 % for the daytime and nighttime validations, respectively. POD, which indicates the ability to detect rainy areas without considering the false alarms, is 81 % for all the rainy classes both for nighttime and daytime validations. POD indicates the ability of RainCEIV to detect rainy areas with a good approximation, but FAR shows its tendency to overestimate the number of rainy pixels. This tendency of RainCEIV will be analysed in more detail, considering the statistical scores related to the C1 and C2 classes separately. For the sake of clarity, the following definitions are given:
- the percentage of the C2inC1 samples (the samples classified as belonging to the C2 class but that actually belong to the C1 class) out of the total number of the C1 samples used for validation will be indicated as %C2inC1;
- the percentage of the C1inC2 samples (the samples classified as belonging to the C1 class but that actually belong to the C2 class) out of the total number of the C2 samples used for validation will be indicated as %C1inC2;
- the percentage of the C2inC0 samples (the samples classified as belonging to the C2 class but that actually belong to the C0 class) out of the total number of the C0 samples used for validation will be indicated as %C2inC0;
- the percentage of the C0inC1 samples (the samples classified as belonging to the C1 class but that actually belong to the C0 class) out of the total number of the C0 samples used for validation will be indicated as %C0inC1.
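The four percentages defined in the list above can be read directly from a 3 × 3 confusion matrix. The sketch below is illustrative only: the function name and the matrix layout (rows = actual class, columns = assigned class) are assumptions, and any numbers passed to it would be hypothetical, not taken from the validation tables.

```python
import numpy as np


def misclassification_percentages(conf):
    """Compute the percentages defined in the text from a confusion matrix.

    conf[t, p] is the number of validation samples whose actual class is t
    (0, 1 or 2) and whose assigned class is p.
    """
    conf = np.asarray(conf, dtype=float)
    totals = conf.sum(axis=1)  # validation samples per actual class
    return {
        # classified as C2, actually C1, relative to all C1 samples
        "%C2inC1": 100.0 * conf[1, 2] / totals[1],
        # classified as C1, actually C2, relative to all C2 samples
        "%C1inC2": 100.0 * conf[2, 1] / totals[2],
        # classified as C2, actually C0, relative to all C0 samples
        "%C2inC0": 100.0 * conf[0, 2] / totals[0],
        # classified as C1, actually C0, relative to all C0 samples
        "%C0inC1": 100.0 * conf[0, 1] / totals[0],
    }
```

Keeping the denominators tied to the actual class makes the percentages comparable across classes with very different sample counts, which matters here because non-rainy samples largely outnumber the rainy ones.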
In detail, the bias score is higher for the C2 class than for the C1 class, which indicates the general RainCEIV tendency to overestimate the heavy-to-very-heavy-rainy pixels.
Moreover, FAR/POD related to the C2 class is 47 %/86 % and 65 %/65 % for daytime and nighttime validation, respectively. It is worth remarking that the high FAR values are mainly due to the low number of the C2 samples. FAR related to the C2 class is mainly affected by %C2inC1. In fact, %C2inC0 (0.2 % for daytime and 0.3 % for nighttime) is lower than %C2inC1 (2.4 % for daytime and 5.6 % for nighttime). This means that RainCEIV mostly detects rainy areas, as testified by the POD value, but tends to misclassify C1 samples as C2 samples. In many cases RADARinSEVIRIv related to the misclassified C1 samples is higher than 3 mm h−1. The FAR/POD score related to the C1 class is 41 %/77 % for daytime and 51 %/75 % for nighttime. %C0inC1 (2.0 % for daytime and 2.8 % for nighttime) is lower than %C1inC2 (11.0 % for daytime and 28.2 % for nighttime). This points out both that RainCEIV is inclined to misclassify the C2 samples as C1 samples and that the overestimation of the rainy area is mainly due to the misclassification of the non-rainy pixels as belonging to the C1 class. The POD score related to the nighttime validation is quite similar to the POD score related to the daytime validation for all the rainy classes and the C1 class (81 and 75 %, respectively), and it is lower for the C2 class (65 %).
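For reference, the dichotomous scores used in this section can be computed from the four entries of a 2 × 2 contingency table with the standard forecast-verification definitions. The sketch below is a minimal illustration; the function name is hypothetical, and FAR is taken as the false alarm ratio, which is consistent with the relation bias ≈ POD / (1 − FAR) satisfied by the scores reported above.

```python
def dichotomous_scores(hits, misses, false_alarms, correct_negatives):
    """Standard scores from a 2 x 2 contingency table (rainy vs. non-rainy)."""
    n = hits + misses + false_alarms + correct_negatives
    observed_yes = hits + misses
    forecast_yes = hits + false_alarms
    accuracy = (hits + correct_negatives) / n
    bias = forecast_yes / observed_yes
    pod = hits / observed_yes
    far = false_alarms / forecast_yes
    # Number of correct classifications expected by random chance.
    expected = (observed_yes * forecast_yes
                + (correct_negatives + misses)
                * (correct_negatives + false_alarms)) / n
    hss = (hits + correct_negatives - expected) / (n - expected)
    return {"accuracy": accuracy, "bias": bias, "POD": pod,
            "FAR": far, "HSS": hss}
```

With hypothetical counts of 2 hits, 2 misses, 2 false alarms and 94 correct negatives, the accuracy is 0.96 while POD and FAR are both 0.5, which shows how a large number of correct negatives can dominate the accuracy score.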
The worse values of the nighttime statistical scores, especially for the C2 class, are mainly due to the unavailability of the spectral/textural features related to the VIS/NIR observations, which are characterized by a discriminatory power higher than that related to the spectral/textural features of the 3.9 and 12.0 µm observations. HSS has also been considered. It is a measure of the correct forecasts after eliminating those whose correctness would be due exclusively to random chance. […]

Case I was chosen because it highlights the RainCEIV ability to detect very small rainy areas. On 29 September 2009 at approximately 13:00 UTC a very rapid and heavy rainfall event affected a small area between the Basilicata and Calabria regions in southern Italy. The accuracy score is high (99 %) due to the large number of non-rainy pixels detected correctly. POD shows that RainCEIV detects 67 % of the rainy samples correctly, while the bias and FAR scores reveal the RainCEIV tendency to overestimate rainy samples (the FAR score is 47 % and the bias score is 1.25). In detail, the bias score related to the C1 class (bias = 1.37) is higher than that related to the C2 class (bias = 1.00); on the contrary, FAR related to the C1 class (FAR = 46 %) is lower than that related to the C2 class (FAR = 50 %). This means that there is an overestimation of the heavy rainy area but (C1inC2 + C0inC2) and the number of the C2 misses is balanced with the number of the C2 hits. This is not true for the C1 class, which shows a higher number of hits than the C2 class, and this results in a higher POD (75 and 50 % for the C1 and C2 class, respectively). In considering these statistical results, it is worth noting that they are significantly influenced by the low number of both the C2 RADARinSEVIRI samples (4) and the C1 RADARinSEVIRI samples (8). Moreover, the temporal distance between the SEVIRI and radar acquisitions (about 5 min) can be decisive in the detection of rainy events characterized by a high variability. It is argued that part of the false alarms as well as of the misses are caused by the collocation errors in the SEVIRI grid.
The RainCEIV statistical scores related to cases II and III (Figs. 4 and 5, respectively) are better than those related to the case study discussed above. This is because they analyse rainy events characterized by a larger temporal and spatial extent. Case study II is based on a set of heavy and moderate rainfall events that affected central and southern Italy on 4 August 2010 at 14:15 UTC. RainCEIV detects rainy samples with a POD of 89 %, strongly related to the correct detection of the C1 samples. In detail, POD is 82 % for the C1 class and 66 % for the C2 class, because the number of misses related to the C2 class is higher than that of the C1 class. It is important to note that 70 % of the C2 misses are misclassified as belonging to the C1 class. Furthermore, the number of false alarms related to the C1 class is higher than that of the C2 class, and this leads to lower values of both FAR (38 %) and bias (1.08) for the C2 class with respect to those for the C1 class (FAR = 56 % and bias = 1.86).

Case study III is related to the analysis of an extreme convective event characterized by very heavy precipitation that occurred on 21 February 2013 on the east coast of Sicily and caused a flash flood over Catania. RainCEIV detects all the rainy areas with a POD of 87 %, which becomes 50 % when only the C2 samples are considered. The number of false alarms is higher for the C1 class (FAR = 37 %) than for the C2 class (FAR = 24 %), but while the C1 samples are overestimated, RainCEIV missed 50 % of the C2 samples (C2 bias = 0.67). It is evident that RainCEIV misses many heavy-rainy samples, which is probably due to the high temporal variability of this rainy event. Nevertheless, it is able to monitor the evolution of all the rainy areas on the east coast of Sicily and in southern Calabria with a good approximation.

Conclusions
This paper proposes the RainCEIV technique as a useful tool for the continuous monitoring and characterization of rainy areas in the Mediterranean region, where the frequency of extreme events has increased. RainCEIV, which does not use any near-real-time ancillary data, exploits the temporal differences of the brightness temperatures related to the SEVIRI water vapour channels. These are indicative of atmospheric instability and, as a consequence, can give useful information for the detection of rainy areas when analysed together with the spectral and textural features related to the other SEVIRI channels. Because of the well-known limitations of IR/VIS observations in determining RR values, the main purpose of RainCEIV is to provide a near-real-time qualitative characterization of rainy areas, especially in regions not covered by radar and rain gauge networks.
RainCEIV consists of two modules that use geostationary observations from SEVIRI in order to detect cloudy pixels and, subsequently, to associate them with a rainy/non-rainy class. RainCEIV uses both IR and VIS observations to determine whether the SEVIRI pixel belongs to the non-rainy (C0), light-to-moderate-rainy (C1) or heavy-to-very-heavy-rainy (C2) class. IR/VIS observations do not have the same potential as MW observations in characterizing rainy areas, but their high spatial and temporal resolution allows for continuous monitoring of stratiform and convective events. The RainCEIV training phase has been carried out by collecting a set of SEVIRI pixels with co-located RR values inferred from AMSU-B/MHS observations processed by the PEMW algorithm and, when available, with co-located radar-derived RR values. This double matching of the SEVIRI pixels is an important aspect of RainCEIV because it allows for a reliable training data set.
RainCEIV has been validated on the basis of the RR observations from the Italian DPC operational weather radar network. The dichotomous statistical scores indicate that a large fraction (97 % for daytime validation and 96 % for nighttime validation) of the pixels examined are correctly identified as rainy or non-rainy by RainCEIV. The bias scores (1.36 for daytime validation and 1.58 for nighttime validation) and the FAR scores (39 % and 48 %, respectively) suggest that RainCEIV tends to overestimate rainy pixels, especially during nighttime, while the POD scores (81 % for both daytime and nighttime validation) indicate that RainCEIV detects rainy areas with a good approximation. The overestimation of rainy areas is mainly due to the misclassification of C0 samples as C1 samples. Moreover, the high FAR values related to the C1 and C2 classes are mainly due to the misclassification of C1 samples as C2 samples and vice versa. The statistical scores obtained for the daytime validation are generally better than those obtained for the nighttime validation. This is mainly because the features related to the VIS/NIR observations (unavailable during nighttime) have a strong influence on the RainCEIV output, owing to their higher discriminatory power compared with that of the features related to the 3.9 and 12.0 µm observations. When interpreting the comparison results, it is important to bear in mind that the different spatial resolutions and the temporal distance between radar and satellite observations could affect the statistical scores negatively, especially for rapid convective events, even when the time difference between radar and SEVIRI acquisitions is small.
As far as future developments are concerned, RainCEIV will be updated to consider, in the training phase, RADARinSEVIRI samples characterized by a percentage of rainy RS samples lower than 80 %, so as to identify extreme rainy events located over an area smaller than the SEVIRI pixel area. To this aim, information from the Visible Infrared Imaging Radiometer Suite (VIIRS) on board the Suomi National Polar-orbiting Partnership (NPP), characterized by higher spatial and spectral resolutions than SEVIRI, will be taken into account when available. The purpose is the integration of the SEVIRI and VIIRS observations in order to determine the cloud classification and the rainfall occurrence probability at a better spatial resolution (from 3 km for SEVIRI to 0.375 km/0.750 km for VIIRS at the sub-satellite point).

E. Ricciardelli et al.: A statistical approach for rain intensity differentiation
Appendix A: Procedure adopted for the training set refinement
The RainCEIV and C_MACSP original training data sets have been refined by applying the same procedure to the samples of each class.
The refinement process consists of using the nearest neighbour (NN) decision rule described by Cover and Hart (1967) in order to classify each sample of the initial training classes. The aim here is to eliminate the redundant and misclassified training samples. The process is similar to the condensed nearest neighbour (CNN) rule described in Hart (1968), although the main purpose of CNN is to obtain a training subset that performs as well as the original one. Before the description of the refinement process, a brief description of the NN decision rule and of the Fisher criterion (used to reduce the number of components of the feature vector) is given.
Let T_o = {(x_i, C_j)} be the original training data set, where the pairs (x_i, C_j) indicate the training samples x_i of the class C_j, j = 1, 2, ..., N_c, where N_c is the number of classes, and i = 1, 2, ..., N_c,j, where N_c,j is the number of training samples for the class C_j. Given a vector y to be classified, the NN rule establishes that y belongs to the class C_j when the minimum distance is that from the training sample x_i belonging to class C_j; x_i is then the nearest neighbour of y.
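The NN rule just described can be sketched in a few lines of Python. This is only an illustration: the Euclidean metric is assumed here, and the function name, the toy training set and its class labels are hypothetical.

```python
import math


def nn_classify(y, training_set):
    """Return (class label, nearest sample, distance) for the vector y.

    training_set is a list of (feature_vector, class_label) pairs.
    """
    nearest_x, nearest_label = min(
        training_set, key=lambda pair: math.dist(y, pair[0])
    )
    return nearest_label, nearest_x, math.dist(y, nearest_x)


# Toy training set T_o with two features per sample (hypothetical values).
T_o = [((0.0, 0.0), "C0"), ((1.0, 1.0), "C1"), ((5.0, 5.0), "C2")]
label, neighbour, distance = nn_classify((4.0, 4.5), T_o)
print(label, neighbour, round(distance, 3))
```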
Before applying the NN decision rule, it is important to define the dimension of the feature vector. In fact, since the k-NN classifier performance generally decreases with the dimension of the feature vector, the number of the components (x_i) of x has been reduced by applying the Fisher criterion (Ebert, 1987; Parikh, 1977) to evaluate the discriminatory power of the individual features and to choose the features characterized by the higher Fisher distance values. Let $\bar{x}_{ij}$ and $\sigma_{ij}$ be the mean and standard deviation of the feature x_i for the training set from class C_j; the Fisher distance is then defined as
$$D_{ijk} = \frac{\left|\bar{x}_{ij} - \bar{x}_{ik}\right|}{\sigma_{ij} + \sigma_{ik}}.$$
It measures the ability of the feature x_i to differentiate class C_j from class C_k. The features x_i within x have been ordered by decreasing D_ijk value, and the first d features have been chosen as the components of the feature vectors used. The dimension d has been fixed by following the suggestions in Jain and Chandrasekaran (1982), who point out that the ratio between the number of training samples for each class and the feature vector dimension d should be at least five.
The procedure to obtain the refined training data set, T_r, starting from the original training data set, T_o, consists of
1. considering the ith pattern (x_i, C_j) of T_o;
2. applying the NN decision rule and determining the following actions on the basis of the three possible classification results:
   - the NN belongs to the initial class C_j and the Euclidean distance is higher than zero: the sample is put in T_r;
   - the NN belongs to a different class C_i ≠ C_j: the sample is reanalysed and included in the NN class;
   - the Euclidean distance from the NN is zero: the sample is considered redundant, is removed from T_o and is not included in T_r;
3. restarting from point 2 with another sample and applying the entire process until all the training samples have been analysed.
T_r, determined for each class, is used as the definitive training data set.
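The feature ranking and the refinement steps of this appendix can be sketched as follows. This is a hedged illustration rather than the authors' implementation. It assumes that the Fisher distance takes the form |mean_j - mean_k| / (sigma_j + sigma_k), that population standard deviations are used, that a zero distance is checked before the class comparison, and that "included in the NN class" means relabelling the sample; all function names and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def fisher_distance(values_j, values_k):
    """Fisher distance of one feature between two classes (assumed form)."""
    spread = pstdev(values_j) + pstdev(values_k)
    gap = abs(mean(values_j) - mean(values_k))
    if spread == 0.0:
        return math.inf if gap > 0.0 else 0.0
    return gap / spread


def rank_features(samples_j, samples_k):
    """Feature indices ordered by decreasing Fisher distance."""
    n_features = len(samples_j[0])
    distances = [
        fisher_distance([s[i] for s in samples_j], [s[i] for s in samples_k])
        for i in range(n_features)
    ]
    return sorted(range(n_features), key=lambda i: distances[i], reverse=True)


def refine_training_set(original):
    """Build T_r from T_o, a list of (feature_vector, class_label) pairs."""
    removed = set()  # indices of redundant samples dropped from T_o
    refined = []
    for i, (x, label) in enumerate(original):
        candidates = [
            (math.dist(x, other_x), other_label)
            for j, (other_x, other_label) in enumerate(original)
            if j != i and j not in removed
        ]
        if not candidates:
            refined.append((x, label))
            continue
        nn_distance, nn_label = min(candidates, key=lambda c: c[0])
        if nn_distance == 0.0:
            removed.add(i)                 # redundant: not copied to T_r
        elif nn_label == label:
            refined.append((x, label))     # confirmed by its nearest neighbour
        else:
            refined.append((x, nn_label))  # moved to the nearest neighbour's class
    return refined


# Toy data: one duplicated sample and one sample lying inside the other class.
T_o = [((0.0, 0.0), "A"), ((0.0, 0.0), "A"), ((0.1, 0.0), "A"),
       ((5.0, 5.0), "B"), ((5.2, 5.0), "B"), ((0.2, 0.1), "B")]
T_r = refine_training_set(T_o)
ranking = rank_features([(1, 10), (2, 11), (3, 12)],
                        [(7, 10.5), (8, 11.5), (9, 12.5)])
```

In this toy case one of the two identical samples is dropped and the isolated "B" sample is moved to class "A"; the first feature separates the two classes better and is therefore ranked first.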
, respectively, of Ricciardelli et al. (2008), but the training data set has been updated in order to build the training samples for the convective cloud class. The training samples were collected in the Mediterranean Basin, where RainCEIV operates. The cloud classification for the training data set has been made through a careful visual inspection of the SEVIRI images.
−1, respectively (Di Tomaso et al., 2009). At present, the operational version of the PEMW algorithm (OPEMW) is run 24/7 at IMAA-CNR. OPEMW has been validated by Cimini et al. (2013) against radar-derived RR values and rain gauge surface rain intensity. The analysis shows an accuracy of 98 % in identifying rainy and non-rainy areas and a Heidke skill score of 45 % (with respect to radar-derived RR values) and 42 % (with respect to rain gauge RR values). The accuracy, bias score, probability of detection (POD), false alarm ratio (FAR) and Heidke skill score (HSS) are described in Ebert (2013). The AMSU-B/MHS observations used for building the training database were collected during the NOAA satellite passes over the Mediterranean area on the dates listed in Table 1.
The training data set has been built by coupling cloudy SEVIRI pixels with the corresponding RR value calculated by the PEMW algorithm and, where available, with the radar-derived RR values. When no radar-derived RR value is available (because the AMSU-B/MHS observation is outside the area covered by the radar network), the SEVIRI pixel is classified as belonging to one of the classes C0, C1 and C2 on the basis of the corresponding PEMWinSEVIRIv, and it is included in the initial training data set. When the RADARinSEVIRIv is available and agrees with the PEMWinSEVIRIv in determining the rainy/non-rainy class the SEVIRI pixel belongs to, the pixel is included in the initial training data set. Otherwise, when the RADARinSEVIRIv and PEMWinSEVIRIv do not agree, the SEVIRI pixel is included in the initial training data set only if the corresponding RADARinSEVIRI pixel belongs to a rainy class (C1 or C2) and the percentage of the rainy RS is higher than 80 %. This choice is very useful for the training of rainy events localized over an area smaller than the AMSU-B/MHS FOV area. The training samples have been considered separately for land and sea and grouped on the basis of the solar zenith angle (SZA). Finally, in order to refine the training data set, the process described in Appendix A has been applied to the initial training data set. The availability of SEVIRI samples double matched with PEMW and radar-derived RR values is useful both for the mitigation of the uncertainty due to the collocation process and for the refinement of the original training data set, especially for the removal of misclassified samples. Figure 2 describes the training procedure.
Subsequently, in order to decide the best values for d and k, a set of test samples has been classified for varying combinations of d and k. Moreover, an artificial data set, smoother and more versatile than the initial one, has been obtained by applying the bootstrap method (described by Hamamoto et al., 1997) to the initial test samples. In order to make a more robust choice for d and k, the same d and k combinations chosen for the classification of the initial test data set have been used to classify the artificial data set. The best choice of d and k has been made by comparing the statistical scores obtained by classifying the two data sets separately. Both the initial and the artificial data sets contain the same number of samples for each class.
Let Y = {(y_i, C_j)} be the independent test data set built by examining the PEMW-RR values related to the AMSU-B/MHS overpasses of 12 February 2012 at 01:35 UTC, 12 November 2011 at 08:50 UTC, 22 November 2010 at 09:34 UTC, 4 August 2010 at 14:46 UTC, 26 April 2010 at 12:26 UTC, 1 October 2009 at 19:50 UTC, and 2 October 2009 at 05:00 UTC. The pairs (y_i, C_j) indicate the test samples y_i belonging to the class C_j, j = 1, 2, ..., N_c, where N_c is the number of classes (for RainCEIV, C_j with j = 0, 1, 2 and N_c = 3), and i = 1, 2, ..., N_c,j, where N_c,j is the number of test samples for the class C_j.
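The selection of d and k described above, in which the same combinations are evaluated on the initial and on the artificial test data sets, can be sketched as follows. Several simplifications are assumed here and are not taken from the paper: a plain k-NN majority vote stands in for the k-NNM classifier, the overall accuracy stands in for the full set of statistical scores, the features are taken to be already ordered by decreasing Fisher distance, and the pair (d, k) is chosen by maximizing the lower of the two accuracies. All names and toy values are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def knn_predict(y, train, k, d):
    """Majority vote among the k nearest training samples, first d features."""
    neighbours = sorted(train, key=lambda s: math.dist(y[:d], s[0][:d]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def accuracy(test, train, k, d):
    """Fraction of test samples assigned to their reference class."""
    correct = sum(knn_predict(y, train, k, d) == label for y, label in test)
    return correct / len(test)


def choose_d_k(train, initial_test, artificial_test, d_values, k_values):
    """Return (score, d, k) maximizing the worse of the two accuracies."""
    best = None
    for d, k in product(d_values, k_values):
        score = min(accuracy(initial_test, train, k, d),
                    accuracy(artificial_test, train, k, d))
        if best is None or score > best[0]:
            best = (score, d, k)
    return best


# Toy data: the first feature separates the classes, the second is noise.
train = [((0.0, 5.0), 0), ((0.1, 0.0), 0), ((0.2, 9.0), 0),
         ((1.0, 5.1), 1), ((1.1, 0.1), 1), ((1.2, 9.1), 1)]
initial_test = [((0.04, 9.0), 0), ((1.04, 5.0), 1)]
artificial_test = [((0.16, 0.1), 0), ((1.16, 9.0), 1)]
best = choose_d_k(train, initial_test, artificial_test,
                  d_values=[1, 2], k_values=[1, 3])
```

Requiring a good score on both data sets guards against a (d, k) pair that performs well on the initial test samples only by chance.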

A careful screening has been done to eliminate the redundant bootstrap samples. The bootstrap (artificial) samples and the initial test samples have been classified separately by means of the k-NNM (using the original training data set). The statistical scores obtained for the two data sets are quite similar and they change in the same way when d and k vary, as can be noted in Tables 2, 3 and 4, which list the statistical scores for k = 3, d = 10, d = 16, d = 20 (Table 2); k = 5, d = 10, d = 16,
12 November 2010 at 11:27 UTC, 22 November 2010 at 09:27 UTC and at 11:43 UTC, 5 May 2012 at 20:27 UTC, 19 May 2012 at 10:57 UTC, 23 July 2012 at 10:27 UTC, 5 December 2012 at 08:43 UTC, 19 September 2009 at 19:13 UTC, 6 July 2010 at 11:27 UTC and 12:27 UTC, 4 August 2010 at 14:27 UTC, 26 December 2013 at 04:57 UTC, 8 October 2013 at 18:57 UTC, 7 October 2013 at 00:57 UTC and 20 January 2014 at 23:57 UTC. The validation has been carried out separately for samples acquired during nighttime and daytime by comparing the C_MACSP classification results with the samples manually collected from the independent data set images. The manual classification has been made through a careful observation of the SEVIRI RGB composition so as to get the same number of samples for each class. The convective cloud classification results have been validated against both the RR maps derived from the weather radar network and the PEMW rain rate maps. The latter have been used for the areas where radar information is missing. The accuracy (defined as the ratio between the number of test samples classified correctly and the total number of test samples) has been determined for each class and Table
In order to collect the training samples for the convective cloud class, the cloudy SEVIRI pixels have been matched with the corresponding PEMW-RR and radar-derived RR values, if available. The collocation process of both the radar-derived RR values and the PEMW-RR values in the SEVIRI grid is described in Sect. 2. The SEVIRI pixel is considered for the training when
- both the RADARinSEVIRI pixel and the PEMWinSEVIRI pixel are available and the relations (RADARinSEVIRIv ≥ 4 mm h−1) and (PEMWinSEVIRIv ≥ 4 mm h−1) are satisfied;
- both the RADARinSEVIRI pixel and the PEMWinSEVIRI pixel are available, the relations (RADARinSEVIRIv ≥ 4 mm h−1) and (PEMWinSEVIRIv < 4 mm h−1) are satisfied, and the percentage of the rainy RS samples is higher than 80 %;
- only the PEMWinSEVIRI pixel is available (the AMSU-B/MHS observation is outside the area covered by the radar network) and the relation (PEMWinSEVIRIv ≥ 4 mm h−1) is satisfied.
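The three conditions listed above can be written as a small decision function. This is a sketch under stated assumptions: the function and argument names are hypothetical, the rainy RS percentage is expressed as a fraction between 0 and 1, and a pixel whose radar value is available but below 4 mm h−1 is rejected, since that case is not among the listed conditions.

```python
HEAVY_RR = 4.0             # mm/h, lower bound of the heavy-to-very-heavy class
MIN_RAINY_FRACTION = 0.80  # rainy RS fraction required when PEMW disagrees


def use_for_convective_training(pemw_rr, radar_rr=None, rainy_fraction=None):
    """Decide whether a cloudy SEVIRI pixel enters the convective training set.

    pemw_rr        -- PEMWinSEVIRIv in mm/h
    radar_rr       -- RADARinSEVIRIv in mm/h, or None outside radar coverage
    rainy_fraction -- fraction of rainy radar samples in the pixel, or None
    """
    if radar_rr is None:
        # Only the PEMW value is available.
        return pemw_rr >= HEAVY_RR
    if radar_rr < HEAVY_RR:
        # Case not listed in the text: treated here as "not selected".
        return False
    if pemw_rr >= HEAVY_RR:
        # Radar and PEMW agree on heavy rain.
        return True
    # Radar indicates heavy rain, PEMW does not: rely on the radar coverage.
    return rainy_fraction is not None and rainy_fraction > MIN_RAINY_FRACTION


print(use_for_convective_training(5.0, 6.0, 0.5))   # both sources agree
print(use_for_convective_training(2.0, 6.0, 0.9))   # radar only, well covered
print(use_for_convective_training(5.0))             # no radar coverage
```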

Table 1. List of the NOAA satellite overpasses for the AMSU-B PEMW rain rate maps considered in the training phase.

Table 2. Statistical scores related to the RainCEIV rain rate results obtained classifying the initial and artificial test data sets for k = 3. The statistical scores are shown for all the rainy classes (C1, C2), light to moderate rain (C1), and heavy to very heavy rain (C2).

Table 3. Statistical scores related to the RainCEIV rain rate results obtained classifying the initial and artificial test data sets for k = 5. The statistical scores are shown for all the rainy classes (C1, C2), light to moderate rain (C1), and heavy to very heavy rain (C2).

Table 4. Statistical scores related to the RainCEIV rain rate results obtained classifying the initial and artificial test data sets for k = 7. The statistical scores are shown for all the rainy classes (C1, C2), light to moderate rain (C1), and heavy to very heavy rain (C2).

Table 5. Summary of the features considered for use in the RainCEIV k-NNM classifier during daytime. Label "A" means that the feature is used for all the C_MACSP classes; "LM" means that the feature is used for the low/middle cloud class; "HT/C" means that the feature is used for the high thick and convective cloud class.

Table 6. Summary of the features considered for use in the RainCEIV k-NNM classifier during nighttime. Label "A" means that the feature is used for all the C_MACSP classes; "LM" means that the feature is used for the low/middle cloud class; "HT/C" means that the feature is used for the high thick and convective cloud class.

Table 7. Accuracy of the C_MACSP algorithm on an independent data set.

Table 8. List of case studies used for validation.

Table 9. Contingency table for the dichotomous statistical assessment of the RainCEIV algorithm for all the pixels used for daytime validation.

Table 10. Contingency table for the dichotomous statistical assessment of the RainCEIV algorithm for all the pixels used for nighttime validation.
August 2010 at 14:15 UTC (case II), and 21 February 2013 at 15:00 UTC (case III) are analysed separately and the RainCEIV results are shown in Figs. 3, 4, and 5, together with the C_MACSP results and the rain classes obtained from the radar-derived RR measurements. The statistical scores calculated for each case are listed in Table 12.

Table 11. Dichotomous statistical scores (RainCEIV versus radar-derived rain rate measurements) for the case studies listed in Table 8. The statistical scores are shown for all rainy classes (C1, C2), light to moderate rain (C1), and heavy to very heavy rain (C2).

Table 12. Dichotomous statistical scores shown for all rainy classes (C1, C2), light to moderate rain (C1), and heavy to very heavy rain (C2), for the case studies I, II and III.
Figure 3. 29 September 2009 at 13:00 UTC. From left to right: C_MACSP cloud classification results, radar-derived rain rate results, RainCEIV rain rate results.