Rain gauges are unevenly spaced around the world with extremely low gauge density over developing countries. For instance, in some regions in Africa the gauge density is often less than one station per 10 000
In this study, we developed a short-latency (i.e. 2–3 d) rainfall product derived from the combination of the Integrated Multi-Satellite Retrievals for GPM (Global Precipitation Measurement) Early Run (IMERG-ER) with multiple-satellite soil-moisture-based rainfall products derived from ASCAT (Advanced Scatterometer), SMOS (Soil Moisture and Ocean Salinity) and SMAP (Soil Moisture Active and Passive) L3 (Level 3) satellite soil moisture (SM) retrievals. We tested the performance of this product over four regions characterized by high-quality ground-based rainfall datasets (India, the conterminous United States, Australia and Europe) and over data-scarce regions in Africa and South America by using triple-collocation (TC) analysis. We found that integrating satellite SM observations with satellite rainfall estimates is very beneficial, with improvements over IMERG-ER of up to 20 % and 40 % in terms of correlation and error, respectively, and a generalized enhancement in terms of categorical scores, with the integrated product often outperforming reanalysis and ground-based long-latency datasets. We also found a substantial overestimation of the rainfall variability by GPM-based products (up to twice the reference value), which was significantly reduced after the integration with satellite soil-moisture-based rainfall estimates.
Given the importance of a reliable and readily available rainfall product for water resource management and agricultural applications over data-scarce regions, the developed product can provide a valuable and unique source of rainfall information for these regions.
Rainfall is the main driver of the hydrological cycle
Ground networks of rain gauges are considered the most accurate (and, consequently, the most used) source of rainfall observations across many regions of the world. However, the difficulty and the costs associated with their maintenance, along with the timeliness of their data availability, are critical obstacles to their use in real-time and seasonal applications. Moreover, while in developed regions the rain gauge distribution is sufficiently dense and supported by well-organized and well-funded organizations, in developing countries the data coverage is extremely poor.
The number of gauges around the world has been estimated to range between 150 000 and 250 000, but their distribution is far from being homogeneous
SREs are normally derived from sensors on board low-Earth-orbiting (LEO) and geostationary satellites
The long history of research in the area led in 2014 to the Global Precipitation Measurement (GPM) mission
Although extremely useful, one of the problems with SRE is the instantaneous nature of the measurement, which, along with the intermittent character of the rainfall, makes SRE prone to errors
Model reanalysis datasets, such as the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Reanalysis
Despite these inherent limitations, SRE and reanalysis products are still the only valuable alternative to gauge-based observations within gauge-scarce regions, and the efforts to improve these datasets by merging procedures or by including other ancillary information have been increasing significantly in the last decade.
For instance,
A potential solution to circumvent this problem is the use of satellite SM observations as a source of rainfall ground information
In general, the main advantages of using satellite SM as an indirect measure of ground rainfall are its uniform temporal and spatial coverage, its availability in near real time, and the fact that it transcends national boundaries. Drawbacks are the low spatial resolution and the relatively low quality in mountainous areas, frozen soils and dense forests; these areas, however, are also problematic for ground-based observations (due to uneven spatial distribution, data transmission issues in inaccessible areas, undercatch problems, and the cost of maintenance). As these problems affect each sensor type (active or passive) and each retrieval in different ways, combining the products would allow for exploiting their relative strengths for improving SRE.
In this study, we developed a short-latency (2–3 d depending on the region) rainfall product derived from the combination of IMERG-ER with multiple-satellite SM-based rainfall products. The latter are obtained from the inversion of the SM retrievals derived from (1) the Soil Moisture Active and Passive (SMAP;
The integration method we adopted is the optimal linear combination (OLC) approach
The key strengths of this integrated product are the following:
The paper is divided as follows. Section
In this section we describe the datasets used for the integration of IMERG-ER with SM2RAIN rainfall estimates, as well as the datasets used to validate the integrated product.
Different ground-based rainfall datasets were used to cross-validate the integrated product in the four regions, namely, the Australian Water Availability Project (AWAP) dataset in Australia, the E-OBS (ENSEMBLES daily gridded observational dataset) gridded rainfall dataset of ECA&D (European Climate Assessment & Dataset) in Europe, the National Centers for Environmental Prediction (NCEP) Stage IV dataset over CONUS and the India Meteorological Department (IMD) gridded rainfall dataset over India. Below we describe the main features of these datasets (readers interested in more details can refer to the related publications).
The Australian Water Availability Project (AWAP) rainfall product is generated via spatial analyses of the quality-controlled daily rain gauge measurements from the Australian Bureau of Meteorology daily rain gauge network. AWAP daily rainfall for a given day is the 24 h total rainfall from the day before at 09:00 local time to the current day at 09:00. The rainfall fields are gridded on a
The ECA&D rainfall dataset E-OBS gridded dataset is derived through interpolation of the ECA&D (European Climate Assessment & Dataset) station data. The station dataset comprises a network of 2316 stations, with the highest station density in northern and central Europe and lower density in the Mediterranean, northern Scandinavia and eastern Europe. The E-OBS dataset is derived through a three-stage process
The National Centers for Environmental Prediction (NCEP) Stage IV
The India Meteorological Department rainfall gridded dataset is prepared from daily rainfall data of 6955 stations, archived at the National Data Centre, IMD, Pune, by using the Shepard method
In the following we describe the main characteristics of the satellite SM products used in the study. They are the following:
The Advanced Scatterometer (ASCAT) on board the Metop-A, Metop-B and Metop-C satellites is a scatterometer operating at the C band (5.255 GHz). It provides a SM product characterized by a spatial sampling of 12.5 km and from one to two observations per day depending on the latitude
The Soil Moisture and Ocean Salinity (SMOS) mission provides a SM product through a radiometer operating at the L band (1.4 GHz) with 50 km of spatial resolution and one observation every 2–3 d
For SMAP L3, the Soil Moisture Active and Passive (SMAP) mission SM product is obtained from L-band radiometer observations (1.4 GHz) with a spatial resolution of 36 km and one or two observations every 3 d depending on the location
For AMSR2, the Advanced Microwave Scanning Radiometer 2 (AMSR2) on board the Global Change Observation Mission for Water satellite is a radiometer operating in the microwave band. Soil moisture retrieval from AMSR2 is obtained from the C and X bands, which allow for a daily temporal resolution at a spatial resolution of 25 km
In addition to satellite SM products, different rainfall datasets were used in the study both for cross-comparison purposes and as a part of the integration procedure. In the following the main characteristics of each dataset are provided.
The First Guess Daily product provided by the Global Precipitation Climatology Center (GPCC;
ERA5 is the latest climate reanalysis produced by ECMWF, providing hourly data on many atmospheric, land-surface and sea-state parameters together with estimates of uncertainty. The rainfall variable used in this study is characterized by a spatial resolution of 36 km and an hourly temporal resolution. ERA5 is available from the Copernicus Climate Change service (
The IMERG algorithm, first released in early 2015
SM2RAIN
Note that in Eq. (
The optimal linear combination (OLC) approach
Note that for the OLC method to be analytically optimal, a bias correction of the ensemble members in
In addition, it is worth mentioning that the rainfall information brought from different SM2RAIN products to IMERG-ER is potentially redundant, especially when the SM estimates from SMAP, ASCAT and SMOS agree with each other. The OLC method is particularly advantageous in this sense, as it accounts for both performance differences and error covariance between the rainfall products and is therefore insensitive to the addition of redundant information. Other more sophisticated methods can also be applied, although there is no guarantee that such methods would lead to better results. For instance,
This section describes the four steps necessary for obtaining the integrated product:
pre-processing of the soil moisture and rainfall products used in the integration (Sect.
selection of the parameters of SM2RAIN (Sect.
selection of the multiplication factors (Sect.
calculation of the coefficients of OLC via Eq. (
Note that a unique calibration dataset,
Integration scheme used for the calculation of the integrated rainfall product
Global SM and rainfall products come with different resolutions and grids. Moreover, the application of the SM2RAIN algorithm to SM observations requires preliminary processing. In step 0, we resampled all the datasets to the same
As satellite SM data are not regularly spaced in time and contain gaps (for instance, using the specific flags of each product, we excluded observations affected by frozen soils, snow presence or radio interference contamination), they were linearly interpolated at 00:00 UTC to produce SM2RAIN daily rainfall from 00:00 to 23:59 UTC (see step 1). Note that we limited the interpolation to a maximum of 2 d; beyond that we assumed SM2RAIN rainfall was missing (in these cases only IMERG-ER is used in the integrated product as better described in Sect.
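The gap-limited interpolation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function name, the use of NaN to represent flagged observations, and the reading of the 2 d limit as the maximum spacing between the two observations that bracket 00:00 UTC are all assumptions.

```python
import numpy as np

def interpolate_to_midnight(t_obs, sm_obs, t_midnight, max_gap_days=2.0):
    """Linearly interpolate irregular SM observations to 00:00 UTC.

    Times are in days since an arbitrary epoch and t_obs is assumed sorted.
    Targets whose bracketing observations are more than max_gap_days apart
    (an assumed reading of the 2 d limit) are returned as NaN.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    sm_obs = np.asarray(sm_obs, dtype=float)
    t_midnight = np.asarray(t_midnight, dtype=float)
    out = np.full(t_midnight.shape, np.nan)

    # Drop flagged observations, represented here as NaN (assumption).
    keep = ~np.isnan(sm_obs)
    t_obs, sm_obs = t_obs[keep], sm_obs[keep]
    if t_obs.size == 0:
        return out

    # Last observation at or before each target, first one at or after it.
    left = np.searchsorted(t_obs, t_midnight, side="right") - 1
    right = np.searchsorted(t_obs, t_midnight, side="left")
    bracketed = (left >= 0) & (right < t_obs.size)

    # Time gap between the two bracketing observations.
    gap = np.full(t_midnight.shape, np.inf)
    gap[bracketed] = t_obs[right[bracketed]] - t_obs[left[bracketed]]

    ok = bracketed & (gap <= max_gap_days)
    out[ok] = np.interp(t_midnight[ok], t_obs, sm_obs)
    return out
```

Days left as NaN by this sketch correspond to the cases in which the SM2RAIN rainfall is treated as missing.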
Step 1 refers to the pixel-by-pixel calibration of SM2RAIN for the selection of the optimal parameter values. All the parameters described in Sect.
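As an illustration of what is being calibrated, the sketch below applies the basic form of the SM2RAIN inversion, p(t) ≈ Z* ds/dt + a s(t)^b, to daily relative saturation values. It is a simplified assumption-laden example: the function names, the finite-difference discretization, and the RMSE objective are illustrative, and the configuration used in this study may include additional terms or filtering parameters not shown here.

```python
import numpy as np

def sm2rain_daily(saturation, z_star, a, b, dt_days=1.0):
    """Estimate daily rainfall (mm) from relative saturation at 00:00 UTC.

    saturation : relative soil saturation in [0, 1], one value per day
    z_star     : soil water capacity parameter (mm)
    a, b       : drainage parameters (mm/d and dimensionless)
    Returns one rainfall value per interval between consecutive days.
    """
    s = np.asarray(saturation, dtype=float)
    ds = np.diff(s)                       # change in saturation over the day
    s_mid = 0.5 * (s[1:] + s[:-1])        # mean saturation over the day
    rain = z_star * ds + a * s_mid ** b * dt_days
    # Negative estimates are not physical and are set to zero; NaN is kept.
    return np.maximum(rain, 0.0)

def calibration_rmse(params, saturation, reference_rain):
    """RMSE against a calibration dataset (hypothetical objective function)."""
    z_star, a, b = params
    est = sm2rain_daily(saturation, z_star, a, b)
    return float(np.sqrt(np.nanmean((est - np.asarray(reference_rain)) ** 2)))
```

In a pixel-by-pixel calibration, an objective such as `calibration_rmse` would be minimized separately for each grid cell with any standard optimizer.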
As depicted in Sect.
This procedure is in principle a climatological correction rather than a bias correction because it uses the climatology of
For the application of OLC (i.e. integration), we proceeded by considering these three methodological aspects:
First off, we performed a quality check by comparing the correlation coefficient of each SM2RAIN product with the calibration dataset (
The calculation of the OLC coefficients in Eq. (
The application of OLC among the SM2RAIN products and IMERG-ER was carried out only when IMERG-ER values were larger than zero, taking advantage of the enhanced rain–no-rain detection accuracy of IMERG that uses DPR
The final product is then composed of multiple rainfall datasets weighted according to Eq. (
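The weighting step can be sketched as follows, assuming the commonly used closed form for minimum-error-variance weights that sum to one, w = A⁻¹1 / (1ᵀA⁻¹1), where A is the error covariance matrix of the bias-corrected products with respect to the calibration dataset. The function names, the additive mean-bias correction, the clipping of negative values, and the convention that IMERG-ER occupies the first row are assumptions made for illustration only.

```python
import numpy as np

def olc_weights(products, reference):
    """Weights minimizing the error variance of a linear combination.

    products  : array of shape (k, n), one rainfall product per row
    reference : array of shape (n,), the calibration dataset
    Returns (weights, bias), both of shape (k,).
    """
    products = np.asarray(products, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Additive mean-bias correction of each product (assumed form).
    bias = products.mean(axis=1) - reference.mean()
    errors = products - bias[:, None] - reference[None, :]
    # Error covariance matrix: variances on the diagonal, shared errors off it.
    cov = np.atleast_2d(np.cov(errors))
    ones = np.ones(cov.shape[0])
    cov_inv_ones = np.linalg.solve(cov, ones)
    weights = cov_inv_ones / (ones @ cov_inv_ones)
    return weights, bias

def olc_merge(products, weights, bias):
    """Combine products; row 0 is assumed to be IMERG-ER."""
    products = np.asarray(products, dtype=float)
    merged = weights @ (products - bias[:, None])
    merged = np.maximum(merged, 0.0)      # rainfall cannot be negative
    # Combine only where IMERG-ER detects rain; otherwise keep IMERG-ER.
    return np.where(products[0] > 0, merged, products[0])
```

Because the off-diagonal terms of the covariance matrix enter the solution, two products with strongly correlated errors share their weight instead of being counted twice.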
The success of the overall procedure described above is dependent upon the quality of
For the validation of the integrated product, two different strategies were followed. First, we selected four key regions characterized by different climates and landscapes (i.e. CONUS, AU, EU and IN) where ground-based observations (derived from rain gauges and rain gauges plus radar) are very dense and of a high quality (see Sect.
Next, since many areas of the world like Africa, South America and central Asia have a highly variable density of rain gauges, validation was also performed using a TC analysis as proposed by (
Both continuous and categorical error metrics were adopted for validating daily rainfall. The continuous scores are the following:
In addition, three categorical scores were considered: the probability of detection POD
Contingency table commonly used for characterizing detection errors of precipitation products.
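For reference, the three categorical scores can be computed from the contingency-table counts using their standard definitions: POD = H / (H + M), FAR = F / (H + F) and TS = H / (H + M + F), with H hits, M misses and F false alarms. The sketch below is illustrative; the function name and the default rain/no-rain threshold are assumptions.

```python
import numpy as np

def categorical_scores(estimate, reference, threshold=0.5):
    """Return (POD, FAR, TS) for rain events at or above `threshold` (mm/d)."""
    est_rain = np.asarray(estimate, dtype=float) >= threshold
    ref_rain = np.asarray(reference, dtype=float) >= threshold

    hits = np.sum(est_rain & ref_rain)            # rain estimated and observed
    misses = np.sum(~est_rain & ref_rain)         # rain observed but not estimated
    false_alarms = np.sum(est_rain & ~ref_rain)   # rain estimated but not observed

    def ratio(num, den):
        # Scores are undefined when the denominator is zero.
        return num / den if den > 0 else np.nan

    pod = ratio(hits, hits + misses)
    far = ratio(false_alarms, hits + false_alarms)
    ts = ratio(hits, hits + misses + false_alarms)
    return pod, far, ts
```

Evaluating the scores for a range of thresholds (for example, percentiles of the reference rainfall) shows how detection skill changes with rainfall intensity.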
In this study, TC analysis
Suppose we have three measurement systems
In addition,
Note that the error (and correlation) calculated via TC is generally lower (higher) than those calculated using the classical validation, given that it does not include the reference uncertainty.
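A minimal sketch of the covariance-based TC estimators is given below, assuming the standard formulation in which the three systems are linearly related to the truth and their errors are mutually independent and independent of the truth. The function name is illustrative, and the implementation details used in this study may differ.

```python
import numpy as np

def triple_collocation(x, y, z):
    """Covariance-based triple collocation.

    Returns (err_var, r2): the error variance of each system in its own
    data space and its squared correlation with the unknown truth.
    """
    q = np.cov(np.vstack([x, y, z]))
    # Error variance: total variance minus the part explained by the truth.
    err_var = np.array([
        q[0, 0] - q[0, 1] * q[0, 2] / q[1, 2],
        q[1, 1] - q[0, 1] * q[1, 2] / q[0, 2],
        q[2, 2] - q[0, 2] * q[1, 2] / q[0, 1],
    ])
    # Squared correlation with the truth.
    r2 = np.array([
        q[0, 1] * q[0, 2] / (q[0, 0] * q[1, 2]),
        q[0, 1] * q[1, 2] / (q[1, 1] * q[0, 2]),
        q[0, 2] * q[1, 2] / (q[2, 2] * q[0, 1]),
    ])
    return err_var, r2
```

No member of the triplet is treated as error-free in these expressions, which is why the resulting error (correlation) tends to be lower (higher) than in a classical validation against a single reference.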
Although the integrated product is potentially available everywhere, we found that where the quality of satellite SM observations is very low like in forests, frozen soils and mountainous areas, OLC coefficients associated with the SM2RAIN products were very small, and the integrated product was mainly constituted by IMERG-ER. Therefore, to avoid any misinterpretation about the real benefit of integrating IMERG-ER with satellite SM observations, we limited the validation of the integrated product to the ASCAT committed area
Both the calibration of SM2RAIN and the OLC implementation need a calibration dataset as described in Sect.
The choice of a calibration dataset is strategic for both the selection of the SM2RAIN parameters and the calculation of the OLC coefficients. Thus, it has to be carefully selected based on (i) accuracy (i.e. low error and high correlation with “true” rainfall), (ii) homogeneous performance in time and space, and (iii) continuous spatial and temporal coverage (as well as spatial and temporal resolution closer to the one of the rainfall to be estimated). Potential candidates are:
GPCC, IMERG-ER and SM2RAIN-ASCAT* (triplet A)
GPCC, IMERG-ER and ERA5 (triplet B)
GPCC, ERA5 and SM2RAIN-ASCAT* (triplet C).
To explore the performance of ERA5 and GPCC, we applied TC as described in Sect.
Note that SM2RAIN-ASCAT* above is not the one used in the integration, but it was produced using constant parameters
Table
Triple-collocation correlation obtained by using triplet (A) GPCC–IMERG-ER–SM2RAIN-ASCAT, (B) GPCC–IMERG-ER–ERA5, (C) GPCC–ERA5–SM2RAIN-ASCAT for the period 2015–2018. The numbers refer to median values.
Figure
Note that, except for CONUS where rain gauge information is ingested into ERA5
Figure
Products used in the integration over CONUS
Table
Median correlation (
In terms of the variability ratio, we did not observe significant conditional biases of the
KGE results provide an integrated measure of the scores discussed above.
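As a reminder of how KGE combines the individual scores, the sketch below assumes the original Kling–Gupta formulation, KGE = 1 − sqrt((r − 1)² + (β − 1)² + (α − 1)²), with r the correlation, β the ratio of the means (bias ratio) and α the ratio of the standard deviations (variability ratio). The function name is illustrative, and the exact variant used for the variability term in this study is an assumption.

```python
import numpy as np

def kge(estimate, reference):
    """Kling-Gupta efficiency and its components (r, beta, alpha)."""
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    r = np.corrcoef(est, ref)[0, 1]       # linear correlation
    beta = est.mean() / ref.mean()        # bias ratio
    alpha = est.std() / ref.std()         # variability ratio
    score = 1.0 - np.sqrt((r - 1.0) ** 2 + (beta - 1.0) ** 2 + (alpha - 1.0) ** 2)
    return score, r, beta, alpha
```

A product can therefore gain in correlation and still lose in KGE if its bias or variability ratio moves further from 1.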
Correlation increments obtained by ingesting ASCAT, SMOS and SMAP SM2RAIN-based rainfall estimates into the IMERG-ER product. Values in bold inside the box plots refer to the median increments expressed as percentages. The boxes span the 25th to 75th percentiles, while the whiskers extend to the minimum and maximum values. Outliers are not shown in the plot.
Figure
Figure
Percentage differences in correlation (
To understand the benefit of integrating SM-based rainfall with IMERG-ER as a function of the topographic complexity, Fig.
Difference in median correlation (
The benefit of the integration was also computed as a function of land cover (panels b and d in Fig.
Figure
Figure
Difference in categorical indices (probability of detection – POD, false alarm ratio – FAR – and threat score – TS) between the integrated product
After the 50–60th percentiles, a significant increment of POD is evident for all the study regions, whereas the differences in FAR denote a deterioration from the 50th to 80th percentile across CONUS, EU and AU (very small) and in IN (much larger). The latter seems to be caused by noisier satellite SM observations over India, which directly impact the quality of SM2RAIN estimates (causing higher FAR; see also
Prior to the assessment of the rainfall products over Africa and South America with TC, we ran TC analysis over AU, CONUS, EU and IN, where
Figure
Products used to integrate IMERG-ER with SM2RAIN products derived from the setup during the calibration period in South America. When low correlation was found between the reference dataset (i.e. GPCC) and the SM2RAIN product, the latter was excluded from the analysis, and only IMERG-ER was retained.
Unlike the results presented in Sect.
ERA5–GPCC–IMERG-ER
ERA5–GPCC–SM2RAIN-ASCAT*
ERA5–GPCC–
Figure
Triple-collocation squared correlation (
As in Fig.
Figure
Box plots of triple-collocation squared correlation (
In this study, we have developed a procedure to obtain a short-latency (less than 2–3 d), daily 25 km satellite-based rainfall product based on the integration of IMERG-ER with SM2RAIN-based rainfall estimates derived from three different satellite SM products (i.e. SMOS, SMAP and ASCAT). With this latency – potentially reduced to about 1 d via the use of L2 products – the product targets agricultural and water resource management applications over data-scarce regions like Africa, South America and central Asia.
To merge SM2RAIN-based rainfall estimates with IMERG-ER, we used the OLC approach previously used by
The integrated product was cross-validated with high-quality ground-based rainfall observations in Australia, India, Europe and the conterminous United States and cross-compared in the same regions against long-latency products (i.e. released with a time span of 1–2 months and thus not suited for operational applications). The validation entailed different continuous and categorical scores and was carried out for different land cover classes and as a function of the topographic complexity. In this respect, we found the following:
The integrated product performed relatively well and often better than the long-latency products, which are designed to obtain the best performance, as they ingest many observations and use gauges (often the same used here for validation). The best product in regions with high-density rain gauge observations was found to be GPCC (although this product is obviously correlated with the ground reference).
An interesting feature was the better performance of the integrated product with respect to the calibration dataset, which highlights the high value of the information provided by SM. These results are relevant given that the integrated product can potentially be released within 2–3 d.
The improvement of IMERG-ER was relevant and ranged from 10 % to 15 % in terms of correlation and up to 40 % in terms of RMSE. A smaller impact of the integration was obtained over very dense forests and complex terrain given the inherent limitations of satellite-based observations over these areas. We also observed a deterioration in correlation in some areas of north-western CONUS and India, which needs further analysis.
The integration reduced the variability ratio, which was too high in the IMERG-ER product. One of the reasons for this was the lower variability of SM2RAIN-based rainfall estimates, which were produced by minimizing the RMSE with the calibration dataset (i.e. ERA5). Despite being beneficial in this case, this issue can be relevant: it could affect the ability to predict extreme values and modify the true rainfall distribution. However, a closer look at the distributions of the reference and the estimated rainfall (not shown) suggests that the integrated product was not impacted too much by this issue.
An improvement of the KGE score as a consequence of the improvement of the correlation (mainly) and the variability ratio was found in all cases except India. Here, despite the better correlation, the integrated product was characterized by a higher bias and lower variability, which drew KGE to values lower than the ones of IMERG-ER.
An additional validation, totally independent from the calibration, was carried out in Africa and South America. Here, due to the lack of a reliable benchmark dataset, we adopted TC analysis (after having validated it) to calculate error and correlation of the integrated product, IMERG-ER, GPCC and ERA5. Results confirm those obtained via classical validation, with the integrated product outperforming IMERG-ER. Moreover, in data-scarce regions, the integrated product outperforms GPCC and provides similar performance to ERA5 (better in the Sahel region).
Despite the good performance achieved by the product, several aspects need further investigation.
The short time records of some of the satellite-based observations used in the integration (i.e. SMAP and IMERG-ER) limited the length of the calibration period, which could impact the calculation of the climatological-correction procedure and the OLC coefficients shown in Methods. It also shortened the validation period, which was restricted to 2018. The relatively short period of calibration therefore has potential impacts on the ability of the products to reproduce correct climate patterns. Thanks to the recent availability of the IMERG-ER product from 2000 onwards, this aspect will be further investigated in future versions of the product.
Although TC is a possible (and likely the only) alternative for evaluating rainfall estimates over data-scarce regions, it does not provide a thorough evaluation of the rainfall estimates, as it does not provide information about categorical scores and bias. Therefore, over these regions it is not guaranteed that the integrated product performance is optimal in this respect. Future work should focus on testing the product for applications like flood prediction, water resource management, crop modelling and risk insurance. Note that first attempts at using the product for flood prediction (not shown in this study) are providing promising results.
The integration is not possible everywhere given the low quality of the satellite SM observations over dense forests and the lack of SM information over frozen surfaces. We can only have confidence in the optimal performance of the integrated product over the area described in Sect.
Daily 25 km temporal–spatial sampling might not be adequate for small-scale applications. Future work should therefore take into account satellite SM products with a higher spatial resolution
Despite 2–3 d of latency being fine for many applications, it might not be sufficient for rainfall monitoring in real time and flood forecasting in medium to small basins. In this respect, IMERG-ER, with its 4–5 h of latency, is the only satellite product potentially providing rainfall observations that could be used for such applications, although in that case not only the latency is important but also the spatial resolution. Future work should focus on the integration of L2 satellite SM products with IMERG-ER, also using alternative integration schemes and products with respect to those used in this study.
The record length of the product is restricted to the GPM and SMAP eras (i.e. 2015 onward). This potentially limits the use of the products for drought and flood frequency analysis. However, the integration procedure does not rely upon the availability of the above products but can be applied to any other long-term rainfall and soil moisture dataset available. Note that all the IMERG products are now reprocessed back to the start of the TRMM (Tropical Rainfall Measuring Mission) era (from March 2000 to present), and SM observations are available back to 1978
The supplement related to this article is available online at:
CM proposed and developed the idea of integration, carried out the analysis and typeset the paper. LB participated in the discussion and setup of the study. TP participated in the discussion and setup of the study. PF helped in the data preparation and analysis. LC helped in the data preparation and analysis and participated in the discussion and setup of the study. VM helped in the paper revision and typesetting and in designing the study setup. GA helped in the application of the OLC technique and in the typesetting of the paper. YK participated in the discussion and setup of the study. DF participated in the discussion and setup of the study.
The authors declare no conflicts of interest.
This work is supported by the European Space Agency (ESA; contract no. 4000114738/15/I-SBo) project SMOS+Rainfall Land II. Gab Abramowitz acknowledges the support of the Australian Research Council Centre of Excellence for Climate Extremes (grant no. CE170100023).
This research has been supported by the European Space Agency (grant no. 4000114738/15/I-SBo).
This paper was edited by Shraddhanand Shukla and reviewed by two anonymous referees.