Satellite precipitation estimates (SPEs) contain substantial biases over complex terrain, and quantifying and correcting such biases remains a challenge. Combining multiple SPEs with rain gauge observations can improve gridded precipitation estimates. In this study, a two-stage blending (TSB) approach is proposed, which first reduces the systematic errors of the original SPEs with a Bayesian correction model and then merges the bias-corrected SPEs with a Bayesian weighting model. In the first stage, the gauge-based observations are assumed to be a generalized regression function of the SPEs and a terrain feature. In the second stage, the relative weights of the bias-corrected SPEs are calculated based on their performance against ground references. The proposed TSB method can draw on the better-performing bias-corrected SPEs while mitigating the negative impact of those with lower quality. In addition, Bayesian analysis is applied in both stages by specifying prior distributions on the model parameters, which enables posterior ensembles and their predictive uncertainties to be produced. The performance of the proposed TSB method is evaluated with independent validation data in the warm season of 2010–2014 in the northeastern Tibetan Plateau. Results show that the blended SPE is greatly improved compared to the original SPEs, even in heavy rainfall events. This study can be expanded as a data fusion framework for developing high-quality precipitation products in any region of interest.

High-quality precipitation data are fundamental to the understanding of regional and global hydrological processes. However, it is still difficult to acquire accurate precipitation information in mountainous regions, e.g., the Tibetan Plateau (TP), because ground sensors are sparse (Ma et al., 2015). Satellite sensors can provide precipitation estimates at a large scale (Hou et al., 2014), but the performance of available satellite products varies with retrieval method and climate area (Yong et al., 2015; Prat and Nelson, 2015; Ma et al., 2016). Thus, precipitation estimates from multiple sources should be incorporated into a fusion procedure that takes full account of the strengths of the individual members and their associated uncertainty.

Precipitation data fusion was first reported in the mid-1980s for merging radar–gauge rainfall (Krajewski, 1987). The Global Precipitation Climatology Project (GPCP) was an early attempt at satellite–gauge data fusion, which adopted a mean bias correction method and an inverse-error-variance weighting approach to develop a monthly,
0.25

This paper develops a new data fusion method that enhances the quantitative modeling of individual error structures, prevents potential negative impacts from lower-quality members, and enables an explicit description of the model's predictive uncertainty. In addition, a Bayesian concept for accurate rainfall estimation is proposed based on these assumptions. Bayesian analysis, as a statistical post-processing approach, has the advantage of yielding a predictive distribution with quantified uncertainty (Renard, 2011; Shrestha et al., 2015). For example, a Bayesian kriging approach, which assumes a Gaussian process for precipitation at any location and treats elevation as a covariate, was developed for merging monthly satellite and gauge precipitation data (Verdin et al., 2015). A dynamic Bayesian model averaging (BMA) method, which shows better skill scores than the existing one-outlier-removed (OOR) method, was applied for satellite precipitation data fusion across the TP (Ma et al., 2018; Shen et al., 2014). Given the challenges of quantifying precipitation biases in regions with complex terrain (Derin et al., 2019), continued efforts are required to exploit the potential of Bayesian analysis for this critical issue.

In this study, a two-stage blending (TSB) approach is proposed for merging multiple satellite precipitation estimates (SPEs) and ground observations. The experiment is performed in the warm season (from May to September) during 2010–2014 in the northeastern TP (NETP), where a denser network of rain gauges is available than in other regions of the TP. The TSB method is expected to support the exploration of multi-source and multi-scale precipitation data fusion in regions with complex terrain.

The remainder of this paper is organized as follows: Sect. 2 describes the experiment, including the study region and precipitation data. Section 3 details the methodology, including the TSB approach and two existing fusion methods (i.e., BMA and OOR). Results and discussions are presented in Sects. 4 and 5, respectively. The primary findings are summarized in Sect. 6.

The study domain is located in the upper Yellow River basin of the NETP (Fig. 1). As shown in the 90 m digital elevation data, the altitude ranges from 785 m in the northeast to 6252 m in the southeast. The total annual precipitation is around 500 mm, and the annual mean temperature is 0.7

Spatial map of the topography and GR network used in the study, where 27 black cells are used for model calibration and 7 red cells are for model verification.

Four mainstream SPEs are used, including Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks – Climate Data Record (PERSIANN-CDR) (Ashouri et al., 2015), Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis (TMPA) 3B42 version 7 (3B42V7) (Huffman et al., 2007), National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC) Morphing Technique Global Precipitation Analyses Version 1 (CMORPH) (Xie et al., 2017), and the Integrated Multi-satellitE Retrievals for the Global Precipitation Measurement (GPM) mission V06 Level 3 final run product (IMERG) (Huffman et al., 2018). Basic information on the SPEs is shown in Table 1. The IMERG has
a 0.10

Basic information of the original SPEs used in this study.

The China Gauge-based Daily Precipitation Analysis (CGDPA) is used as the ground precipitation source. It is developed from a network of 2400 rain gauge stations in mainland China using a climatology-based optimal interpolation and topographic correction algorithm (Shen and Xiong, 2016). The 34 grid cells containing gauge sites in the region of interest are taken as ground references (GRs), and all of these grid cells are independent of the Global Precipitation Climatology Centre (GPCC) stations, which are used for bias correction of the TRMM/GPM-era data (e.g., 3B42V7 and IMERG) and CMORPH (Huffman et al., 2007; Hou et al., 2014; Xie et al., 2017; Joyce et al., 2004).

The diagram of the proposed TSB algorithm.

The diagram of the TSB method is shown in Fig. 2. Stage 1 is designed to reduce the bias of the original SPEs based on the GRs at the training sites with a Bayesian correction (BC) procedure. In Stage 2, a Bayesian weighting (BW) model is used to merge the bias-corrected SPEs.

Let

In Stage 1, we perform a conditional modeling of GRs on each SPE, i.e., the
probabilistic distribution

According to Bayes' theorem, the posterior probability density function (PDF) of parameter set

The estimation of the posterior distribution

Based on the posterior distribution of parameter set

For the

Generate a value

Ideally, the blended SPE (

The BMA method is a statistical algorithm that merges predictive ensembles based on the individual SPEs over the training period in the region of interest. Here, the BMA result refers to the ensemble SPE. Based on the law of total probability, the conditional probability of the BMA data given the individual SPEs is expressed as

The OOR method is defined as the arithmetic mean of the individual SPEs after removing the member with the largest offset. It is written as
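One reading of this definition can be sketched in a few lines of Python. The sketch assumes that "offset" means the absolute deviation of a member from the mean of all members at a given grid cell and day; the function name `oor_mean` and the sample values are hypothetical and are not taken from the study.

```python
from statistics import mean


def oor_mean(estimates):
    """One-outlier-removed mean of SPE values at one grid cell and day.

    Assumption: the 'offset' of a member is its absolute deviation
    from the mean of all members.
    """
    if len(estimates) < 2:
        raise ValueError("need at least two members to remove one")
    ensemble_mean = mean(estimates)
    # Index of the member farthest from the ensemble mean.
    outlier = max(
        range(len(estimates)),
        key=lambda i: abs(estimates[i] - ensemble_mean),
    )
    kept = [v for i, v in enumerate(estimates) if i != outlier]
    return mean(kept)


# Hypothetical daily estimates (mm per day) from four SPEs at one cell.
print(oor_mean([4.0, 5.0, 6.0, 15.0]))
```

In this example the value 15.0 lies farthest from the ensemble mean, so it is dropped before averaging the remaining three members.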

To assess the performance of the proposed TSB method, several statistical error indices are used in this study, including the root mean square error (RMSE), the normalized mean absolute error (NMAE), and Pearson's correlation coefficient (CC). The specific formulas of these metrics can be found below:
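As a hedged illustration, the three indices can be computed as in the Python sketch below. RMSE and CC follow their standard definitions. For NMAE the normalization is an assumption: the total absolute error is divided by the total gauge precipitation, which is one common convention and may differ from the one used in this study. The function names are hypothetical.

```python
import math


def rmse(sat, gauge):
    """Root mean square error between satellite and gauge series."""
    n = len(sat)
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(sat, gauge)) / n)


def nmae(sat, gauge):
    """Normalized mean absolute error.

    Assumption: normalized by the total gauge precipitation.
    """
    return sum(abs(s - g) for s, g in zip(sat, gauge)) / sum(gauge)


def cc(sat, gauge):
    """Pearson's correlation coefficient."""
    n = len(sat)
    mean_s = sum(sat) / n
    mean_g = sum(gauge) / n
    cov = sum((s - mean_s) * (g - mean_g) for s, g in zip(sat, gauge))
    var_s = sum((s - mean_s) ** 2 for s in sat)
    var_g = sum((g - mean_g) ** 2 for g in gauge)
    return cov / math.sqrt(var_s * var_g)


# Hypothetical daily series (mm per day): the satellite is 1 mm too low each day.
sat = [1.0, 2.0, 3.0]
gauge = [2.0, 3.0, 4.0]
print(rmse(sat, gauge), nmae(sat, gauge), cc(sat, gauge))
```

A constant offset gives a perfect correlation but non-zero RMSE and NMAE, which is why the indices are reported together.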

In the experiment, model parameters are calibrated on the daily precipitation of the warm season in 2014, using GR data at the 27 randomly selected black grid cells in Fig. 1 for training the model. The model validation is performed under two scenarios: Scenario 1 validates the model in space, based on the data of the same period at the validation stations (i.e., the seven red grid cells in Fig. 1), and Scenario 2 validates the model in time, based on the data of the warm seasons from 2010 to 2013 at the same 27 black grid cells in Fig. 1. In addition, we consider a 10-fold cross-validation in space by randomly selecting 7 sites for model validation and using the data of the remaining 27 sites as the training set. The performance of the TSB approach is further compared with BMA and OOR in the two scenarios.
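The spatial resampling described above can be sketched as follows. The counts (34 sites, 7 held out, 10 repetitions) come from the text; the function name `random_site_splits`, the integer site indices, and the seed are hypothetical choices made only for illustration.

```python
import random


def random_site_splits(n_sites=34, n_valid=7, n_repeats=10, seed=0):
    """Return a list of (train_sites, valid_sites) index lists.

    Each repetition draws n_valid validation sites at random and uses
    the remaining sites for training.
    """
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    sites = list(range(n_sites))
    splits = []
    for _ in range(n_repeats):
        valid = sorted(rng.sample(sites, n_valid))
        valid_set = set(valid)
        train = [s for s in sites if s not in valid_set]
        splits.append((train, valid))
    return splits


for train, valid in random_site_splits():
    print(len(train), "training sites; validation sites:", valid)
```

Because each repetition draws its validation sites independently, a site may be held out in more than one repetition; this differs from a partition into non-overlapping folds.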

Figures 4 and 5 show the posterior distributions of the model parameters in Stages 1 and 2, respectively. For each parameter in the bias-correction process, the individual SPEs (PERCDR, 3B42V7, CMORPH, and IMERG) show a similar pattern (Fig. 4a to d). This indicates that the bias structures of the original SPEs have similar characteristics. For all SPEs, the distribution mass of parameter

The PDF curves of posterior parameter sets with regard to

The PDF curves of posterior parameter sets in the data fusion process of Stage 2.

Table 2 summarizes the statistical error indices (RMSE, NMAE, and CC) of the original SPEs (i.e., PERCDR, 3B42V7, CMORPH, and IMERG), the bias-corrected SPEs (i.e., BC-PER, BC-V7, BC-CMO, and BC-IME), and the blended SPE under the two scenarios in the NETP. Sections 4.2.1 and 4.2.2 present the model validation results under Scenarios 1 and 2, respectively.

Summary of statistical error indices (i.e., RMSE, NMAE, and CC) of the original, bias-corrected, and blended SPEs in two scenarios in the NETP.

The original SPEs show large biases, with the RMSE, NMAE, and CC indices ranging from 6.25–8.56 mm d

The proposed TSB approach is also validated in Scenario 2, where the blended SPE shows better performance in terms of its RMSE, NMAE, and CC at 6.37 mm d

The two validation scenarios indicate that the TSB approach can improve the accuracy of satellite rainfall estimates, drawing on the better-performing SPEs while mitigating the negative impact of those with lower quality.

Figures 7 and 8 show the statistics of evaluation scores of RMSE, NMAE, and CC for the original SPEs and blended estimates at the validation grids, with 10 random tests of the gauge locations in the warm season of 2014. For each test, seven grid sites are randomly selected from the 34 grid cells and used for model verification, and the remaining 27 grid sites are used for training the model.

Statistical error indices of the original and blended SPEs for 10 random verified tests in the warm season of 2014 in the NETP:

The box-and-whisker plots of improvement ratios of statistics for the blended SPE compared with the original SPEs, including PERCDR, 3B42V7, CMORPH, and IMERG for 10 random verified tests in the warm season of 2014 in the NETP:

The blended SPE achieves similar scores at the validation grids across the 10 random samples. It shows better skill than the original SPEs for each test in terms of RMSE, NMAE, and CC (Fig. 7). Statistically, the mean values of RMSE, NMAE, and CC for the blended SPE are 5.75 mm d

Summary of the mean values of RMSE, NMAE, and CC for the original and blended SPEs for 10 random verified tests in the warm season of 2014 in the NETP.

Mean improvement ratios of statistical error indices of the blended SPE, in terms of RMSE, NMAE, and CC compared with the original SPEs for 10 random verified tests in the warm season of 2014 in the NETP.

To assess the performance of the proposed TSB approach, it is useful to compare the TSB result with existing fusion approaches. In this study, the BMA approach uses the four original SPEs and the corresponding GR data at the 27 black grid cells shown in Fig. 1 in the warm season of 2014 to estimate the optimal BMA weights. In Scenario 1, the BMA data are calculated from the BMA weights and the original SPEs at the seven red grid cells in the warm season of 2014, and the OOR data are calculated with the OOR method from the original SPEs at the same seven red grid cells in the same period. In Scenario 2, the BMA data are calculated from the BMA weights and the original SPEs at the 27 black grid cells in the warm seasons from 2010 to 2013, and the OOR data are calculated with the OOR method from the original SPEs at the same 27 black grid cells in the same period. We compare the blended SPE with both the BMA and OOR predictions in the two scenarios, and their statistical errors are summarized in Table 5.

Summary of statistical error indices (i.e., RMSE, NMAE, and CC) for three fusion methods (i.e., OOR, BMA, and TSB) in the two scenarios in the NETP.

In Scenario 1, the TSB method achieves better skill scores, with RMSE, NMAE, and CC values of 5.36 mm d

Local recycling plays a primary role in the moisture sources of rainfall extremes in the NETP (Ma et al., 2020a). The 22 September 2014 event was a storm representative of the local heavy rainfall pattern in the warm season. Because accurate precipitation estimates for extreme weather are very important for flood hazard mitigation, we apply the proposed TSB approach to this event to quantify its performance in an extreme rainfall case (Fig. 9a). The relative weights of BC-PER, BC-V7, BC-CMO, and BC-IME for the blended SPE are 0.264, 0.14, 0.191, and 0.405, respectively, for this event (Fig. 9b). The IMERG data make the largest contribution, and the 3B42V7 and CMORPH data make broadly similar contributions to the blended SPE.
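To make the role of the relative weights concrete, the sketch below combines four bias-corrected values using the weights reported for this event. It assumes, for illustration only, that the blended value is a linear weighted combination of the members; the BW model itself is Bayesian and produces posterior ensembles rather than a single number. The precipitation values and the function name `blend` are hypothetical.

```python
# Relative weights reported for the 22 September 2014 event (Fig. 9b).
WEIGHTS = {"BC-PER": 0.264, "BC-V7": 0.14, "BC-CMO": 0.191, "BC-IME": 0.405}


def blend(bias_corrected, weights):
    """Weighted combination of bias-corrected SPE values at one grid cell.

    Assumption: a simple linear mixture stands in for the BW model.
    """
    total = sum(weights.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("weights must sum to one")
    return sum(weights[name] * bias_corrected[name] for name in weights)


# Hypothetical bias-corrected estimates (mm per day) at one grid cell.
values = {"BC-PER": 20.0, "BC-V7": 30.0, "BC-CMO": 25.0, "BC-IME": 35.0}
print(round(blend(values, WEIGHTS), 2))
```

The four reported weights sum to one, so the blended value always lies between the smallest and largest member values.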

Summary of statistical error indices (i.e., RMSE, NMAE, and CC) for the original and blended SPEs during a heavy rainfall event of 22 September 2014 in the NETP.

The PDF curves of blended SPE samples and the corresponding mean value at three gauge-based grids for a heavy rainfall case on 22 September 2014:

Spatial patterns of the daily mean precipitation in terms of the original SPEs in the warm season of 2010 to 2014 in the NETP:

Spatial patterns of the blended SPE in terms of

Table 6 reports the evaluation statistics reflecting the blended performance
for this case. It shows that the RMSE, NMAE, and CC values of the original SPEs
range from 8.18–9.24 mm d

It is important to explore the Bayesian ensembles at unknown sites in the domain. As shown in Fig. 11, each of the original SPEs appears to capture the spatial pattern of daily mean precipitation in the warm season but may misrepresent the precipitation amount, partly because of satellite retrieval bias in complex terrain and the limited GR network. Thus, the TSB method is further applied across the region of interest to demonstrate its performance for daily precipitation in the warm season of 2010–2014 in the NETP. The blended SPE shows high precipitation in the southwest, low precipitation in the northwest, and moderate precipitation in the eastern region. In addition, compared with the original SPEs, high values disappear from the spatial map of the blended SPE except in the southwest corner. A possible reason is that daily mean rainfall is highest in the southwest corner for most SPEs, so larger values remain there after the TSB approach is applied. Meanwhile, the Bayesian predictive uncertainties, given as the lower (2.5 %) and upper (97.5 %) quantiles, are displayed in Fig. 12b and c to illustrate the blending variation in this application.
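The lower and upper quantiles mentioned above can be obtained from a set of posterior samples at a grid cell, as in the following sketch. The linear-interpolation percentile rule, the function names, and the synthetic samples are assumptions for illustration; the study's own ensembles come from the Bayesian models.

```python
import math
import random


def percentile(samples, q):
    """q-th percentile (0 to 100) using linear interpolation between ranks."""
    xs = sorted(samples)
    pos = (len(xs) - 1) * q / 100.0
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(xs) - 1)
    frac = pos - lower
    return xs[lower] * (1.0 - frac) + xs[upper] * frac


def predictive_summary(samples):
    """Return (mean, 2.5 % quantile, 97.5 % quantile) of an ensemble."""
    mean_value = sum(samples) / len(samples)
    return mean_value, percentile(samples, 2.5), percentile(samples, 97.5)


# Synthetic stand-in for posterior samples of blended precipitation (mm per day).
rng = random.Random(42)
ensemble = [max(0.0, rng.gauss(5.0, 1.5)) for _ in range(2000)]
mean_value, low, high = predictive_summary(ensemble)
print(f"mean={mean_value:.2f}, 95 % interval=({low:.2f}, {high:.2f})")
```

The interval between the two quantiles contains 95 % of the samples, which is how a map of lower and upper bounds conveys the spread of the blended estimate.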

In spite of the superior performance of the TSB algorithm, some issues still need to be considered in practical applications, detailed in the following.

Because knowledge of how complex terrain and local climate influence rainfall patterns in the study area is limited, the elevation feature is considered in the first stage. Table 7 quantifies the impact of the elevation covariate on the performance of the bias-corrected and blended SPEs in Scenario 1 in the warm season of 2014 in the NETP. Including the elevation feature provides slightly better skill than the results without terrain information in this experiment. Considering that deep convective systems occurring near mountainous areas affect the precipitation cloud (Houze, 2012), further work is needed to improve orographic precipitation estimates in the TP in the future.

Summary of statistical error indices (i.e., RMSE, NMAE, and CC) for bias-corrected and blended SPEs with and without consideration of terrain feature as a covariate in the TSB method in Scenario 1 in the NETP.

The data fusion application is based on four mainstream SPEs, of which BC-IME and BC-PER show the best and worst performances, respectively, among the bias-corrected SPEs in Stage 1. This raises the question of why one should not simply apply the first stage of bias correction and then select the best-performing bias-corrected SPE as the final product. To address this issue, we investigate the statistical error differences between BC-IME and the blended SPE before and after the removal of BC-PER for the 10-fold cross validation in the warm season of 2014 in the NETP (Fig. 13). Including Stage 2 in the TSB method is beneficial because the blended SPE shows better skill than the best-performing bias-corrected SPE (i.e., BC-IME) from Stage 1. The primary reason is that the BW model is designed to integrate various bias-corrected SPEs, a capability that the BC model lacks. In addition, the blended SPEs with and without PERCDR show similar RMSE, NMAE, and CC values (Fig. 13). This implies that the TSB approach is robust to poor-quality members (e.g., BC-PER), partly because the BW model can reallocate the contributions of the bias-corrected SPEs based on their bias characteristics.

Statistical error indices (i.e., RMSE, NMAE, and CC) of the best-performing bias-corrected SPE (i.e., BC-IME, black) and blended SPE before (red) and after (blue) removing the worst-performing BC-PER, for 10 random verified tests in the warm season of 2014 in the NETP.

In addition, when calculating the blended result at a new site, the model parameters derived from the training grid sites are assumed to be applicable across the whole domain. Since the GR network in the study region is relatively dense, this assumption is acceptable given the performance of the blended SPE. It is helpful to give some guidance on how many training sites are needed to apply the TSB approach in a region with complex terrain and limited GRs. The sensitivity of the blended SPE performance at the validation grids to the number of training grid cells is explored in Fig. 14. As the number of training sites increases, the RMSE and NMAE values tend to decrease, whereas the CC value increases slightly. The performance of the blended SPE appears to change little once the number of training sites reaches 21. We acknowledge that more information from ground observations would further benefit the blended gridded product in the region of interest. If the method is extended to the whole TP or to the global scale, the transferability of model parameters and the choice of training sites should be carefully considered. For instance, few gauges are installed in the western and central TP (Ma et al., 2015), so directly applying this fusion algorithm to these regions carries a potential risk.

Statistical error indices (i.e., RMSE, NMAE, and CC) of the blended SPE at the validation grid locations in terms of a different number of training sites in the warm season of 2014 in the NETP.

The aim of this study is not to model rainfall processes in a target domain but to propose a way to extract valuable information from available SPEs and provide more reliable gridded precipitation in high-altitude cold regions with complex terrain. Because of its spatiotemporal variability and the many zero-value records, rainfall is extremely difficult to observe and predict (Yong et al., 2015; Bartsotas et al., 2018). With regard to the probability of rainfall occurrence, a zero-inflated model, which is coherent with the empirical distribution of rainfall amount, is expected to improve the proposed TSB algorithm. Also, hourly or even instantaneous precipitation intensity is vital for flood prediction and should be specifically addressed when this fusion framework is extended in the next step.

This study proposes a TSB algorithm for multi-SPE data fusion. A preliminary
experiment is conducted in the NETP using four mainstream SPEs (i.e., PERCDR,
3B42V7, CMORPH, and IMERG) to demonstrate the performance of this TSB
approach. Primary conclusions are summarized below:

The TSB algorithm has two stages, which involve the BC and BW models. The blending method can merge a group of original SPEs, and it provides a convenient way to quantify the fusion performance and the associated uncertainty.

The experiment shows that the blended SPE has better skill scores than the original SPEs in the two validation scenarios. The 10-fold cross validation in Scenario 1 further supports the superiority of the TSB algorithm. In addition, the TSB method outperforms two existing fusion methods (i.e., BMA and OOR) in the two scenarios. The performance of this fusion method is also demonstrated for a heavy rainfall event in the region of interest.

The application shows that this algorithm can allocate the contributions of the individual SPEs to the blended result, because it is capable of ingesting useful information from members of uneven quality and alleviating potential negative impacts from the poorly performing ones.

The gauge data are from the China Meteorological Data Service Center (

YM and XS conceived the idea. XS and YZ acquired the project and financial support. YM conducted the detailed analysis. HC, XS, and YZ gave comments on the analysis. All the authors contributed to the writing and revisions.

The authors declare that they have no conflict of interest.

This study is supported by the National Key Research and Development Program of China (nos. 2017YFC1503001 and 2017YFA0603101) and the Strategic Priority Research Program (A) of CAS (no. XDA2006020102).

This paper was edited by Fuqiang Tian and reviewed by three anonymous referees.