A flexible two-stage approach for blending multiple satellite precipitation estimates and rain gauge observations: an experiment in the northeastern Tibetan Plateau

Substantial biases exist in the Satellite Precipitation Estimates (SPE) over complex terrain regions and it has always 15 been a challenge to quantify and correct such biases. The combination of multiple SPE and ground observations would be beneficial to improve the precipitation estimates. In this study, a flexible two-step approach is proposed by firstly reducing the systematic errors of each SPE using rain gauge observations as references, and then merging the improved multi-SPE with a Bayesian weighting model. In the 1st stage, gauge references are assumed as a generalized regression function of SPE and terrain feature. In the 2nd stage, the weights assigned to the involved SPE are calculated according to the associated 20 performance relative to gauge references. This blending method has the ability to exert benefits from multi-SPE in terms of higher performance and mitigate negative impacts from the ones with lower quality. In addition, Bayesian analysis is applied in the two phases by specifying prior distributions on the model parameters, which enables to produce posterior ensembles associated with their predictive uncertainties. The performance of the two-step blending approach is assessed using independent rain gauge observations during the warm season of 2014 in the northeastern Tibetan Plateau. Results show that 25 the blended multi-SPE are significantly improved compared to the original individuals, especially during heavy rainfall events. This study can also be expanded as a data fusion framework in the development of high-quality precipitation products in highcold regions characterized by complex terrain.

ground sensors (Ma et al., 2015). The satellite sensors are capable of providing precipitation estimates at a large scale (Hou et al., 2014), but performances of available satellite products vary among different retrieval methods and climatic areas (Yong et al., 2015;Prat and Nelson, 2015;Ma et al., 2016). Thus, it is suggested to incorporate precipitation estimates from multiple sources into a fusion procedure with fully consideration of the strength of individual members and associated uncertainty. 35 Precipitation data fusion was initially reported by merging radar-gauge rainfall in the mid-1980s (Krajewski, 1987. The Global Precipitation Climatology Project (GPCP) was an earlier attempt for satellite-gauge data fusion, which adopted a mean biascorrected method and an inverse-error-variance weighting approach to develop a monthly, 0.25 global precipitation data (Huffman et al., 1997). Another popular dataset, Climate Prediction Center Merged Analysis of Precipitation (CMAP), 40 included global monthly precipitation with a 2.5° x 2.5° resolution for a 17-year period by merging gauges, satellites and reanalysis data using the maximum likelihood estimation method (Xie and Arkin, 1997). Since then, several blending approaches have been developed to generate rainfall products with higher quality by merging gauge, radar and satellite observations (e.g., Li et al., 2015;Beck et al., 2017;Xie and Xiong, 2011;Yang et al., 2017;Baez-Villanueva et al., 2020).
Overall, those fusion methods follow a general concept by eliminating biases in satellite/radar-based data and then merging 45 the bias-adjusted satellite/radar estimates with point-wise gauge observations. However, these efforts might be insufficient for quantifying the predicted data uncertainty. Some blended estimates are also partially polluted by the poorly performed individuals . Therefore, this paper develops a new blending algorithm that enhances the quantitative modeling of individual error structures, prevents potential negative impacts from lower-quality members, and enables an explicit description of the model's predictive uncertainty. In addition, a Bayesian concept for accurate rainfall estimates is 50 proposed based on these conditions. The Bayesian analysis has the advantage of a statistically post-processing idea that could yield a predictive distribution with quantitative uncertainty (Renard, 2011). For instance, a Bayesian kriging approach, which assumes a Gaussian process for precipitation at any location and considers the elevation as a covariate, is developed for merging monthly satellite and gauge precipitation data (Verdin et al., 2015). A dynamic Bayesian model averaging method is applied for satellite precipitation data merging across the TP (Ma et al., 2018). Given the flexible distribution of multiple 55 sources of precipitation biases in regions with complex terrain, continuous efforts should be taken to exert the potential merit of Bayesian approach on this critical issue.
In this paper, a two-step approach is described for blending multiple Satellite Precipitation Estimates (multi-SPE) and pointbased rain gauge observations. The initial experiment is performed during the warm season of 2014 in the northeastern TP 60 (NETP), where a denser network of rain gauges is available compared to other regions of TP. The proposed two-stage blending approach is also expected to help with the exploration of multi-source/scale precipitation data merging in other regions with complex terrain. https://doi.org/10.5194/hess-2020-43 Preprint. Discussion started: 17 February 2020 c Author(s) 2020. CC BY 4.0 License.

Overview
This algorithm aims at developing a multi-source data merging framework so as to provide the best-available precipitation 95 product in any region of interest. Let ( , ) denote near-surface precipitation at gauge site s and the t th day in a year. The original and bias-adjusted multi-SPE at the same location and time are defined as ( 1 ( , ), 2 ( , ), … , ( , ) and ( 1 ′ ( , ), 2 ′ ( , ), … , ′ ( , ) ), respectively. For simplicity, they are accordingly replaced by , ( 1 , 2 ,…, ), and ( 1 ′ , 2 ′ ,…, ′ ). Noted that the value of p equals to 4 in this study, where PERCDR, 3B42V7, CMORPH and IMERG refers to 1 , 2 , 3 , 4 , respectively. 100 The diagram of the proposed two-stage blending approach is shown in Figure 2. Stage 1 is designed to correct the systematic errors of individual SPE using point-based rain gauge observations (training sites) as ground references, where the assumptions of various probabilistic distribution for gauge references conditional on each SPE are not limited to Gaussian prototype. The impact of topography is also considered. In the 2 nd step, a Bayesian weight model is applied to blend the improved multi-SPE. 105 It has the ability to exert benefits from multi-SPE of higher performance and mitigate negative impacts from the ones with lower quality. It is expected to produce posterior blended results associated with their predictive uncertainties in the survey region.
The details of the two-stage blending algorithm are described in Sections 3.2 and 3.3, respectively. 110

Stage 1: Bias adjustment
A generalized regression function between gauge references, individual SPE, and terrain features is proposed in the 1 st stage.
Because the bias of SPE generally follows a skew Normal distribution, it is important to fit an appropriate function. In this paper, a Student's t distribution is assumed for modelling of gauge observations conditional on the individual SPE. It is written as: 115 | ~( , α + β * + γ * , σ ), α , β , γ ∈ , , σ ∈ + (1) where = { , α , β , γ , σ } are model parameter sets in order to adjust the i th SPE. (α + β * + γ * ) represents the sample mean and Z is the associated collection of covariates (e.g., topography). More specifically, the normalized elevation is used as a covariate in this experiment. These parameters are real numbers and , σ are positive. It should be noted that some other distributions (e.g., Lognormal, Normal) are also examined but there are no obvious improvements in terms of the bias-120 adjusted result compared to Student's t distribution for this test.
Based on the gauge observations and multi-SPE at the training sites, model parameters for each SPE could be estimated within a Bayesian analysis using the Markov Chain Monte Carlo (MCMC) technique (Gelman et al., 2013). Next, it is to calculate https://doi.org/10.5194/hess-2020-43 Preprint. Discussion started: 17 February 2020 c Author(s) 2020. CC BY 4.0 License. each of the bias-adjusted SPE at any new site ( ′ ) of the domain at the same period. The conditional distribution of bias-125 adjusted SPE at any new site is mathematically defined as: where the posterior distribution of ′ | from Eq. 2 can be simulated numerically based on the calculated parameter samples at the training sites using the MCMC samplings. We further assume N as the size of post-convergence MCMC samples. The 130 above process is repeated N times and produces a predictive posterior distribution at any new site ′ and time t.

Stage 2: Data merging
On the basis of Stage 1, this part is designed to blend the updated multi-SPE at each grid cell of the domain. With regard to the individual SPE, the median value of the posterior samples from Stage 1 is assumed as the new SPE. Here, we redefine the bias-adjusted multi-SPE as ′ ( = 1,2, … , ), respectively. 135 The formulas of blending the bias-adjusted multi-SPE are shown below: where B means the blended result; (i=1,2,…,p) stands for the relative weight of the i th SPE, respectively, and their values 140 range from 0 to 1; is the residual error, whose value is positive real number. Ideally, the blended multi-SPE, i.e., B, at the training site s and time t should be close to gauge references R(s, t). Thereby, the weight parameters, including ( = 1,2, . . ) and are able to be estimated at the training sites based on gauge observations and new multi-SPE within a Bayesian analysis using the MCMC technique.

145
As the weight parameters are successfully derived above, similar to Eq. 4, the blended result at any new sites at time t are calculated based on the retrieved new multi-SPE and corresponding optimal weights. Finally, we can obtain spatial patterns of blended multi-SPE and point-based rain gauge observations in terms of the median, standard deviation (SD) and associated confidence intervals (e.g., 5% and 95% quantiles) in regions of interest.

Results 150
To assess the performance of the proposed two-stage blending method, several statistical error indices including Root Mean Square Errors (RMSE), Normalized Mean Absolute Errors (NMAE), and the Pearson's Correlation Coefficients (CC) are used in this study. The specific formulas of these metrics can be found in Chen et al. (2019).

Bias adjustment of multi-SPE
Compared to the gauge references, the original multi-SPE including PERCDR, 3B42V7, CMORPH and IMERG show 155 significant biases at the independent validation sites over the NETP during the warm season of 2014 (Table 2). Their statistical error metrics range from 6.59-8.07 mm/d, 63.2-83.5%, and 0.40-0.57, in terms of RMSE, NMAE, and CC, respectively.
3B42V7 performs the worst with the highest RMSE and NMAE at 8.07 mm/d and 83.5%, and the lowest CC of 0.40. IMERG shows the best performance in terms of the lowest NMAE at 63.2% and highest CC at 0.57 among the four SPE. It seems that the satellite retrievals need to be further clarified with regard to the mainstream SPE in the NETP. 160 After the bias adjustment of each SPE, the updated multi-SPE show great improvement in data quality during this experiment ( Fig. 3). These changes result in better agreement of SPE with rain gauge measurements at the validated sites in the NETP.
Both RMSE and NMAE of the corrected multi-SPE are correspondingly decreasing in terms of 4.56~5.06 mm/d and 50.9~58.7%, respectively. As compared with the original multi-SPE, the updated ones decrease by 27~37.3% and 19.1~31.1% 165 with respect to RMSE and NMAE, respectively. Moreover, the CC index of four SPE vary from 0.42 to 0.57 after bias adjustment, which slightly increases as compared to the original SPE. It is also found that 3B42V7 improves the most in terms of its RMSE decreasing from 8.07 mm to 5.06 mm using the step 1's method. Basically, it implies that the proposed biasadjusted algorithm occurred in phase 1 is very effective for reducing systematic errors of four involved SPE in the warm season of 2014 in the NETP. 170

Blending multi-SPE and independent validation
To test the performance of the proposed two-step blending approach, the blended multi-SPE at the validation sites are further examined. As shown in Figure 3, the fusion result is closer to the ground reference in terms of RMSE, NMAE and CC, compared to the individual SPE. The RMSE and NMAE indices of the merging data decrease by 34.1~65.4% and 27.1~41.1%, respectively, compared to the individual SPE, while the CC index increases by 6.7~50.4%, accordingly (Table 2). Compared 175 to the bias-corrected multi-SPE, the performance of the blended data increases by 5.1~14.2%, 3.3~16.2%, and 5.9~47.8% in terms of RMSE, NMAE and CC, respectively. That is to say, the merged precipitation in the warm season of 2014 at the validated sites of NETP exhibit higher quality after merging the bias-corrected multi-SPE using the optimal relative weights shown in Figure 4a. The blended data have been effectively dropped towards the gauge references at the validation sites, which is evidenced from the scatterplots in terms of red dots in Figure 4b. These improvements prove the significant superiority of 180 the two-step blending method on reducing the systematic errors of the original multi-SPE and supplying higher daily precipitation in the warm season of 2014 over the NETP.
The best-performed merging result is due to the ensemble contributions of the bias-corrected multi-SPE. The optimally relative weights are 0.019, 0.052, 0.289, and 0.640 with respect to PERCDR, 3B42V7, CMORPH and IMERG, respectively. It shows that the bias-adjusted IMERG and PERCDR contributes the highest and lowest weights, respectively, in this blending process, and the contributions of the other SPE rank between IMERG and PERCDR accordingly. As the bias-adjusted IMERG shows the best performance among all the individuals, it proves that higher informative SPE shows more positive impact on the blended result under this two-step fusion approach. It is further concluded that this blending method has its ability to exert benefits from multi-SPE in terms of higher performance and mitigate poor impacts from the ones with lower quality. 190 Figure 5 shows the evaluation scores of RMSE, NMAE, and CC for the original multi-SPE and blended estimates at the validation sites with 10 random split of the gauge stations. For each test, 7 gauge sites are randomly selected from the 34 sites and used for model verification, and the remaining 27 gauge sites are used for model fitting.

195
As for the blended result, it performs similar skill scores at the independent sites for the 10 random tests. It also shows better performance in terms of RMSE, NMAE and CC, which are 4.34~5.57 mm/h, 49.2~61.7%, and 0.49~0.67, respectively, compared to the raw multi-SPE at each random experiment. Statistically, the averaged values of RMSE, NMAE and CC for the blended data at the validation sites are 4.98 mm/h, 54.9% and 0.60, respectively, while the four SPE range from 6.21~7.72 mm/d, 66.3~78.9%, and 0.38~0.57, respectively (Table 3). The averaged improvement ratios of RMSE for the blended data 200 are 35.1%, 33.7%, 19.6% and 32.1% compared to PERCDR, 3B42V7, CMORPH and IMERG, respectively (Table 4). Similar performance is seen from the NMAE scores, where their mean improvement ratios are 29.8%, 30.1%, 17.0% and 21.3%, respectively for the four SPE. As seen in Figure 6, the blended result shows a significant improvement over the original multi-SPE in the survey area, especially for PERCDR and 3B42V7. It is concluded that the biases of multi-SPE could be significantly reduced as the impacts of bias functions are well considered in the proposed two-step blending algorithm. 205

Model application in spatial domain
It is important to explore the Bayesian ensembles at any unknown sites in the study domain. Therefore, the two-step blending approach is applied on the four spatially distributed SPE (i.e., PERCDR, 3B42V7, CMORPH, IMERG) from Figures 7a to 7d to obtain blended estimates of daily mean precipitation in the warm season of 2014 for the whole study domain (not only at the gauge stations). Spatial maps of the merging predictions and the associated predictive uncertainties including SD, 5% and 210 95% quantiles are shown along with the gauge observations ( Figure 8).
All of the multi-SPE have the ability in capturing the spatial patterns of daily mean precipitation during the warm season, but might fail in the representation of precipitation amounts in the NETP (Figure 7), likely because of the satellite retrieval biases in complex terrain and limited ground validation network. The spatial patterns of blended multi-SPE are shown in Figure 8a, 215 which has a similar spatial pattern with a higher precipitation amounts in the southwest compared with the individual SPE.
Based on the proposed blending approach, the fusion estimate performs a higher adjustment compared to the original SPE. It https://doi.org/10.5194/hess-2020-43 Preprint. Discussion started: 17 February 2020 c Author(s) 2020. CC BY 4.0 License.
is expected to show a better performance in terms of magnitude and distribution in the study area. Moreover, the predictive uncertainties are displayed from Figures 8b to 8d so as to illustrate the blending variance. In total, this study confirms the priority of exploring daily precipitation in spatial at higher accuracy and quantifying the associated uncertainty in the study 220 domain.

Model performance on a heavy rainfall case
Accurate precipitation on extreme weather is very important for flood hazard mitigation. Here, we investigate the utility of this two-step blending approach on a heavy rainfall event of Sep 22, 2014 in the NETP (Figure 9a). The relative weights of PERCDR, 3B42V7, CMORPH, and IMERG for the blended data are 0.464, 0.123, 0.112 and 0.301, respectively, during this 225 particular heavy rainfall event (Figure 9b). and 32.5~53.9%, respectively, while the CC index correspondingly increases by 3.4~23.9% on this heavy rainfall case compared to the individuals. The two-step blending approach has a great influence on the performance of SPE in terms of rainfall extremes in the warm season of the NETP.

235
The blended model performance is further explored at three gauge sites (i.e., IDs 56171, 56152, 56182) with the top three rainfall records in terms of daily rainfall amounts on Sep 22, 2014 ( Figure 9a). Figure 10 shows the Probabilistic Density Function (PDF) curves of blended samples of the above three sites during this event. It aims to demonstrate the blended performance on quantifying the predictive uncertainty on rainfall extremes in the survey region. At ID 56171, the estimated rainfall derived from PERCDR, 3B42V7, CMORPH and IMERG are 19.8 mm, 35.3 mm, 26 mm, and 40.2 mm, respectively. 240 3B42V7 and IMERG shows an overestimation, while PERCDR and CMORPH underperform the daily rainfall at the corresponding pixel (Figure 10a). Based on the two-step blending methods, the median and SD values of the merging estimates are 24.1 mm/d, and 4.4 mm/d, respectively. At IDs 56152 and 56182, the median/SD values of blended multi-SPE are 24.3/5.0 mm/d and 21.9/4.5 mm/d, respectively. As learned from Figures 10b and 10c, the medians of the blended result at IDs 56152 and 56182 are very close to gauge observations in terms of 24.6 mm, and 23.1 mm, respectively. It shows that this two-step 245 fusion algorithm provides a posterior inference and quantifies its predictive uncertainties on the heavy rainfall events. It is confirmed that the proposed two-step blending method is able to improve the daily precipitation amounts even during rainfall extremes in the NETP. https://doi.org/10.5194/hess-2020-43 Preprint. Discussion started: 17 February 2020 c Author(s) 2020. CC BY 4.0 License.

Discussions
This study proposes a flexible two-step blending algorithm for merging multi-satellite and rain gauge precipitation data at 250 daily scale, aiming to provide a more accurate precipitation datasets in regions with complex terrain. In spite of superior performance of the merging results, some issues still need to clarify: Because of limited knowledge on the influences of complex terrain and local climate on the rainfall patterns in the study area, the elevation feature is merely considered in the first stage. It is noted that deep convective systems occurring near mountainous 255 have an effect on the precipitation cloud (Houze, 2012), which should be considered in future to resolve the impact of orography on the adjustment of individual SPE. In addition, as calculating the blended result in any new sites, the model parameters derived from the training sites are assumed to be applicable in the domain. Since the domain of this study is not very large and we have a relatively dense rain gauge network, the current assumption seems to be acceptable according to the performance of the blended data. However, it is noted that, if extended to the TP or the global scale, the extension of model 260 parameters should be carefully considered. For instance, there are few gauges installed in the western and central TP (Ma et al., 2015), it might be a potential risk to directly apply this fusion algorithm for these regions.
The goal of this study is not to model rainfall process in a target domain, but to propose an idea to extract valuable information from available multi-satellite sources and provide more reliable precipitation in high-cold regions with complex topography. 265 Considering its spatiotemporal differences and the existence of many zero-value records, rainfall is extremely difficult to observe and predict (Yong et al., 2015;Bartsotas et al., 2018). With regard to the probability of rainfall occurrence, a zeroinflated model, which is coherent with the empirical distribution of rainfall data, is expected to further improve the two-step fusion algorithm. In addition, hourly or even instantaneous precipitation intensity is extremely vital for flood prediction, which should be considered when extending this framework. 270

Summary and prospects
This study proposes a two-step blending algorithm for multi-SPE data fusion. A preliminary experiment is conducted over the NETP using four mainstream SPE to demonstrate the performance of this fusion approach, including PERCDR, 3B42V7, CMORPH, and IMERG. Primary conclusions are summarized below:

275
(1) This blending algorithm is designed with high flexibility, which is capable of involving a group of multi-SPE that may follow different probabilistic distributions conditional on ground references. In addition, it provides a convenient way to compare the merging performance and further quantify the associated fusion uncertainties. https://doi.org/10.5194/hess-2020-43 Preprint. Discussion started: 17 February 2020 c Author(s) 2020. CC BY 4.0 License.
(2) The case studies show that the merged precipitation has higher skill scores compared with the individual SPE at the 280 independent validation sites. The 10 random verification tests further confirms the superiority of the proposed two-step blending algorithm. The performance of this fusion algorithm is further demonstrated for the heavy rainfall event.
(3) The experiment proves that this algorithm can allocate the contribution of individual SPE on the blended prediction because it is capable of ingesting useful information from uneven individuals and alleviating potential negative impacts from the poorly 285 performing members.
Overall, this work provides an opportunity for blending multi-SPE products. It is expected to promote the development of higher quality precipitation product in the remotely high-cold regions with widely available satellite precipitation retrievals. The exploration of model reliability of this two-step blending algorithm at larger scale (e.g., the TP) and higher temporal 290 resolution (e.g., hourly) should be pursued in a future study.

Author contributions
YM, XS and YH conceive the idea; XS, YH and YZ provide the project and financial supports. YM conduce the detailed analysis; HC, XS and YZ give comments on the analysis; all the authors contribute to the writing and revisions.