Simulation of rainfall fields conditioned on rain gauge observations and radar estimates using random mixing

The accuracy of spatial precipitation estimates with the relatively high temporospatial resolution is of vital importance in various fields of research and practice. Yet the intricate variability and the intermittent nature of precipitation make it very difficult to obtain accurate spatial precipitation estimates. Radar and rain gauge are two complementary sources of precipitation information: the former is inaccurate in general but is a valid indicator for the spatial pattern of the rainfall field; the latter is relatively accurate but lack spatial coverage. Considering the pros and cons of the two sources of precipitation 5 information, a number of radar-gauge merging techniques have been developed to obtain spatial precipitation estimates over the past years. Conditional simulation has great potential to be used in spatial precipitation estimation. Unlike the commonly used interpolation methods, the results from the conditional simulation is a range of possible estimates due to its Monte Carlo framework. Yet an obstacle that hampers the application of conditional simulation in spatial precipitation estimation is how to obtain the marginal distribution function of the rainfall field with accuracy. In this work, we propose a method to obtain 10 the marginal distribution function of the rainfall field based on rain gauge observations and radar estimates. The conditional simulation method, random mixing (RM), is used to simulate rainfall fields. The properties of the results from the proposed method are elaborated through the comparison with the results from other methods: ordinary Kriging, Kriging with external drift, and conditional merging. Finally, the sensitivity of the proposed method towards the two factors density of rain gauges and random error in radar estimates is analyzed. 15

the field pattern indicated by radar, etc. There exists a variety of methods that simulate Gaussian random fields with given covariance functions, such as turning bands simulation, LU-decomposition-based methods, sequential Gaussian simulation, etc. (Mantoglou and Wilson, 1982;Deutsch and Journel, 1998;Chilès and Delfiner, 2000;Lantuéjoul, 2002). Yet the studies on conditional simulation of rainfall fields are rare. One of the major obstacles that hamper the application of conditional 60 simulation in spatial precipitation estimation is how to obtain the marginal distribution function of the rainfall field with accuracy. The distribution function is essential to convert the simulated Gaussian fields to rainfall fields of interest, as the simulation is normally embedded in Gaussian space. In view of this, we propose a method to obtain the distribution function of the rainfall field based on rain gauge observations and radar estimates.
In this paper, the method employed to simulate rainfall fields is random mixing (RM), which was first proposed by Bár-65 dossy and Hörning (2016) to solve inverse modeling problems in the field of groundwater flow and transport. RM inherits the concept of the gradual deformation (GD) approach described in (Hu, 2000) that a conditional field of interest is obtained as a linear combination of unconditional random fields. Yet different from GD, RM is targeted and flexible to incorporate different kinds of constraints: linear or nonlinear, and the utilization of spatial copulas in the description of the underlying dependence structure enables the separate treatment of dependence structure and marginal distribution. The properties of the results from 70 the proposed method are elaborated through the comparison with the results from other methods: ordinary Kriging, Kriging with external drift, and conditional merging. Finally, the sensitivity of the proposed method towards the two factors -density of rain gauges and random error in radar estimates -is analyzed. This paper is divided into 5 parts. After the introduction, the methodology of RM is elaborated in Sect. 2. Sect. 3 describes the data used in this work. Sect. 4 compares the results from the proposed method with those from other methods and analyzes 75 the sensitivity of the proposed method. Sect. 5 ends this paper with conclusions.
2 Methodology 2.1 cdf of the rainfall field As has been described in the introduction, one of the major obstacles that hamper the application of conditional simulation in spatial precipitation estimation is how to obtain the cumulative distribution function of the rainfall field (abbr. cdf-rainfall) with 80 accuracy. As the simulation is normally embedded in Gaussian space, the direct output of the simulation is a Gaussian random field. Thus the cdf-rainfall is needed to convert the simulated Gaussian field to the rainfall field of interest. In this subsection, an algorithm to compute the cdf-rainfall from a set of rain gauge observations and radar estimates is introduced. The algorithm is specified as follows.
1. Estimate the spatial intermittency of the rainfall field as the ratio of the number of dry pixels to the total number of the 85 pixels in the radar estimates R r . We denote the estimated spatial intermittency as u 0 .
2. Uniform the radar estimates R r to a quantile map. Due to the intermittent nature of precipitation, all the dry pixels in R r are uniformed to u 0 . In other words, u 0 is the smallest quantile in the uniformed radar grid.
3. Take the following quality control steps for the two datasets: rain gauge observations r k for k = 1, . . . , K, and the quantiles collocated with the rain gauges in the uniformed radar grid, denoted as u k for k = 1, . . . , K. 90 1) First, check the zeros in r k , if the corresponding u k > u 0 , set u k = u 0 ; check the sampled dry pixels in u k , if the corresponding r k > 0, set r k = 0.
This quality control step is applied to maintain the consistency of the two datasets at zeros. It is assumed that the radar estimates can represent the pattern of the rainfall field, yet various factors do exist that can reduce the representativeness of the radar-indicated pattern. It can happen that a zero gauge observation is collocated with a 95 quantile that is slightly larger than u 0 , i.e., a wet pixel in the uniformed radar grid. Similarly, a dry pixel in the uniformed radar grid can correspond to a gauge observation that is slightly larger than 0 mm.
Due to the same reason as in 1), it can happen that the Spearman's rank correlation of the two datasets is less than 1, which results in abnormal behavior of the cdf. Thus the two datasets are sorted to maintain the non-decreasing 100 nature of the cdf. The empirical cdf(s) obtained by linearly joining (r k , u k ) after the two quality control steps, with one highlighted by the black dashed line. Red: the true cdf-rainfall. Note that the empirical cdf(s) shown here are computed from the synthetic data as described in Sect. 3. 4. After the quality control steps, a set of points (r k , u k ) are obtained. It is assumed that these points distribute unbiasedly around the true cdf, see the empirical cdf obtained by linearly joining the points (r k , u k ) in Fig. 1. The cdf-rainfall is obtained by fitting a theoretical cdf model under the condition that the fitted cdf starts to be positive at the point (0, u 0 ).
We denote the estimated cdf-rainfall as G(·) hereafter.

Prepare the constraints
The Gaussian embedding of the simulation means the constraints should also be defined in the marginal of the standard normal distribution (normalized Gaussian distribution). In the preparation of the constraints, the estimated cdf-rainfall G(·) is used extensively. In specific, we consider the following three constraints.
1) The equality constraints at rain gauge locations, when defined in the standard normal marginal, means the simulated 110 values at the rain gauge locations should equal the values mapped from the rain gauge observations r k : (1) where Z is the simulated Gaussian field, x k denotes the gauge location, and Φ(·) is the cdf of the standard normal distribution.
2) The simulated Gaussian field Z should preserve a given correlation function, which is derived from the radar estimates 115 in the standard normal marginal, i.e., A problem arises here that from the radar estimates R r , one can only obtain a truncated Gaussian field Z r , as all the dry pixels in R r are converted to z 0 = Φ −1 (u 0 ), namely z 0 is the smallest value in Z r . And due to the truncation, one can only obtain a correlation function with a reduced sill. Fig. 2 displays the empirical variograms evaluated from 1000 truncated 120 Gaussian fields in comparison with the true variogram used to generate the corresponding continuous Gaussian fields.
From the figure, it can be seen that the empirical and the true variograms have very similar patterns, which indicates the true correlation function can be approached given a priori knowledge that the variance of the simulated field is 1. We denote the estimated covariance function as Γ hereafter. 3) The simulated field should reproduce the field pattern indicated by radar to a certain degree. Obviously, this is an op-125 timization problem, and the goal is to maximize the Pearson's correlation coefficient of the simulated field Z and the reference field Z r : where Z is the simulated Gaussian field, Z r is obtained from the radar estimates in Eqn (2), and ρ(·) is the formula of the Pearson's correlation coefficient. The same problem arises: Z is continuous, while Z r is truncated. To evaluate the 130 objective function, surely one could truncate Z with z 0 (the smallest value in Z r ). Yet one could also use Z directly as shown in Eqn (3). The difference is minor, as a high correlation between Z r and Z means a high correlation between Z r and Z-truncated. From a parsimonious point of view, we use Z directly in the evaluation of the objective function.

Random mixing
The task is to estimate the true rainfall field given the set of rain gauge observations and the radar estimates. In terms of 135 conditional simulation, this means to simulate a Gaussian field Z that fulfills all the constraints described in Sect. 2.2, and then convert Z to the rainfall field of interest, i.e., to obtain an estimate for the true rainfall field.
We use random mixing (RM) to fulfill the task. RM utilizes the concept described in Hu (2000) that the conditional field of interest is obtained as a linear combination of a number of unconditional random fields Y i : where Y i denote a number of independent Gaussian random fields with identical statistical properties, i.e., the marginal distribution all follow the standard normal distribution and with the same correlation function. If in addition, the L-2 norm of the then the resultant conditional field Z is statistically identical to Y i , see (Bárdossy and Hörning, 2016) for the demonstration.

145
There exists a variety of methods to simulate a Gaussian random field with the given correlation function, and we have used an efficient method, the fast Fourier transformation for regular grids (Wood and Chan, 1994;Wood, 1995;Ravalec et al., 2000).
It is necessary to differentiate Hu's method from the proposed one before specifying the procedure: Hu's method (Hu, 2000;Hu et al., 2001) incorporates linear data using conditional Kriging, and the method is extended to combine dependent conditional fields in Hu (2002), while RM incorporates linear / nonlinear constraints under the unified concept of randomly 150 mixing unconditional random fields, which is specified as follows.
1. The conditional Gaussian field Z that has a prospect to fulfill all the constraints is obtained as From the equation, it can be seen that Z is constructed by two parts: (a) The first part is made up of a series of statistically identical unconditional random fields Y i with the correlation function Γ, which is estimated from the radar estimates R r . In total, we have N unknowns: α i for i = 1, . . . , N . The role of this part is to fulfill the equality constraints at the rain gauge locations. Thus the following K linear constraints are subject See Eqn.
(1) for the definitions of x k , r k , G(·) and Φ −1 (·). With K equation, N unknowns and N > K, it forms an under-determined system. Multiple techniques exist to solve the under-determined system. Specifically, we found 160 the set of weights with the least norm, i.e., minimized N i=1 α 2 i , using the singular value decomposition technique. Apart from the K linear constraints, the constraint N i=1 α 2 i < 1 is subject in addition to ensure that the weight of the second part 1 − α 2 i is positive. This extra constraint can be satisfied further by increasing the number of unknowns N .
is made up of two independent, statistically identical conditional random fields: H and H , which are referred to 165 as H-field in the following.
The H-field is also obtained as a linear combination of unconditional random fields Y i (statistically identical to The H-field is special because of the zeros at the rain gauge locations: Similarly, to ensure that the H-field is statistically identical to Y (or Y ), the following constraint is subject.
Due to the special zeros, the addition of H-fields does not change the values at the rain gauge locations, i.e., the equalities in Eqn. (7) can be rewritten as The remaining question is how to obtain such H-fields. The H-field is easy to obtain taking advantage of the 'zeros'. Still an under-determined system is created with M unknowns, K equations (Eqn. 8) and M > K. The Gaussian field Z obtained from Eqn. (6) fulfills the equality constraints at the rain gauge locations, and reproduces the correlation function Γ. The correlation function is reproduced because the L-2 norm of the weights of all the fields 180 underlying Z satisfies It should be notified that one H-field is already enough to do the same job. One could obtain such a Z by The aim of using two H-fields instead of one is to introduce more freedom, such that the third constraint can be met additionally. The relative weights of the two parts ( α 2 i : 1 − α 2 i ) do matter. It is desired that the second part weighs more, such that it is favorable for the solution of the optimization problem defined in the next step. In view of 185 this, when solving the under-determined system in (a), the set of weights α 2 i 1 is found.
2. Due to the freedom introduced by the extra H-field, the Gaussian field Z obtained from Eqn. (6) is a function of θ, and so does the objective function evaluated from Z, which actually defines an 1D optimization problem w.r.t. θ: with θ ∈ (−π, π]. See the definitions of Z r and ρ(·) in Eqns. (2) and (3). The task is to find the θ that produces the 190 maximum objective function value. Various methods exist to solve the 1D unconstrained optimization problem, e.g., the George Dantzig's simplex algorithm, the BFGS method, etc. In this work, we simply used the trial-and-error method, and a coarse search followed by a fine search is what has been implemented to accelerate the solving process. The solution to the 1D optimization problem is denoted as θ * .
3. If Z(θ * ) meets the stopping criterion of the optimization, continue with Step 4. Otherwise, update H and H as the 195 following, and go back to Step 2.
H ← A newly generated H-field As always in optimization, there exist multiple choices of the stopping criteria, such as the number of iterations, the pre-set limit of the objective function, the rate of decrease of the objective function, and so forth. Specifically, we have 200 adopted the number of continuous iterations without improvement as the stopping criterion.
4. Finally, an estimate of the true rainfall field is obtained as where G −1 (·) is the inverse of the estimated cdf-rainfall, and Φ(·) is the cdf of the standard normal distribution.

Generate the 'true' rainfall fields
An artificial experiment was carried out to test the efficiency of the proposed method at estimating the true rainfall field. Since the true rainfall field is never known in reality, a sequence of 1000 independent rainfall fields with the grid size 80×80 was generated. Each pixel is representative of a 1×1 km 2 area, and all the 1000 fields were generated with identical properties, i.e., identical spatial intermittency, cumulative distribution function (cdf), and correlation function. In the following, we specify 210 how the 'true' rainfall fields were generated. 1. Generate 1000 Gaussian random fields Z T with a given correlation function. Fig. 3 (a) displays the exponential correlation function used in the generation of Z T . Note that the subscript " T " is used throughout this paper to denote the true Gaussian/rainfall fields, or further the true cdf of the rainfall field. Similarly as before, we have used the fast Fourier transformation for regular grids to generate Z T . 215 2. Generate a cdf for the rainfall field. We have used the log-normal distribution as the model of the cdf-rainfall, as the distribution is shown to be effective in describing the marginal distribution of rainfall rates over an area, or the objective of this paper: short time rainfall over an area (Bell, 1987;Crane and Robert, 1990;Pegram et al., 2001). Fig. 3 (b) shows the cdf-rainfall used in the generation of the 1000 true rainfall fields.
3. Convert the Gaussian random fields Z T to rainfall fields using the normal-score transformation method: where R T denotes the true rainfall field, Φ(·) is the cdf of the standard normal distribution, and G −1 T (·) is the inverse of the cdf-rainfall generated in Step 2. Note that a quantile value u 0 is used to maintain the spatial intermittency, as labeled in Fig. 3 (b). Namely, all the pixel values that are smaller than

Generate the radar estimates
The radar estimates were derived from the 'true' rainfall fields. Specifically, the Gaussian field Z T (Sect.3.1 step 1) was used.
To mimic two commonly seen errors in radar estimates (random and systematic error), the following procedure was applied.
1) Introduce a random error.
The proposed method assumes that the radar estimates measured aloft can represent the pattern of the rainfall field on 230 the ground. Yet some factors do exist that can reduce the representativeness of the radar estimates for the rainfall pattern on the ground, such as evaporation, complex terrain effects and anthropogenic influences. A random error is therefore introduced to mimic this kind of error. We utilize the concept of random mixing to introduce the random error. The Gaussian field with the random error introduced is obtained by mixing two fields: where Z T is the 'true' Gaussian field, and Z e is an independently generated Gaussian random field with the same statistical properties as Z T . The condition w 2 1 +w 2 2 = 1 is subject in addition to ensure that the resultant Z r is statistically identical Z T (or Z e ). The ratio of the two weights w1 w2 is used to measure the significance of the random error, in analogy with a commonly seen parameter Signal-Noise Ratio (SNR). The proposed method was tested on three levels of significance: SNR = 3, 5 and 10. Accordingly, the Gaussian fields with the random error introduced were obtained as: 2) Convert the random error-introduced Gaussian field Z r to rainfall field: See Eqn. (16) for the definitions of G −1 T (·) and Φ(·). It should be notified that the introduction of random error in Step 1 differs the radar estimates from the true rainfall field in terms of the field pattern, yet the changes in the statistical properties are tiny. See the statistical properties evaluated from the 'true' and from the random error-introduced rainfall 240 fields in Fig. 4.

3) Apply a nonlinear transformation
The purpose of applying the nonlinear transformation is to mimic the systematic error in radar estimates. In real world application, the accurate Z-R relationship is difficult to achieve. The omnipresent Z-R relationship Z = 200R 1.6 given by Marshall and Palmer (1948) is widely used in radar hydrology to convert radar reflectivity to rain rate. Long lists of 245 vastly different Z-R relationships have been derived for different areas in different conditions over the past years (Battan, 1973;Uijlenhoet, 2001;Fabry, 2015). Yet an important point to note is that the relationship is changing in space and time, and in general there is no means to achieve the accurate one in time. In other words, most of the time an inaccurate Z-R relationship is used to obtain the radar estimates, which results in a nonlinear departure of the radar estimates from the true rainfall field. In view of this, we have used a power function to mimic this sort of nonlinear system error: where the operator '←' means updating. The choice of the two parameters -factor 0.87 and exponent 0.83 -is quite arbitrary. We have modeled a case when radar underestimates the precipitation. Surely one could model other cases.
However, the proposed method is immune to this sort of error, as the radar estimates are uniformed to indicate the spatial pattern of the rainfall field. Nevertheless, the nonlinear transformation can result in biases for methods that are 255 more dependent on the accuracy of the radar estimates. Unlike the random error introduced in Step 1, the nonlinear transformation changes the statistical properties of the original rainfall field, except for the spatial intermittency.

Generate the rain gauge observations
The rain gauge observations were sampled from the true rainfall field R T at the 'rain gauge' locations. As in general, the constraint from the individual gauge observation is local, i.e., the closer an ungaged location from the rain gauge, the less 260 uncertain the corresponding estimate is. Thus to achieve an estimate with less uncertainty, it is favorable to have a denser network of rain gauges that is uniformly distributed in the domain of interest. Yet it is always an interesting question to ask how dense should a rain gauge network be to achieve a satisfying accuracy. We try to answer this question by testing the proposed method on 3 layouts of rain gauges: 5×5, 6×6, and 7×7 gauges that are uniformly distributed in the domain of interest. This gives an approximate coverage of one rain gauge for every 256, 178, and 131 km 2 , respectively.

Results Comparison
Random mixing (RM) was used to estimate the 1000 true rainfall fields based on the corresponding radar estimates and rain gauge observations. To elaborate the characteristics of the results from RM, the results are compared with those from the methods: ordinary Kriging, Kriging with external drift (KED) and conditional merging (Sinclair and Pegram, 2005). Surely 270 it is not possible to display all the results for the 1000 true rainfall fields in details. We pay attention to the results for one rainfall field that is randomly drawn from the 1000. In  1. The rainfall field obtained from ordinary Kriging misses both the pattern and the extreme of the true rainfall field, as the method considers the rain gauge observations only, while the radar estimates do not contribute to the final estimates.
2. The estimates from KED and conditional merging capture the pattern of the true rainfall field, yet they both miss the extreme of the true field. KED outperforms conditional merging in this case, as KED takes the radar estimates as the 280 external drift, and tries to capture the linear relationship between the rain gauge observations and the radar estimates at the gauge locations. Yet the method considers only the linear relationship, and the scenario is radar underestimates the rainfall field nonlinearly, which to some extent explains why the estimates from KED correct the extreme of the radar estimates a little bit, but not perfectly. In this specific case, the extremes of the true and the KED-estimated rainfall fields are 46.24 mm and 32.55 mm, respectively.

285
Conditional merging applies the procedure of ordinary Kriging twice for the rain gauge observations and the radar estimates at the gauge locations. The estimated rainfall field is obtained aŝ where K G , K R denote the Kriged fields from the rain gauge observations and the radar estimates at the gauge locations, respectively; R r denotes the radar estimates. Compared to KED, conditional merging is more dependent on the accuracy 290 of the radar estimates, as the method assumes that the radar data provides an estimate of the actual Kriging error.
As KED outperforms ordinary Kriging and conditional merging in this context, we compare the results from KED with those from RM in the following subsections.
3. The rainfall field obtained from RM captures the extreme of the true rainfall field, where the extremes of the true and the estimated rainfall fields are 46.24 mm and 47.57 mm, respectively. The pattern of the true field is captured to a certain 295 degree. The results from RM are further analyzed, and we summarize the properties of the results as follows. (3-1) The proposed method is capable of capturing the extreme of the true field under the condition that the estimated cdf of the rainfall field is accurate, see Fig. 6 for the estimated cdf by the proposed algorithm in comparison with the true cdf. As the cdf is estimated from the rain gauge observations and radar estimates, it is favorable to have a relatively dense network or rain gauges, such that it is more likely to obtain a set of quantiles (collocated with 300 the rain gauges) that covers the entire range [u 0 , 1.0), and correspondingly it is more likely to have a set of gauge observations that is representative for the entire rainfall field. On the other hand, it is also favorable when the radar estimates are with less random error, such that the sampled quantiles are more accurate, i.e., the empirical cdf bumps less around the true cdf as shown in Fig. 6.   To show the estimation uncertainty at different pixels, the box and whisker plots for 9 selected pixels are displayed 325 in Fig. 8. The locations and IDs of the 9 pixels are given in Fig. 7 (b). It can be seen from Fig. 8 that the true values of the 9 pixels all fall in the central boxes, i.e., within the interquartile range (IQR). Among the 9 pixels, Pixels 3, 4 and 8 are collocated with the rain gauges, and all the estimates at the 3 pixels equal the true values, which demonstrates the fulfillment of the equality constraints at the rain gauge locations directly. Pixels 1 and 2 are distant from the neighboring rain gauges, yet the two pixels are sited in regions where less rainfall is expected. Therefore, 330 the estimates at the 2 pixels present relatively small variance. Pixels 5, 7 and 9 are also distant from the neighboring rain gauges, yet the 3 pixels are sited in regions where more rainfall is expected. Thus the corresponding estimates present higher variance. Though Pixel 6 is close to the neighboring rain gauge, it is sited in a region where much rainfall is expected. Thus the estimates there show relatively high variance.

335
In this subsection, we focus on two statistics: the mean and the extreme of the estimated rainfall field. The former is noteworthy when the estimated rainfall field is used in studies where the water balance of a region is important; the latter is of great concern when the extreme of the region is of interest, for example, storm water management, flood risk assessment, etc. Spatial intermittency is not the focus at the moment, as the difference between a 0 mm pixel and a 0.1 mm pixel is tiny, and it makes no sense to differentiate them strictly with dry-or wet-labels.

Field extreme
The extreme of the rainfall field obtained from RM / KED is compared with the true extreme in order to evaluate the accuracy of the method in the estimation of the field extreme. As literally one could obtain an infinite number of realizations for each of the 1000 true rainfall fields by RM, the following procedure is applied to increase the representativeness of the results. For each true rainfall field, 20 realizations are produced, and for each realization, the difference between the simulated and the 345 true extremes is computed, i.e., the error w.r.t. the estimated field extreme, and finally the median of these errors is used as the representative of the simulation cycle. Note that we term it a simulation cycle when all the estimates are for the same true rainfall field based on the same radar and gauge data.
The accuracy of the estimated field extreme is strongly correlated with the accuracy of the estimated cdf-rainfall. As the cdf is estimated from gauge observations and radar estimates, it is reasonable to assume that (1) the more rain gauges present in the 350 domain, and (2) the less random error in the radar estimates, the more accurate the estimated cdf is, and accordingly the more accurate the estimated field extreme is. The former actually increases the size of the sample, such that the representativeness of the sample is increased; the latter increases the quality of the sample, as less random error in the radar estimates means the sampled radar quantiles are more accurate. To analyze the influence of the two factors, the 1000 true rainfall fields were estimated under different scenarios: radar estimates with different random error (SNR = 3, 5 and 10), and rain gauges of 355 different layouts (G25, G36 and G49, abbr. 5×5, 6×6 and 7×7 rain gauges, respectively). to reduce the size of the error and to decrease the estimation variance (to shrink the scattering of the histogram).

365
The histogram shown above can be summarized by the two statistics: mean error (ME) and interquartile range of the error (IQR, the range between the quantiles 0.75 and 0.25). Table 1 shows the two statistics evaluated from the results of all the 9 scenarios. It can be seen from the upper part of the table that slight overestimation exists in the results from RM, and relatively significant underestimation exists in the estimates from KED. For both methods, increasing the number of rain gauges, and reducing the random error in the radar estimates both help in reducing the ME and shrinking the IQR. Yet for the method 370 RM, it is not necessarily the 'more' the better. For example the best performances in terms of the ME and IQR both present in the scenario (SNR=10, G36). The scenario (SNR=10, G49), which is assumed to perform the best, is only ranked the second, which might be triggered by the over-fitting problem in the estimation of the cdf-rainfall: when the radar estimates are relatively accurate, a certain number of rain gauges is already enough to sample sufficient information of the rainfall field; increasing the number of rain gauges further can lead to the over-fitting problem due to the surplus of information. Table 1. The mean error (ME) and the interquartile range (IQR) of the error w.r.t. the estimated field extremes for the 1000 true rainfall fields by RM (the three columns on the left), and by KED (the three columns on the right). The column names 3, 5 and 10 denote the SNRs of the radar estimates, and the row names G25, G36 and G49 denote the layouts of rain gauges. The best performances in terms of the ME and IQR for both methods are printed in bold and italic.

Field mean
The focus at the moment is to compare the estimated mean of the rainfall field by RM / KED with the true mean in order to evaluate the accuracy of the method in terms of the mean-statistic. Still for the method RM, one result should be used to represent the infinite number of possible results for a single simulation cycle. The similar procedure is applied as in the previous subsection: 20 realizations are produced for each true rainfall field, and the error w.r.t. the estimated mean of the field 380 is computed for each realization, and finally the mean of the 20 errors is used as the representative of the simulation cycle. The procedure is applied for all the 1000 simulation cycles. Similarly as before, the influence of the two factors -layout of rain gauges and random error in radar estimates -is analyzed by estimating the true rainfall fields under different scenarios. Fig. 10 shows the histograms of the errors w.r.t. the estimated field means for the 1000 true rainfall fields obtained from RM on Panels (a, b), and from KED on Panels (c, d). It can be 385 seen from the figure that the two methods have different sensitivity towards the two factors in discussion: KED seems to be more sensitive compared to RM. Further, slight positive biases can be observed in the estimates from KED for all the displayed scenarios. Among the four sub-figures, (a) and (c) show the influence of the layouts of rain gauges for RM and KED, respectively, and (b) and (d) show the influence of the random error in the radar estimates. For the method KED, the influence of the two factors is ambiguous: one could say increasing the number of rain gauges, and reducing the random error in the 390 radar estimates help in reducing the estimation variance, yet one can hardly say it helps in reducing the size of the error. And for the method RM, the influence of the two factors is not obvious, and the biases in the estimates are not obvious either.
We still use the two statistics -ME and IQR -to generalize each histogram, and the results of all the scenarios are shown in Table 2. It can be seen from the upper part of the table that RM outperforms KED in terms of the ME, as the MEs of all the scenarios from RM are smaller than the best performance of KED. It is worth considering that from the previous analysis,

395
KED is expected to underestimate the extreme of the rainfall field, yet the results shown here indicate that KED overestimates the mean of the rainfall field a little bit. Thus it is very likely that KED overestimates the lower quantiles of the rainfall field.
Further, it is demonstrated again by the table that the accuracy of RM in terms of the estimated field mean is not very sensitive towards the layout of rain gauges and the random error in the radar estimates. One can barely see any tendency, and even with the most unfavorable scenario (G25 and SNR=3), the ME is small.

400
It can be seen from the lower part of the table that the IQRs of the errors from RM are generally larger than that from KED, which in a more general sense reveals the distinction between a simulation method and an interpolation method. A simulation method preserves the variability at the cost of larger estimation variance, while an interpolation method normally has smaller estimation variance yet at the cost of losing the variability. Table 2. The mean error (ME) and the interquartile range (IQR) of the error w.r.t. the estimated field means for the 1000 true rainfall fields by RM (the three columns on the left), and by KED (the three columns on the right). The column names 3, 5 and 10 denote the SNRs of the radar estimates, and the row names G25, G36 and G49 denote the layouts of rain gauges. The best performances in terms of the ME and IQR for both methods are printed in bold and italic.

405
The goal of this paper is to simulate rainfall fields conditioned on radar and rain gauge data. To achieve this goal, an algorithm to obtain the cumulative distribution function of the rainfall field (cdf-rainfall) from radar and rain gauge data is proposed, and the conditional simulation method random mixing (RM) is utilized. To test the efficiency of the proposed method, an artificial experiment has been made. The properties of the results from the proposed method are elaborated by the comparison with the results from other methods: ordinary Kriging, Kriging with external drift (KED), and conditional merging. The properties are 410 summarized as follows. (1) The proposed method incorporates the spatial pattern indicated by radar, namely the uniformed radar estimates. The results from the proposed method are immune to the systematic error in the radar estimates, i.e., the error induced due to the uncertainty in the Z-R relationship. Due to this property, the proposed method outperforms KED and conditional merging in the artificial experiment, as the two methods are more dependent on the accuracy of the radar estimates.
(2) Unlike the interpolation methods -ordinary Kriging, KED, or condition merging -where a Kriged mean field is given by 415 minimizing the estimation variance, one can obtain an infinite number of estimates (realizations) from the proposed method due to its Monte Carlo framework. (3) The various estimates provide a reasonable representation of the estimation uncertainty, and the estimation uncertainty of a pixel follows the tendency: the closer the pixel from the neighboring rain gauge, and the smaller the expected estimate at the pixel, the lower the estimation uncertainty is at the pixel.
The sensitivity of the proposed method towards the factors -density of rain gauges and random error in radar estimates -420 has been analyzed by running the simulation under different scenarios. The accuracy of the proposed method in estimating the extreme and mean of the rainfall field is studied, and the results are compared with that from KED. For the estimation of field extreme, it is found that increasing the number of rain gauges and reducing the random error in the radar estimates both help to improve the estimation quality for the proposed method. Yet when the radar estimates are relatively accurate, a certain number of rain gauges is already enough to gather adequate information of the rainfall field. Increasing the number of rain gauges 425 further can lead to the over-fitting problem in the estimation of the cdf-rainfall. Comparing the two methods, the proposed method outperforms KED in terms of the mean error (ME) and the interquartile range (IQR) of the error. For the estimation of field mean, the proposed method is not as sensitive as the method KED towards the two factors in the discussion. The proposed method outperforms KED in terms of the ME. The estimates from the proposed method generally have a larger estimation variance compared to that from KED, which actually reveals the distinction between a simulation and an interpolation method.

430
Author contributions. The first author, JY, did the programming work and part of the manuscript writing. The first author, FL, did lots of computational work and part of the manuscript writing. They contributed equally to this work. The second author, AB, contributed to the research idea and supervised the research. The third author, TT, provided valuable suggestions for the revision of this article.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. We thank China Postdoctoral Science Foundation (Grant No. 0400229136) and the National Natural Science Foundation