Data fusion aims at integrating multiple data sources that can be redundant or complementary to produce complete, accurate information about the parameter of interest. In this work, data fusion of precipitable water vapor (PWV) estimated from remote sensing observations and data from the Weather Research and Forecasting (WRF) modeling system is applied to provide complete, high-quality grids of PWV. Our goal is to correctly infer PWV at spatially continuous, highly resolved grids from heterogeneous data sets. This is done by a geostatistical data fusion approach based on the method of fixed-rank kriging. The first data set contains absolute maps of atmospheric PWV produced by combining observations from the Global Navigation Satellite Systems (GNSS) and Interferometric Synthetic Aperture Radar (InSAR). These PWV maps have a high spatial density and millimeter accuracy; however, the data are missing in regions of low coherence (e.g., forests and vegetated areas). The PWV maps simulated by the WRF model represent the second data set. The model maps are available for wide areas, but they have a coarse spatial resolution and limited accuracy. The PWV maps inferred by the data fusion at any spatial resolution are of higher quality than those inferred from single data sets. In addition, by using the fixed-rank kriging method, the computational burden is significantly lower than that for ordinary kriging.

Water vapor is a vital constituent of the Earth's electrically neutral
atmosphere (neutrosphere). Although the ratio of water vapor partial pressure
to total atmospheric pressure is typically below 4 %, water vapor plays an
important role in many respects. Due to the dynamic nature of the neutrosphere and the
complex energy exchange with the Earth's surface, the spatio-temporal
distribution of water vapor can be highly variable. Accurate information
about its content and tendency is the main prerequisite for the prediction of
clouds and precipitation. Water vapor is important for studies of climate and
natural disasters such as floods, droughts or glacier melting. On the other
hand, radio signals transmitted from spaceborne sensors are refracted when
traversing the Earth's neutrosphere. The neutrospheric water vapor
contributes less than 10 % of the signal path delay; however, this error
source is not easily eliminated. Accurate information about the water vapor
concentration along the signal path is required, which is not always
obtainable. Although many efforts have been made to produce accurate
information about water vapor using ground-based, space-based or numerical
methods, the available information is often limited in the temporal
resolution, spatial resolution or accuracy

The amount of remote sensing data available for monitoring the Earth and its
atmosphere is growing rapidly and continuously. InSAR has proved its
capability for detecting surface deformation, landslides, and tectonic
movements

Atmospheric modeling systems are standard approaches to simulate
3-D distributions of
the neutrospheric water vapor at various temporal and spatial samplings.
Dynamic local area models (LAMs) are common tools for scaling down the coarse
grids of global circulation models to meso-scale applicability. Several
studies employed the Weather Research and Forecasting modeling system (WRF,

Despite manifold improvements over the last years, considerable uncertainties
are still connected with the parameterization of physical processes in
mesoscale-atmospheric models and biases of the driving model

Due to the availability of various data sources, which can be complementary or redundant, data fusion has received increasing attention in Earth observation studies. The focus is on combining multiple sources, which may be spatially, temporally, or spectrally inhomogeneous, to produce a more complete representation of a geophysical process. In this work, we use remote sensing data and numerical atmospheric models through a data fusion approach to provide improved information about the distribution of atmospheric water vapor. This information is important not only for weather forecasting and climate research, but also for better understanding how InSAR interferograms are affected by water vapor, and for selecting the most appropriate method for reducing this noise. In turn, reliable local water vapor maps can support adaptation of the WRF model configurations and, hence, may improve the model performance.

Maps of the absolute atmospheric PWV derived by combining PSI and GNSS data and the corresponding map from MERIS. The spatial correlation is 95 % and the rms value of the differences is 0.68 mm.

In the following, we present water vapor maps derived from microwave remote sensing data and numerical atmospheric models. Since the available data have different spatial levels of aggregation, it is important to discuss the change-of-support problem. Then, we present the data fusion approach based on kriging or fixed-rank kriging. We first describe ordinary kriging and how it can be extended for fusing multiple data sets. Then, we present the reasons for using fixed-rank kriging. Finally, we use the data fusion approach to predict maps of the atmospheric PWV from remote sensing data and atmospheric models.

Several observation systems are commonly used to continuously monitor the vertical and horizontal distributions of water vapor in the atmosphere. These systems operate either from the ground, such as radiosondes and ground-based water vapor radiometers, or from space, such as space-based water vapor radiometers and infrared sensors. In this work, we employ microwave remote sensing systems as well as numerical atmospheric models to provide accurate maps of the atmospheric water vapor at a high spatial resolution.

The PSI method produces information where stable persistent scatterers are identified, which requires a high coherence between the SAR images. In forests and vegetated areas, the probability of identifying persistent scatterers is low; therefore, in these regions, only sparse points are found. The white areas within the left figure indicate regions of low coherence and the corresponding data from MERIS are masked out. The spatial correlation between the maps is 95 % and the root mean square (rms) value of the differences is 0.68 mm. We can observe that the persistent scatterers are dense in the urban areas, while they almost disappear in the low coherence regions. Since PWV data are spatial, their covariance function is exploited by geostatistical techniques to reasonably infer the PWV at regular grids. In order to improve the inferred PWV maps, especially in the areas where the PWV estimates are sparse, we apply data fusion of the remotely sensed PWV maps with maps produced by the WRF model.
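The two comparison statistics quoted above (spatial correlation and rms of the differences) can be computed as in the following minimal sketch. It assumes both maps are already resampled to a common grid and that masked pixels (e.g., low-coherence regions) are stored as NaN; the function name `compare_maps` and the synthetic data are illustrative assumptions, not the authors' processing chain.

```python
import numpy as np

def compare_maps(map_a, map_b):
    """Return (correlation coefficient, rms of differences) over pixels valid in both maps."""
    a = np.asarray(map_a, dtype=float)
    b = np.asarray(map_b, dtype=float)
    # Assumption: NaN marks masked pixels (e.g., regions of low coherence).
    valid = np.isfinite(a) & np.isfinite(b)
    a, b = a[valid], b[valid]
    cc = np.corrcoef(a, b)[0, 1]           # Pearson correlation of co-located values
    rms = np.sqrt(np.mean((a - b) ** 2))   # rms of the pixel-wise differences
    return cc, rms

# Synthetic stand-in data (hypothetical values in mm, not from the paper).
rng = np.random.default_rng(0)
reference = rng.normal(20.0, 3.0, size=(50, 50))
estimate = reference + rng.normal(0.0, 0.5, size=reference.shape)
estimate[:10, :10] = np.nan                # simulate a masked low-coherence patch

cc, rms = compare_maps(estimate, reference)
print(f"CC = {cc:.3f}, rms = {rms:.3f} mm")
```

Restricting both statistics to jointly valid pixels keeps the masked regions from biasing the comparison.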

WRF model set up with a parent domain of resolution
27 km

Maps of PWV content as received from MERIS and WRF, where a linear
trend is subtracted from each map. The upper data are received on 27 June
2005 (09:51 UTC), the lower data on 5 September 2005 (09:51 UTC). Gaussian
averaging is applied to scale the MERIS data at WRF resolution,
3 km

As depicted in Fig.

The WRF simulations cover the period between July 2004 and September 2005,
of which the first 5 months were treated as spin-up. The PWV content was
determined at every output time step (10 min) by a vertical integration of
all moisture fields from the land surface to the model top. Two output time
slices were compared with the simultaneous MERIS observations. The long-scale
signal is modeled by a linear trend and subtracted from the maps; hence,
negative values are observed on the color bars. From the compared maps shown
in Fig.

At the lateral boundaries, WRF ingests the mixing ratio concentration from the global model. Thus, for the presented simulation, the global climate model (GCM) lateral boundary conditions were applied to the first (outer) domain. Neither gridded nor spectral nudging was activated, in order to conserve the model's internal water balance. Hence, the GCM boundary fluxes and the local area model physics alone determine the propagation of moisture through the respective domains. For the analysis of 27 June 2005, the atmospheric conditions were rather calm and varied slowly, resulting in a good agreement between MERIS and WRF data. On 5 September, a quickly moving frontal system with a strong west-to-east gradient and a notch in the atmospheric vapor over the Upper Rhine Graben characterized the study region. It is not clear whether the structure and dynamics of the ERA-Interim boundaries or the WRF model configurations are responsible for the discontinuity in PWV.
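The removal of the long-wavelength signal by a linear trend, as applied to the compared maps, can be sketched as a least-squares plane fit. This is a minimal illustration under stated assumptions: the trend is taken as a plane `a + b*x + c*y` in map coordinates, the function name `remove_linear_trend` is hypothetical, and the 36 x 36 grid of 3 km cells with synthetic values only mimics the WRF grid size.

```python
import numpy as np

def remove_linear_trend(x, y, values):
    """Fit values ~ a + b*x + c*y by least squares; return (residuals, coefficients)."""
    xf = np.ravel(x).astype(float)
    yf = np.ravel(y).astype(float)
    vf = np.ravel(values).astype(float)
    valid = np.isfinite(vf)                       # ignore masked (NaN) pixels in the fit
    design = np.column_stack([np.ones(valid.sum()), xf[valid], yf[valid]])
    coeffs, *_ = np.linalg.lstsq(design, vf[valid], rcond=None)
    trend = coeffs[0] + coeffs[1] * xf + coeffs[2] * yf
    residuals = (vf - trend).reshape(np.shape(values))
    return residuals, coeffs

# Synthetic stand-in for a PWV map on a 36 x 36 grid of 3 km cells (values in mm).
cell = 3.0
centers = (np.arange(36) + 0.5) * cell
gx, gy = np.meshgrid(centers, centers)
rng = np.random.default_rng(1)
pwv = 25.0 + 0.04 * gx - 0.02 * gy + rng.normal(0.0, 0.8, gx.shape)

residuals, coeffs = remove_linear_trend(gx, gy, pwv)
print("fitted plane coefficients:", coeffs)
print("residual range:", residuals.min(), residuals.max())
```

Because the fitted plane includes an intercept, the residuals have zero mean, which is why detrended maps show both positive and negative values.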

Spatial data, for which close observations correlate more than distant ones,
can be collected at points or areal units. The former are called point-level
data or simply point data and the latter are areal-level or block data

For block data that can be expressed as the average of the point data
within the block, such as rainfall, temperature, surface elevation,
and atmospheric water vapor, the following model is
appropriate:
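As a numerical illustration of this averaging relation, the sketch below aggregates scattered point values into square blocks by taking the mean of all points inside each block. It is a discrete stand-in for the block-average model, not the authors' formulation; the function name `block_average`, the extent, and the sample values are assumptions made for the example.

```python
import numpy as np

def block_average(x, y, values, block_size, extent):
    """Average point values falling in each square block; empty blocks become NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    xmin, xmax, ymin, ymax = extent
    nx = int(np.ceil((xmax - xmin) / block_size))
    ny = int(np.ceil((ymax - ymin) / block_size))
    # Column/row index of the block containing each point.
    ix = np.clip(((x - xmin) // block_size).astype(int), 0, nx - 1)
    iy = np.clip(((y - ymin) // block_size).astype(int), 0, ny - 1)
    flat = iy * nx + ix
    sums = np.bincount(flat, weights=values, minlength=nx * ny)
    counts = np.bincount(flat, minlength=nx * ny)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    return means.reshape(ny, nx), counts.reshape(ny, nx)

# Three hypothetical point observations aggregated to 3 km blocks on a 6 km x 6 km area.
px = np.array([0.5, 1.0, 4.0])
py = np.array([0.5, 2.0, 1.0])
pv = np.array([1.0, 3.0, 10.0])
blocks, counts = block_average(px, py, pv, block_size=3.0, extent=(0.0, 6.0, 0.0, 6.0))
print(blocks)
```

Blocks without any point observation are left as NaN, which mirrors the data gaps that occur where point-level estimates are sparse.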

Point and block data, such that for spatial data,

In geostatistics, a spatial process can be inferred over a continuous spatial
domain by exploiting the covariance function as an important source of
information. Predictions are obtained based either on single or multiple
sets. Kriging is a geostatistical interpolation technique that infers values
at new locations by considering spatial correlations
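A minimal sketch of ordinary kriging at a single target location follows. It assumes an isotropic exponential covariance with known sill and range and no nugget; these choices, the parameter values, and the function names are illustrative assumptions rather than the covariance model fitted in this work.

```python
import numpy as np

def exp_cov(h, sill=1.0, corr_range=10.0):
    """Exponential covariance model (assumed for illustration)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / corr_range)

def ordinary_kriging(coords, z, target, sill=1.0, corr_range=10.0):
    """Return (prediction, kriging variance, weights) at one target location."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    # Kriging system with a Lagrange multiplier enforcing that the weights sum to one.
    system = np.zeros((n + 1, n + 1))
    system[:n, :n] = exp_cov(d, sill, corr_range)
    system[:n, n] = 1.0
    system[n, :n] = 1.0
    rhs = np.append(exp_cov(d0, sill, corr_range), 1.0)
    solution = np.linalg.solve(system, rhs)
    weights, mu = solution[:n], solution[n]
    prediction = weights @ z
    variance = sill - weights @ rhs[:n] - mu
    return prediction, variance, weights

# Hypothetical observations (coordinates in km, values in mm).
obs_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
obs_z = np.array([18.0, 20.0, 19.0, 23.0])
pred, var, w = ordinary_kriging(obs_xy, obs_z, target=[5.0, 5.0])
print(pred, var, w.sum())
```

The dense `(n + 1) x (n + 1)` system solved here is what becomes expensive for large data sets, since its cost grows cubically with the number of observations.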

The kriging method extends the spatial process using the following linear
model:

Spatial statistical data fusion (SSDF) is a method that statistically
combines two data sets to optimally infer the quantity of interest and
calculate the corresponding uncertainties at any predefined grid

Spatial autocorrelation function for a PWV map, with the long-wavelength component removed, computed from remote sensing data acquired on 5 September 2005, 10:51 UTC.

The fixed-rank kriging (FRK) approach splits the spatial process into two or
three components depending on the spatial wavelength, i.e.,

The weights stored in the matrix

The observation domain, in which the black dots mark the locations at
which the data are available. The small black squares indicate the nodes.
The weights for each location

The last component in Eq. (

Obtaining predictions via the FRK method.
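The computational advantage of FRK comes from the low-rank structure of the data covariance. The sketch below assumes the common form `S K S^T + sigma2 * I`, with an `n x r` basis matrix `S`, an `r x r` matrix `K`, and a fine-scale/measurement variance `sigma2`; these symbol names and the random test matrices are assumptions for illustration. With that form, the Sherman-Morrison-Woodbury identity reduces the `n x n` solve to an `r x r` one.

```python
import numpy as np

def woodbury_solve(S, K, sigma2, b):
    """Solve (S K S^T + sigma2 * I) x = b using only an r x r linear system."""
    inner = np.linalg.inv(K) + (S.T @ S) / sigma2          # r x r matrix
    correction = S @ np.linalg.solve(inner, S.T @ b) / sigma2**2
    return b / sigma2 - correction

# Random stand-in matrices (hypothetical sizes, much smaller than real data sets).
rng = np.random.default_rng(2)
n, r = 400, 15
S = rng.normal(size=(n, r))
B = rng.normal(size=(r, r))
K = B @ B.T + np.eye(r)            # symmetric positive definite
sigma2 = 0.5
b = rng.normal(size=n)

x_fast = woodbury_solve(S, K, sigma2, b)

# Reference: form the full n x n covariance and solve directly.
full_cov = S @ K @ S.T + sigma2 * np.eye(n)
x_direct = np.linalg.solve(full_cov, b)
print(np.max(np.abs(x_fast - x_direct)))
```

The low-rank route costs on the order of `n * r**2` operations instead of `n**3`, which matters when the number of observations is large and the number of basis functions is small.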

FRK nodes or center locations of 93 basis functions at three spatial resolutions. The first resolution is 40 km, the second resolution is 20 km, and the third resolution is 10 km.
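A multi-resolution basis of this kind can be assembled as in the following sketch. It assumes bisquare basis functions, a common choice in the FRK literature, placed on regular node grids with a support radius of 1.5 times the node spacing. The radius factor, the 120 km x 120 km extent, and the regular node layout are assumptions, so the node count here does not reproduce the 93 basis functions used in this work.

```python
import numpy as np

def make_nodes(extent, spacing):
    """Regular grid of node centers with the given spacing inside the extent."""
    xmin, xmax, ymin, ymax = extent
    xs = np.arange(xmin + spacing / 2.0, xmax, spacing)
    ys = np.arange(ymin + spacing / 2.0, ymax, spacing)
    gx, gy = np.meshgrid(xs, ys)
    return np.column_stack([gx.ravel(), gy.ravel()])

def bisquare_basis(points, nodes, radius):
    """Bisquare function (1 - (d/r)^2)^2 for d <= r, and 0 otherwise."""
    d = np.linalg.norm(points[:, None, :] - nodes[None, :, :], axis=-1)
    u = d / radius
    return np.where(u <= 1.0, (1.0 - u**2) ** 2, 0.0)

def build_basis_matrix(points, extent, spacings=(40.0, 20.0, 10.0), radius_factor=1.5):
    """Stack the basis functions of all resolutions column-wise into one matrix."""
    blocks = [
        bisquare_basis(points, make_nodes(extent, sp), radius_factor * sp)
        for sp in spacings
    ]
    return np.hstack(blocks)

# Hypothetical observation locations in a 120 km x 120 km domain.
extent = (0.0, 120.0, 0.0, 120.0)
rng = np.random.default_rng(3)
points = rng.uniform(0.0, 120.0, size=(200, 2))
S = build_basis_matrix(points, extent)
print(S.shape)
```

Each row of `S` holds the basis-function values at one observation location, with coarse-resolution columns capturing broad structures and fine-resolution columns capturing local detail.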

Based on the model in Eq. (

We classify the spatial variations of the atmospheric water vapor signal into
three components: long wavelength, medium to short wavelength, and
uncorrelated fine scale. Therefore, we split the water vapor signal using the
linear model in Eq. (

Wet delay prediction map using block OK and FRK. The resolution of
the grid is 3 km

We applied the OK and FRK to estimate the zenith-directed wet delay derived
from remote sensing data. For the FRK, the matrix

In the next section, we describe the extension of the FRK method for predicting the atmospheric PWV by fusing remote sensing data and the WRF model.

In this section, we fuse the PWV maps derived from the remote sensing data and WRF model. Since we classify the spatial variations of the atmospheric water vapor signal into long wavelength, medium to short wavelength, and uncorrelated fine-scale components, we use the following model setup for prediction.

PWV maps will be derived from the remote sensing data, denoted

To solve the system in Eq. (

The covariance terms are obtained from

Model components from point-level and areal-level data.

Using the FRK covariance model in Eq. (

In this section, we build PWV maps from remote sensing and WRF model data
using a spatial statistical data fusion method. The first PWV map, derived by
combining GNSS and PSI, has 169 688 data points. The WRF model provides
a block-level map of 1296 cells of the size 3 km

Following the work flow in Fig.

PWV maps from the PSI + GNSS combination and WRF on 5 September
2005, with a linear trend subtracted from each map. PSI + GNSS provide
point-level observations, while WRF generates block data with a block size of
3 km

Spatial correlation coefficients (CC) and rms values when comparing the prediction maps with MERIS PWV maps.

Second, the matrices

PWV prediction and MSPE maps obtained by data fusion of PWV
estimates from PSI and GNSS and maps from WRF as well as predictions obtained
by applying FRK to individual data sets. The data are available on
5 September 2005 at 09:51 UTC. The output grid has a block size of
3 km

PWV maps from remote sensing (PSI+GNSS) and WRF model data on
27 June 2005 at 09:51 UTC as well as prediction maps obtained by data fusion
and individual data sets. The output grid has a block size of
3 km

In the third step, the covariance parameters (

So far, all components required to produce the predictions using
Eq. (

In addition, we show the PWV profiles over the line drawn horizontally at the
latitude 49.37

In the above example, the data from remote sensing have a more significant
influence on the output. In Fig.

We presented a method to obtain the atmospheric PWV over any aggregation
level by the fusion of remote sensing data and atmospheric models. The PWV
maps derived by combining data from PSI and GNSS are available at discrete
points that are absent in regions of low coherence. On the other hand, the
WRF model provides simulations of PWV in the atmosphere on regular grids at
a coarse spatial resolution. Both the quality of the model data and the model's
skill in representing mesoscale atmospheric structures still need improvement.
The quality of the prediction maps can be improved by data fusion. For
data fusion, the method of spatial statistical data fusion, first presented
in

To further improve the results, we suggest the following. The matrix

Predicting the stochastic component of the atmospheric signal using kriging
requires obtaining the covariance function

Another approach proposed by

Assume that the observations in

In the expectation step of the algorithm, we calculate

The measurement error variance

Estimate of the covariance matrix

The estimate of

The authors would like to thank the GNSS data providers: RENAG, RGP, Teria,
and Orpheon (France),
SA