Research article 22 Aug 2018
Research article  22 Aug 2018
Estimating timedependent vegetation biases in the SMAP soil moisture product
 ^{1}Department of Geography, University of Guelph, Guelph, Ontario, Canada
 ^{2}Department of Environmental Engineering, ETH Zurich, Zurich, Switzerland
 ^{3}NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
 ^{4}USDAARS Hydrology and Remote Sensing Laboratory, Beltsville, Maryland, USA
 ^{5}Instituto Hispano Luso de Investigaciones Agrarias, Universidad de Salamanca, Salamanca, Spain
 ^{6}Science and Technology Branch, Agriculture and AgriFood Canada, Ottawa, Ontario, Canada
 ^{7}USDAARS Grazinglands Research Laboratory, El Reno, Oklahoma, USA
 ^{8}Comisión Nacional de Actividades Espaciales, Buenos Aires, Argentina
 ^{1}Department of Geography, University of Guelph, Guelph, Ontario, Canada
 ^{2}Department of Environmental Engineering, ETH Zurich, Zurich, Switzerland
 ^{3}NASA Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California, USA
 ^{4}USDAARS Hydrology and Remote Sensing Laboratory, Beltsville, Maryland, USA
 ^{5}Instituto Hispano Luso de Investigaciones Agrarias, Universidad de Salamanca, Salamanca, Spain
 ^{6}Science and Technology Branch, Agriculture and AgriFood Canada, Ottawa, Ontario, Canada
 ^{7}USDAARS Grazinglands Research Laboratory, El Reno, Oklahoma, USA
 ^{8}Comisión Nacional de Actividades Espaciales, Buenos Aires, Argentina
Correspondence: Simon Zwieback (zwieback@uoguelph.ca)
Hide author detailsCorrespondence: Simon Zwieback (zwieback@uoguelph.ca)
Remotely sensed soil moisture products are influenced by vegetation and how it is accounted for in the retrieval, which is a potential source of timevariable biases. To estimate such complex, timevariable error structures from noisy data, we introduce a Bayesian extension to triple collocation in which the systematic errors and noise terms are not constant but vary with explanatory variables. We apply the technique to the Soil Moisture Active Passive (SMAP) soil moisture product over croplands, hypothesizing that errors in the vegetation correction during the retrieval leave a characteristic fingerprint in the soil moisture time series. We find that timevariable offsets and sensitivities are commonly associated with an imperfect vegetation correction. Especially the changes in sensitivity can be large, with seasonal variations of up to 40 %. Variations of this size impede the seasonal comparison of soil moisture dynamics and the detection of extreme events. Also, estimates of vegetation–hydrology coupling can be distorted, as the SMAP soil moisture has larger R^{2} values with a biomass proxy than the in situ data, whereas noise alone would induce the opposite effect. This observation highlights that timevariable biases can easily give rise to distorted results and misleading interpretations. They should hence be accounted for in observational and modelling studies.
 Article
(10397 KB) 
Supplement
(484 KB)  BibTeX
 EndNote
Soil moisture products derived from satellite measurements are subject to errors. These are not constant, but vary in space and time. For any given location, they may depend on variable factors such as vegetation phenology, atmospheric conditions or measurement characteristics like the incidence angle (Entekhabi et al., 2010; Loew and Schlenz, 2011; Su et al., 2016). Vegetation is commonly considered to be the most delicate factor to control when retrieving soil moisture from the raw satellite measurements (Chan et al., 2016; Dorigo et al., 2017). An imperfect vegetation correction during the retrieval will induce timevariable errors in the soil moisture product (Konings et al., 2017). As such structural errors potentially distort estimates of vegetation–soil moisture coupling in unexpected ways (Doherty and Welter, 2010), they are more pernicious than simple quasirandom noise or than timeconstant biases.
However, the timevariable error properties of soil moisture products are poorly understood and rarely considered in practice (Loew and Schlenz, 2011). Arguably one reason for this is that there is currently no consistent way of estimating such rich error structures from data. If an exact reference soil moisture measurement were available, this would be an easy enterprise (Su and Ryu, 2015). Dense in situ measurements are widely considered to be the closest one can get to perfect reference data, but they are rare, and nonnegligible uncertainties remain (Colliander et al., 2017). In the absence of perfect reference data, any useful error estimation procedure must cope with errors in all input data. Triple collocation and its various extensions can provide consistent error estimates in these circumstances (Gruber et al., 2016; Zwieback et al., 2016). However, similar to the standard RMSE metric, they cannot directly separate nonconstant systematic errors from quasirandom measurement noise. Conversely, common preprocessing steps (forming anomalies, analysis of short time periods) provide a simple means to deal with certain nonconstant errors, but lack generality (Gruber et al., 2016; Loew and Schlenz, 2011).
Here, we extend triple collocation to estimate nonconstant error structures (Sect. 2). The central idea is to express systematic errors such as an offset and random errors in terms of explanatory variables like a vegetation index (Doherty and Welter, 2010; Xu et al., 2017). The choice of explanatory variables should be informed by the measurement principle and the retrieval algorithm. In our case study of the Soil Moisture Active Passive (SMAP) product, we will specify it in terms of a misspecification of the vegetation correction during the retrieval process. We generally assume that three independent and noisy products are available. Once their hypothesized error structure has been specified, our Bayesian triple collocation approach proceeds in two steps. First, the specified error structure is embedded in a probabilistic model that links the unknown soil moisture θ with observed soil moisture products y_{n}. Second, Bayesian inference is applied, and one thus obtains estimates and uncertainties of the error parameters of interest. A simulation study indicates that timevariable systematic errors can be estimated reliably for as few as 250 samples (Sect. 3).
We apply this procedure to estimate timevariable biases in the SMAP soil moisture product that are associated with an imperfect vegetation correction (Sect. 4). This is not to say that other soil moisture products are not subject to similar biases; on the contrary, the SMAP baseline product is widely considered to provide the most reliable global soil moisture data record available (Chan et al., 2017; Colliander et al., 2017). To estimate soil moisture from the passive microwave measurements, the retrieval algorithm has to correct for the vegetation influence (Chan et al., 2017; Kurum et al., 2011). Specifically, the SMAP algorithm assumes that the vegetation optical depth τ, a dimensionless measure of how much the microwaves interact with the vegetation, is known a priori.
We focus on croplands, as they present a particular challenge to the vegetation correction approach using an a priori τ. Over crops, the input τ, which is derived from the Normalized Difference Vegetation Index (NDVI), only provides an incomplete picture of the vegetation influence at microwave frequencies. The NDVI, being strongly influenced by leaf chemistry, can only indirectly account for the dominant controls on τ, namely the vegetation water content and the canopy structure, both of which are particularly diverse and dynamic in crops (Lawrence et al., 2014; Momen et al., 2017). The NDVIbased input τ also does not account for interannual variability in vegetation conditions, which cannot be neglected over croplands (Patton and Hornbuckle, 2013). Consequently, agricultural regions have been identified as a weak spot for the SMAP product (Colliander et al., 2017). However, the timeaverage metrics analysed so far cannot distinguish seasonal biases from quasirandom noise, and the magnitude of vegetationinduced systematic errors thus remains unknown.
We hypothesize that seasonal changes in the error structure arise due to an inaccurate vegetation correction in the retrieval, so that the biases relative to the in situ data track the misspecification in the vegetation optical depth Δτ. We specify the SMAP offset and sensitivity as a function of Δτ, based on predictions by the τ−ω radiative transfer model (Kurum et al., 2011). We estimate the associated error parameters with the Bayesian triple collocation approach, using in situ and reanalysis data as additional soil moisture products. We find that SMAP soil moisture biases that track Δτ are widespread and large over croplands. This is especially so for the sensitivity, resulting in a timedependent dynamic range of the SMAP soil moisture product that impedes seasonal comparisons of soil moisture dynamics. We attribute the timevariable biases to the imperfect vegetation correction, as the inferred bias characteristics largely match those predicted by the τ−ω model. To illustrate the potential influence of these biases on estimates of vegetation–hydrology coupling (Adegoke and Carleton, 2002), we show that the coefficient of determination between SMOS τ anomalies and SMAP soil moisture anomalies is inflated compared to in situ soil moisture measurements. In summary, our analyses suggest that soil moisture products can be subject to previously neglected timevariable biases that should be accounted for in observational and modelling studies.
We now present a general overview of the approach (Fig. 1), which consists of two components. First, a probabilistic model that links the unknown soil moisture θ with the observed soil moisture products y_{n}. The link itself characterizes the error structure: it depends on error parameters ${\mathit{\gamma}}_{{y}_{n}}$ and explanatory variables w. Second, a Bayesian inference approach that provides estimates of all the unknown quantities, in particular the error parameters. By conditioning on the observed soil moisture data (input), we get a posterior distribution over the unobservable quantities (output). We focus on a setting inspired by triple collocation studies, i.e. we for the most part assume that N=3 independent and noisy products are available (Gruber et al., 2016). In regular triple collocation, three independent products provide sufficient information to estimate the σ of all three products and bias parameters of two of the three products. In a Bayesian setting, the presence of prior information allows one to reduce the number of independent products, but the results will strongly depend on the prior distributions.
Our approach has several characteristics that make it useful in a wide range of applications. It is widely applicable, as no soil moisture product is assumed to be free of errors. This is particularly critical for estimating the noise magnitude and the sensitivity, which cannot be estimated consistently by standard regression approaches when the reference product is subject to errors (Yilmaz and Crow, 2013). Also, it provides principled uncertainty estimates through the posterior distribution. It is flexible, as it can be adapted to many functional relations and error structures. Finally, it is transparent because all modelling assumptions are explicit. Owing to its flexibility, the model can be modified to test the sensitivity of the results to certain assumptions. We will exploit these advantages by modifying the prior distribution and the likelihood; for instance, we will test several models for the unknown soil moisture θ.
2.1 Probabilistic model
The probability distribution comprises the observable products y_{0} to y_{N−1}, the unknown soil moisture θ and numerous parameters: the set of parameters ${\mathit{\gamma}}_{{y}_{n}}$ characterizes product n, and γ_{θ} does the same for the soil moisture. The structure of the model is summarized in Fig. 1a, which depicts the corresponding directed acyclic graph (MacKay, 2003). This graph expresses how the distribution over all the random variables is factorized into smaller components. Starting with an observable product y_{n}, its distribution is modelled conditional on the unknown θ, the associated parameters ${\mathit{\gamma}}_{{y}_{n}}$ (including the timeindependent and timedependent bias parameters) and the explanatory variables ${w}_{\cdot ,n}$. We refer to this conditional distribution as the error model. It is conditional on the soil moisture θ, whose distribution (conditional on parameters γ_{θ}) is referred to as the soil moisture model. The final component are the parameters ${\mathit{\gamma}}_{{y}_{n}}$ and γ_{θ}, which are assigned prior distributions. These prior distributions are integral to the Bayesian approach, and they express the initial belief about the parameters' likely values. We now describe each of these components in turn. To facilitate future applications of the approach, we do this using very general notation, which we will later in the SMAP case study simplify by dropping subscripts.
2.1.1 Error model
Each product's error model is governed by a set of parameters ${\mathit{\gamma}}_{{y}_{n}}$ which quantify the error component such as the biases (systematic errors). We consider an affine error model according to which the dynamic range or sensitivity of the product can differ from that of the true soil moisture, governed by the scaling parameter L_{n}(t). We also include an additive offset or bias M_{n}(t) (Yilmaz and Crow, 2013; Zwieback et al., 2012), yielding
where we essentially relate the deviation of soil moisture from a typical, prescribed soil moisture θ_{0} to the observed product.
Our key extension compared to previous triple collocation studies is to make the bias terms and the noise magnitude vary with timedependent explanatory variables, ${w}_{\cdot ,np}\left(t\right)$:
where l_{n} and m_{n} are timeinvariant values specific to product n, and the parameters λ_{np} and μ_{np} are coefficients that quantify the dependence on the pth explanatory variable w_{λ,np} and w_{μ,np}, respectively. The explanatory variables can depend on the product n as well as on the parameter (μ, λ). They are external quantities that are part of the model input. To facilitate the interpretation of the parameters, we always assume the explanatory variables to have zero mean and unit standard deviation (Gelman et al., 2008). l_{n} and m_{n} thus represent the timeaverage biases. The magnitude of λ_{n} and μ_{n} represents the magnitude of the associated temporal changes in L_{n} and M_{n}, respectively.
The quasirandom errors are characterized by their variance and further distributional assumptions. For the variance ${S}_{n}^{\mathrm{2}}\left(t\right)=E\left({\mathit{\u03f5}}_{n}\right(t{)}^{\mathrm{2}})$ we suggest a multiplicative model that is commonly employed in regression studies (Harvey, 1976):
where κ governs the sensitivity of the error variance to the explanatory variable, which has to be positive. We always assume it to be normalized so its geometric mean is one, as this simplifies the interpretation of the typical variance ${\mathit{\sigma}}_{n}^{\mathrm{2}}\left(t\right)$. A positive value κ>0 indicates a larger variance as the explanatory variable increases, and a negative value corresponds to a smaller variance. To complete the specification of the errors, a probability distribution has to be assumed. We focus on a normal distribution with zero mean and variance given by ${S}_{n}^{\mathrm{2}}$, as our robustness checks indicated that the results were not very sensitive to this assumption (cf. Sect. 3). Finally, two further properties have to be specified. First, we assume that the errors at different times are independent. We will later show that violations of this assumption commonly have negligible impact on the estimated parameter values. Second, we postulate that the errors of the various products are independent. This is generally a key assumption in triple collocationtype studies (Gruber et al., 2016).
We generally specify y_{0} to be the reference product (Gruber et al., 2016; Yilmaz and Crow, 2013). Its error magnitude is assumed to be constant over time, and so are its additive bias (M=0 m^{3} m^{−3}) and its sensitivity L=1. This constraint ensures that the unknown soil moisture distribution (its mean and scale) can be inferred from data, as it essentially specified what soil moisture value the reference product was a noisy estimate of (Zwieback et al., 2016). The bias terms are thus always relative to this reference product.
2.1.2 Soil moisture model
The second piece of the probabilistic model concerns the soil moisture θ, the distribution of which also has to be specified. Our default representation is a simple logistic model. The physical quantity θ, which is constrained to between 0 and the porosity ϕ, is expressed as a function of the nondimensional unbounded soil moisture Θ
Θ is modelled as a standard normal random variable. The sitespecific parameters A, B and ϕ are inferred in the Bayesian inference. We summarized the parameters of the soil moisture model under γ_{θ}.
One drawback of this model is that it cannot account for the autocorrelation and seasonality of soil moisture. To test for the importance of temporal characteristics, we also generalize the model by making A and B timedependent. We do this by expanding A and B in a spline basis with 12 monthly basis functions. While this model cannot completely account for the complex temporal characteristics of soil moisture, it captures the seasonal trends.
2.1.3 Prior distributions
To complete the full probability distribution, one has to specify the prior distributions of the parameters γ_{θ} and ${\mathit{\gamma}}_{{y}_{n}}$. As we work with normalized explanatory variables, we can use a problemindependent prior distribution (Gelman et al., 2008). We choose the priors to be weakly informative, thus partially ruling out unreasonable values but still letting the data speak for themselves (Gelman et al., 2008). Our default choices are summarized in Table 1.
For all products y_{n}, we put a very weak prior on the error magnitude variance σ^{2}. It is given by an exponential distribution with mean 0.1 (m^{3} m^{−3})^{2}, plotted in Fig. S1 in the Supplement. In other words, we barely constrain this quantity. For the product error parameters, m and μ (m^{3} m^{−3}) are assumed distributed according to a t distribution T (0, 0.3^{2}; 4); Fig. S1. It is centred at 0 with a standard deviation of 0.3 and very heavy tails due to its four degrees of freedom. These values barely constrain the estimation of the additive bias m, as they do not rule out biases as large as 0.5 m^{3} m^{−3}. Similarly for l and λ (–), whose prior is T (1, 0.3^{2}; 4).
The standard prior distribution for the soil moisture parameters γ_{θ} follows a similar logic. The porosity (m^{3} m^{−3}) is given by T (0.4, 0.1^{2}; 4). The parameters A and B in the standard logistic model are assigned T (0, 3.0^{2}; 4) and an exponential distribution with mean 3.0, respectively.
2.2 Bayesian inference
The Bayesian inference takes the observed products and explanatory variables as input and outputs posterior probability distributions over the unknown quantities (Fig. 1). The posterior distributions are obtained from the probabilistic model by conditioning on the input data. Conditioning is a welldefined mathematical operation, but analytical solutions are infeasible for complicated models like ours (MacKay, 2003). Instead, one has to resort to approximations. Monte Carlo methods are arguably the most popular. Their output is an approximation to the posterior distribution that consists of samples drawn from this distribution. Here, we rely on Hamiltonian Monte Carlo as implemented using the adaptive NoUTurn Sampler in pymc3 (Hoffman and Gelman, 2014; Salvatier et al., 2016). The NoUTurn Sampler produces successive, dependent samples of the posterior distribution that are called a chain. Each sample consists of draws from the posterior distribution, or actually an approximation thereof, of all the unobserved random variables (Output in Fig. 1b). They comprise the parameter random variables (e.g. the timedependent biases) as well as the soil moisture time series, i.e. one value of θ for each SMAP observation. For each location, we sample two independent chains with 2000 samples each, which standard quality controls (divergences, chain mixing) indicate is sufficient. Following common practice, the first 1000 samples are discarded (Brooks and Gelman, 1998)
We now study the applicability of Bayesian triple collocation using a simulation study. We used three simulated products with realistic error properties (Table 1). y_{0} was taken to be the reference product both in the simulation and in the probabilistic models. For the other two products the biases and error magnitudes were assumed dependent on a normalized explanatory variable that varied seasonally (Table 1). The soil moisture was prescribed using a simple antecedent precipitation model driven by a seasonally dependent hidden Markov model rainfall generator, which gave rise to autocorrelated soil moisture. Mimicking the SMAP sensor, an observation was made every 2–4 days.
We first analysed the fidelity with which the error parameters could be estimated. To this end, we simulated R=25 time series and computed the posterior distribution using the default probability model of the previous section, summarized in Table 1. We computed an aggregated RMSE error for each parameter π by comparing the prescribed parameters π_{n} for all products with the posterior means inferred from each simulation run r:
where ${\widehat{\mathit{\pi}}}_{n,r}$ is the posterior mean of parameter π estimated for product n (out of N_{π}) in run r. To quantify systematic deviations, we computed an average absolute bias (in the root mean square sense) between the truth and posterior mean for each parameter:
and we subsequently averaged the magnitudes over all products (referred to as mean absolute bias). To explore the impact of the number of observations on the accuracy, we did this for 100, 250 and 500 observations, which bound the number of observations that were available in the SMAP study.
The error parameters could be estimated with sufficient fidelity in the simulation study (Fig. 2; full model). The accuracy of the estimated timevariable additive bias parameter μ was found better than 0.01 m^{3} m^{−3}, and thus likely sufficient to detect relevant nonconstant biases. The sensitivity coefficients λ are retrieved with comparable precision: the RMSE of 0.05 corresponds to a differential bias between dry and wet conditions of around 0.01 m^{3} m^{−3} (assuming a soil moisture dynamic range ∼0.25 m^{3} m^{−3}). Again, this should be sufficient for many applications. Finally, the timeconstant noise magnitude σ could be estimated accurately (≪0.005 m^{3} m^{−3}).
Bayesian triple collocation yields a distribution of the parameters and thus naturally provides uncertainty estimates. The posterior standard deviation s_{p} compares favourably to the RMSE errors: Fig. 2b shows that the posterior standard deviations are within 10 % to 20 % of the RMSE errors. Even though these two quantities represent different kinds of uncertainty, they are expected to be comparable for large samples (MacKay, 2003). The posterior distribution hence provides a useful summary of the estimation uncertainty.
To test the sensitivity of the estimates to model assumptions, we extended the simulation study. The most critical aspect turned out to be the specification of the bias terms: neglecting variable bias terms can impair the overall estimation quality. Neglecting the complex error structure leads to an overestimation of the error magnitude (Fig. 2). In our case, setting μ, λ and κ to 0 induced an increase in the RMSE of σ by a factor of 2. This increase was mainly due to a large bias, as the varying offsets were wrongly attributed to quasirandom noise. The posterior uncertainty estimates also became inaccurate. Conversely, neglecting only the variability of the error magnitude (i.e. setting κ=0) had limited impact on the retrieval of the other parameters.
To test for additional model assumptions, we modified the model and the forward simulations. The impact on the estimation accuracy was typically limited (see the Supplement), so we only provide a short summary. First, the model for the noise term ϵ had a moderate influence on the estimation quality. A mismatch between assumed and simulated ϵ distributions increased the RMSE of the error variance by a small amount (<0.001 m^{3} m^{−3} for σ) for a range of error distributions and strongly autocorrelated errors. The other crucial assumption in the model is the probability distribution for the soil moisture. Also here, the changes are typically small (up to 10 % improvement in the RMSE, but a decrease in bias) when replacing the standard timeinvariant model by a seasonally variable model or by a different timeinvariant model. The improvement suggests that the modelinternal soil moisture distribution can have an impact on the estimated biases, in particular when the actual soil moisture is correlated with the explanatory variable, as it was in the simulated data. We would hence expect that for most applications it is the seasonal and subseasonal timescales that the soil moisture model should be able to capture. For comparison, autocorrelation on the interstorm timescale that is not captured by our model but present in the simulated data did not seem to introduce major limitations (sufficient fidelity for the full model, Fig. 2). The prior distributions had an even smaller impact. Making the prior distributions twice as wide or the tails less heavy changed the estimates by only a few percent.
4.1 Materials and methods
4.1.1 Data
To estimate the biases of the SMAP soil moisture product, we used N=3 soil moisture data sets in the probabilistic inference: apart from the SMAP product, these were in situ data from dense networks or sparse sites, and MERRA2 reanalysis data. The SMAP data set we analysed was the SMAP enhanced Level2 soil moisture product, which is disseminated on the 9 km EASEGrid 2.0 at a resolution of 33 km (Chan et al., 2017; O'Neill et al., 2017). It contains a variety of estimates of the top (5 cm) soil moisture, of which we chose the standard product (baseline V single channel algorithm, 09:00 morning passes). The single channel retrievals rely on an a priori τ derived from a MODIS climatology; these τ values are included in the disseminated product. We studied all available data since the beginning of the record in April 2015 until August 2017, i.e. up to three annual growing seasons. After removing flagged retrievals (Colliander et al., 2017), the number of available measurements is on the order of 300–500, which is not ideal but should be sufficient according to Fig. 2a.
The analyses focus on seven locations in North America, South America and Europe with significant crop cover, due to the availability of highquality dense in situ networks (Table 2). At these SMAP core or candidate sites, continuous calibrated in situ measurements at 5 cm depth are collected at multiple locations within a SMAP grid cell (Colliander, 2017; Colliander et al., 2017).
To provide a better overview of the spatial patterns, we also used data from >200 in situ sites in the contiguous US (SCAN and USCRN networks). These sparse sites consist of a single station per satellite pixel, and their representativeness is hence not comparable to that of the network sites. The USDA's SCAN network has been in continuous operation since 1999 and provides soil moisture data at 2 in (5 cm) depth (Schaefer et al., 2007). The USCRN network consists of 114 sites whose location was chosen to be maximally representative of its surroundings; we used the 5 cm soil moisture observations (Bell et al., 2013; Diamond et al., 2013; Palecki et al., 2017). We assigned these sites a dominant land cover based on the MODIS MCD12C1 land cover product (Friedl et al., 2017).
For the third soil moisture product we used the MERRA2 reanalysis (M2T1NXLND.5.12.4) (Gelaro et al., 2017; Global Modeling and Assimilation Office, 2017). It is the most recent reanalysis product of NASA's Global Modeling Office, available at a resolution of 55 km. For sensitivity analyses, we also used GLDAS2 (Noah model, GLDAS_NOAH025_3H.2.1), a popular land assimilation data set (Roddell and Beaudoing, 2017).
To quantify the error structure as a function of Δτ, we estimated Δτ as the difference between an external estimate of τ and the SMAP input τ. The external estimate was based on independent Lband microwave observations by the SMOS satellite (Level 3 operational algorithm). The multiangular observations and the temporal aggregation of multiple overpasses are conducive to providing robust, if noisy, measurements of the vegetation optical depth relevant to SMAP (Al Bitar et al., 2017). To reduce the impact of highfrequency noise, the Δτ data were smoothed to a temporal resolution of 16 days (LOWESS filter). The explanatory variable used in the Bayesian inference is the normalized version
4.1.2 Error estimation
In our estimation we specified the error structure based on our hypothesized impact of a vegetation misspecification. The τ−ω model (Kurum et al., 2011) predicts that a vegetation misspecification $\mathrm{\Delta}\mathit{\tau}={\mathit{\tau}}_{\mathrm{inv}}{\mathit{\tau}}_{\mathrm{true}}$ will induce both a sensitivity L and an offset M which scale approximately linearly with Δτ (Fig. 3a). There is thus only one explanatory variable (${P}_{\mathit{\lambda},n}={P}_{\mathit{\mu},n}=\mathrm{1}$), namely the normalized w_{Δτ} of Eq. (8). By slightly simplifying the notation, the error model reads
Here θ_{0} was taken to be the mean value of the in situ product. We quickly recall the interpretation of the error parameters. λ and μ quantify the temporal changes in the sensitivity and offset, respectively, that are associated with changes in Δτ. Their magnitudes correspond to the temporal standard deviation of the sensitivity and offset, respectively (Fig. 3b). Their sign expresses the direction of the dependence on Δτ. It is predicted to be positive: as Δτ grows, the inversion increasingly overcompensates the vegetationinduced loss of sensitivity to soil moisture and increase in brightness temperature, which in turn inflates both L and M, respectively.
To compute the predicted biases in Fig. 3a), we assumed the τ−ω model applied and was correctly specified (temperature, dielectric mixing model [Dobson; silt loam], singlescattering albedo ω=0.05, etc.). For a given value of τ_{true}, we simulated the Vpolarized brightness temperatures for dry and wet soil moisture conditions. These brightness temperatures were in turn the basis for estimating soil moisture by inverting the τ−ω model using the wrong τ_{inv} as a function of Δτ. For both dry and wet soil moisture conditions, the deviation was an estimate of the retrieval bias: their mean was taken to be an estimate of the offset M, whereas their difference allowed us to estimate L. When plotted against Δτ, M and L increase nearly linearly and only show a weak dependence on τ_{true}. The slope of this relation is thus well but not perfectly defined. We refer to the slopes as μ^{*} (for M) and λ^{*} (for L), respectively. To account for the spread due to the slight curvature and dependence on τ, we estimated the likely range of values by computing the slopes from the differences in L or M between five equally spaced values of Δτ (between −0.1 and 0.1), repeated for equally many values of τ_{true} (between 0.1 and 0.6). The range of these values was ${\mathit{\lambda}}_{\mathrm{pred}}^{*}\in [\mathrm{2.0}$, 3.8] and ${\mathit{\mu}}_{\mathrm{pred}}^{*}\in [\mathrm{0.33}$, 0.65] m^{3} m^{−3}. These ranges will later be compared to datadriven estimates, thus providing a firstorder assessment of the agreement between predictions and observations, despite the simplified setup and neglect of other retrieval errors.
The same overcompensation that increases the sensitivity also increases the noise level, so that we also made the variance of ϵ(t), S^{2}(t) time dependent. We used two explanatory variables, expΔτ and the external τ (both normalized, ${P}_{\mathit{\kappa},n}=\mathrm{2}$). The reason for also including τ was that the noise level S^{2} is predicted to depend on τ even if Δτ is constant, in contrast to the bias terms. By transforming Δτ to expΔτ, the predicted dependence of S^{2} shown in Fig. 3 could be reproduced accurately. The dependence on expΔτ is positive, i.e. an increase in Δτ is associated with an increase in noise level. We would thus expect κ_{expΔτ}>0.
The same error structure was assumed for the reanalysis data. The inclusion of a Δτdependent bias for the reanalysis product is not driven by physical reasoning, but for statistical reasons. By controlling for the same explanatory variables for both products, the impact of potential confounders – e.g. a seasonal bias that is correlated with Δτ – on the bias estimates of the remotely sensed product can be reduced. If this were not done, the model would try to partially adjust the timevariable bias term of the remote sensing product to minimize the systematic differences to the reanalysis product, thus distorting the bias estimates. The in situ observations were taken as reference product y_{0}: L=1, M=0, κ=0). All other model components – the soil moisture and error distributions, the priors – were identical to the simulation study (Table 1).
To compare the estimated biases with the model predictions, we reexpressed λ and μ in absolute terms by reversing the rescaling from Δτ to w_{Δτ}. Thus,
are, respectively, the slope of L and M vs. Δτ. In other words, the estimated λ^{*} describes the inferred change in L for a unit change in Δτ, and similarly for μ^{*}. They can be directly compared to the model predictions described above. Note that the division only reverses the scaling but not the offset inherent in the normalization from Δτ to w_{Δτ}. Estimating the derivative at the mean value of Δτ (due to the offset) rather than at Δτ=0 has, however, no impact for a purely linear relation.
To test the robustness of the estimates, we varied the input data and the model configuration. Instead of using the SMOS τ as the reference, we also derived Δτ from the SMAP dualchannel (LOWESSsmoothed) retrievals (O'Neill et al., 2017) and from contemporaneous MODIS NDVI data (Didan, 2017). We converted the MODIS NDVI to τ using the same equations as in the generation of the SMAP input climatology. We refer to these model runs as SMAP DC τ and MODIS τ, respectively. We also modified the external soil moisture products: instead of MERRA2 we used GLDAS2 (GLDAS θ), and we also dropped the reanalysis data set altogether (no reanalysis). This is possible in a Bayesian setting; to improve the identification of the errors, we assigned a narrower prior distribution to the in situ noise magnitude (0.02 m^{3} m^{−3}). To account for a potential confounding of τ itself, which may also have an effect on the biases, we included the smoothed SMOS τ as a second explanatory variable for L and M, referred to as the τ control configuration. Finally, we also modified the model configuration. We made the probability model for θ vary seasonally (θ spline) to account for the nonnegligible impact of this detail of the model specification on the estimates observed in the simulation (Sect. 2.1.2).
4.1.3 Estimates of vegetation–soil moisture coupling
To explore the relation between timevariable vegetation biases and estimates of vegetation–water coupling, we analysed the coefficient of determination R^{2} between τ and soil moisture anomalies. These were derived from the SMOS τ time series and both SMAP and in situ soil moisture, respectively. The associated anomalies τ^{′} and θ^{′} were obtained by subtracting a seasonal climatology that in turn was the smoothed (30day) multiyear average of the input time series. To compare the SMAP and the in situ soil moisture data, we then computed the difference in the coefficient of determination ΔR^{2}:
If the SMAP soil moisture were only contaminated by random noise relative to the in situ data, ΔR^{2} would tend to be negative. Timevariable vegetation biases, on the other hand, can induce positive values. Timeaverage biases cancel out. Of course, there are also other error sources such as representativeness errors, especially for the sparse networks. Further, the time series are comparatively short, but ΔR^{2} can provide a first tentative assessment of the reliability of coupling metrics such as R^{2} in the presence of timevariable biases.
4.2 Results
4.2.1 Network sites
SMAP biases that track the vegetation misspecification Δτ are common at the core validation sites. Especially changes in the sensitivity can be large. Reanalysis bias parameters were estimated as well, but they are considerably smaller in magnitude than those of the SMAP product.
The varying sensitivity of SMAP is illustrated for the South Fork site, Iowa (USA), in Fig. 4a). In July, when the predominantly cultivated corn is heading and flowering (Tomer et al., 2008), the magnitude of the SMAP response to rainfall events matches that of the in situ data. Conversely, in autumn (senescence, post harvest) the sensitivity of SMAP to soil moisture variations is diminished. The Bayesian inference reproduces this visually inferred pattern, as the sensitivity L drops by more than 0.5 (or 50 %). By design, the temporal changes in L are governed by changes in Δτ, which drops by about 0.1 from July to September. Over the entire time series, its temporal variability is $\left\mathit{\lambda}\right=\mathrm{0.3}$. Note that even when Δτ≈0 (August), the sensitivity is too low. As Δτ is essentially zero on average over the entire time series (−0.01), the sensitivity at Δτ=0 corresponds to the timeaverage value l. Its posterior median is 0.64 and thus less than the expected value of 1. We will return to the timeaverage biases later.
Pronounced changes in the sensitivity are found for all network sites but one (Fig. 4b). For the most part their magnitude is large, as the $\left\mathit{\lambda}\right$ values correspond to a temporal variability of around 10 % to 50 %. The inferred relation to Δτ is consistently positive (λ>0), i.e. as Δτ increases over time, so does the SMAP sensitivity L(t). The inferred direction hence matches the model prediction (Fig. 3a) of a positive λ. The model does not constrain the magnitude of the λ parameter because the latter is an internally normalized quantity. When converted into absolute quantities (λ^{*}), the inferred dependence of L on Δτ matches the model predictions reasonably well (Fig. 4b). In other words, the dataderived, completely independent estimate is broadly consistent with the predicted impact of a τ misspecification in the retrieval, despite limitations in the estimates (e.g. issues with the reference τ) and the model predictions (e.g. assumed knowledge of the land surface temperature) of λ^{*}. There is no clear apparent dependence of λ^{*} on location or land cover properties; for instance, Monte Buey and Bell Ville are within <100 km of one another, and despite the similarity in planted crops the latter's λ^{*} is considerably larger.
Also, the additive biases track changes in Δτ at the network sites, but to a lesser extent (Fig. 4c). Over time, the biases vary by up to 0.02 m^{3} m^{−3} in magnitude. The inferred direction, μ>0, matches the predictions, as an increase in Δτ corresponds to a larger offset. However, the magnitude of the inferred change in the offset with Δτ, μ^{*}, is generally smaller than predicted. A very small value of μ^{*} is found at the South Fork site: it is a factor of 10 smaller than its predicted range (Fig. 4c). It is also small enough to be practically irrelevant, as the additive bias at South Fork is inferred to be essentially constant (Fig. 4a).
The timevariable biases are complemented by the timeaverage biases, which are quite large at several network sites. The timeaverage sensitivity l deviates from its nominal value of one by more than 30 % at three out of 8 sites (Fig. 5). It tends to be larger than one, as may be expected considering the surface soil moisture SMAP is sensitive to has a larger dynamic range than the moisture at the depth of the probes. The South Fork site of (Fig. 4a) with l<1 is thus somewhat unusual. However, it is typical in that its timeaverage additive bias m is negative. Conversely, the noise level σ inferred using our approach tends to be small, usually on the order of 0.03 m^{3} m^{−3} (Fig. 5). These values are not directly comparable to standard RMSE estimates because our approach disentangles the quasirandom noise from a sensitivity that deviates from 1, temporally variable biases associated with Δτ and in situ errors. Also, the quasirandom errors tend to increase with Δτ (Fig. 5), as predicted by the τ−ω model (Fig. 3a). However, there is considerable variability in the magnitude across the study sties.
4.2.2 Sensitivity analysis
Our sensitivity analyses focus on the reference τ product. When the SMAP dualchannel result is used as the reference τ product, the bias parameters change little for the vast majority of sites (Fig. 6, column 1). When the posterior uncertainties are taken into account, the λ and μ values tend to overlap with those obtained using the SMOS τ product. This is despite potential disadvantages of the SMOS product (different resolution and footprint than SMAP, different retrieval model and use of NDVIbased regularization in the retrieval; Al Bitar et al., 2017) or of the SMAP DC product (e.g. no multiangular information, measurement noiseinduced error correlations with soil moisture estimate). Taken together, the similarity over the network sites indicates that the results are not particularly sensitive to whether the SMOS or the SMAP DC τ product serves as the reference for Δτ.
By contrast, the estimates can change substantially when τ is derived from contemporaneous NDVI data, and predominantly they are smaller in magnitude (Fig. 6, column 2). If the problem with the use of the NDVI climatology in the retrieval were the use of a climatology alone, we would expect similar estimates. Conversely, we would expect the estimates to be smaller in magnitude if it was the link between NDVI and τ that led to an inaccurate vegetation correction, because the NDVIbased Δτ would provide little information on the biases. The smaller estimates that were actually observed may thus indicate that the use of a climatology is not a dominant error source in the SMAP vegetation input data.
The estimates of the timevariable biases are reasonably robust to other aspects of the model specification. The impact of replacing the MERRA2 with the GLDAS2 soil moisture or dropping it altogether is also small (Fig. 6, columns 3 and 4). By additionally including the smoothed SMOS τ as an explanatory variable, potential confounding on Δτ could be assessed: the estimates changed very little for all but one station (Fig. 6, column 5). When allowing the soil moisture parameters A and B to vary seasonally (Eq. 5), the parameter estimates do not change substantially for all sites but one (Fig. 6, column 6). The standard setup of the probabilistic model hence seems adequate for quantifying the SMAP error structure.
4.2.3 Sparse sites
Across the sparse sites within the contiguous US we commonly find pronounced timevariable biases (Fig. 7a and b). While the results over the sparse sites are deemed much less reliable than those over the network sites, they are similar in areas of overlap. The largest changes in sensitivity of λ≈0.3 are found over croplands (e.g. Midwest, Central Valley in California). As predicted by the model, the λ are predominantly positive, but notable exceptions occur in the Mississippi Delta. Large and positive λ are also common over pastures and grasslands. The additive bias parameters μ tend to show a similar spatial pattern, in that they are largest over croplands and grasslands (Fig. 7b).
Large biases over croplands and grasslands are also found when Δτ is computed from the SMAP DC product (Fig. 8a and b). The spatial patterns are very similar. Again, there is a close correspondence between the sparse and network sites. Thus, irrespective of the Δτ product, the analysis of the sparse sites reveals sizeable biases over grasslands in addition to croplands.
To analyse the estimated noise level for all three products, we computed a normalized version σl^{−1}, where the division by l accounts for the different dynamic ranges of the three products by scaling the noise level with respect to the in situ data (Fig. 9; SMOSbased Δτ). SMAP achieves a median value of 0.045 m^{3} m^{−3}, a higher value than that of the in situ data or MERRA2 (0.029 and 0.040 m^{3} m^{−3}, respectively). For all three products, the corresponding values over the network sites are smaller by ∼50 %. For the in situ data, the larger noise level at the sparse sites is not surprising, owing to their limited representativeness. However, direct comparisons could be misleading. For instance, the larger noise level estimates (and greater spread of these estimates) may be partially accounted for by the small number of available networks and by the heterogeneous land cover and vegetation conditions across the sparse sites in the contiguous US.
4.2.4 Vegetation–soil moisture coupling
The observed coupling between vegetation and soil moisture anomalies is larger when using SMAP than when using in situ soil moisture data (Fig. 7c). Positive values of ΔR^{2} are particularly pronounced over croplands (0.12 on average), with the spatial pattern largely conforming to that of the timevariable biases. Conversely, if the timevariable errors in the remotely sensed soil moisture were purely random, the degree of association would decrease, i.e. ΔR^{2}<0. Even though spatial representative errors are likely large for the sparse networks, the ΔR^{2} values are comparable at proximal network sites. When the SMAP DC τ product is used instead, the results are similar in terms of both magnitudes and spatial patterns (Fig. 8c).
While the spatial patterns largely match those of the timevariable biases, the link between them is not clear and not necessarily uniform across all sites. The computation of anomalies largely removes seasonal offsets, which constitute a major fraction of the estimated additive biases. However, it does not remove higherfrequency variations or interannual differences, although the record is too short to reliably study those. Neither can it account for the changes in sensitivity, which are particularly large over croplands. Finally, the in situ soil moisture anomalies, predominantly derived from single probes, are subject to major uncertainties. All these factors likely contribute to the elevated associations between the τ and the SMAP soil moisture anomalies (ΔR^{2}>0), but the precise impact of timevariable biases on our ability to diagnose such interactions remains an open question.
5.1 Detected timevariable errors in the SMAP product
By applying Bayesian triple collocation to the SMAP soil moisture product, we detect timevariable biases. These timevariable biases track the misspecification of the vegetation optical depth Δτ during the soil moisture retrieval. They are both additive and multiplicative, i.e. not only the offset but also the sensitivity changes over time. Especially the changes in sensitivity can be large over croplands, as seasonal variations in L on the order of 30 % (λ≈0.3) deserve attention in future studies (Figs. 4 and 7). The spatial patterns of the timevariable biases largely match those of the temporal autocorrelation estimated by Dong et al. (2018), which were also largest over croplands. These and our findings suggest that the NDVIbased vegetation correction in the SMAP retrieval introduces particularly large errors in agricultural regions.
A mechanistic interpretation of the inferred biases is complicated by a number of poorly understood factors. First, the timevariable biases are relative to the in situ data. The results over the sparse sites should hence be interpreted with caution due to representativeness error, even if they are similar to those at the dense highquality network sites (Fig. 7). Also at the network sites, residual timedependent biases in the in situ data cannot be ruled out completely. Another major uncertainty are errors in the satellitederived τ products, which are not accounted for in the estimation. One reason why these errors are difficult to quantify is that in the context of soil moisture retrieval τ can be considered as essentially a modelinternal effective quantity (Parrens et al., 2017). As such, an observationbased estimate of τ reflects not only the vegetation conditions but also inaccuracies of the tauomega model itself, the way it is parameterized and other environmental conditions. An instance for the latter are roughness changes associated with harvest in croplands (Patton and Hornbuckle, 2013), which likely contribute to the autumnal increase in SMOS τ in Fig. 4a. To a good degree of approximation, roughness changes will be captured by the effective τ that the SMOS or SMAP DC algorithms retrieve from the brightness temperatures (Parrens et al., 2017). Nevertheless, the estimates used in this study will still be affected by errors with respect to this effective quantity that can be random and systematic (e.g. due to the different incidence angles for SMOS and SMAP). The impact of such errors on the estimated biases is unknown, but analogies to simple regression models suggest that they can distort these estimates in either direction (Frost and Thompson, 2002).
While it is premature to attribute the inferred biases completely to an imperfect vegetation correction, there are two lines of reasoning that suggest that the inferred biases are not spurious. First, they are fairly consistent across croplands, and also between sites with sparse and dense in situ networks (Fig. 7). Also, they tend to be large both in absolute terms (e.g. λ>0.1) and compared to the posterior uncertainties. Further, they are also robust to the specification of the input τ product (SMAP DC instead of SMOS) and to several model modifications (Sect. 4.2.2). However, these results are purely descriptive in that they only quantify associations, rather than establishing a causal link. A first step towards such a mechanistic interpretation is the comparison of the timevariable biases with predictions by the τ−ω model. This second line of reasoning suggests that the estimated multiplicative biases λ^{*} are largely consistent with theoretical expectations (Fig. 4). However, this analysis is contingent on (i) the tauomega model being appropriate and correctly specified (e.g. known ω), (ii) there being no confounding biases such as seasonal inundation, and (iii) the sufficient accuracy of the input τ product. It is difficult to dispel these concerns, and indeed the deviations from the predictions (for μ^{*}) may indicate that unconsidered phenomena also contribute to the timevarying biases in addition to those resulting from the vegetation correction.
One further caveat is that timeaverage biases are also present (additive bias: m≠0, sensitivity: l≠1). For instance, the SMAP retrievals at the South Fork site have too low a sensitivity (l<1) and are too dry (negative m) on average (Fig. 4a). Our analysis has focused on the timedependent biases, partly because the timeindependent biases are better known and more commonly compensated for in analyses such as data assimilation studies (Colliander et al., 2017; Kornelsen and Coulibaly, 2015; Yilmaz and Crow, 2013). Also, there are many potential sources for these timeinvariant biases between the retrievals and the in situ data, e.g. the calibration of the in situ probes, the dielectric mixing model in the retrieval, or an offset in the mean input vegetation optical depth (i.e. mean(Δτ)≠0).
We conclude that our key finding is the presence of sizeable timevariable biases in the SMAP product. They are associated with, but likely not entirely caused by, deviations of the a priori τ used in the soil moisture retrieval from τ estimated from contemporaneous microwave data.
5.2 Implications for hydrological applications
The timevarying biases can have a negative impact in many applications. The changing sensitivity impedes the seasonal comparison of soil moisture dynamics, as the same SMAPobserved change corresponds to a wide range of actual soil moisture changes depending on the season (e.g. Fig. 4a). Variable sensitivities are particularly problematic for characterizing droughts, as extreme conditions may not be apparent in the SMAP data when the sensitivity happens to be small (as would happen for reduced vegetation water content). These issues also carry over to interannual comparisons. Interannual differences in the vegetation water content may, owing to the use of a climatology for τ in the retrieval, induce a spurious vegetation signal in the soil moisture retrievals.
The spurious vegetation signal in the soil moisture data may distort estimates of water–vegetation coupling. We find inflated values of R^{2} between the SMOS vegetation optical depth and SMAP soil moisture, whereas purely random noise would decrease the R^{2} (Fig. 7c). While the spatial patterns largely match those of the estimated biases, this does not imply a causal link between the two. However, the inflated R^{2} values hint at potential pitfalls in using remotely sensed soil moisture to study global hydrology.
5.3 Estimating complex error structures
Soil moisture products can be subject to complex, timevariable errors, as revealed by our novel method for estimating such complex error structures from data. Other estimation procedures are conceivable, especially if highquality in situ data are available, and should be explored in the future. Our Bayesian triple collocation approach is widely applicable because it yields consistent estimates of error magnitudes and biases even when no errorfree reference data set is available. It does, however, have to be assumed to be free of systematic error. The method is flexible, so that the error structure parameterization can be adapted to the problem at hand. We hope that this will enable the community to better characterize the uncertainties of remotely sensed soil moisture products. The knowledge of timevariable structural errors is key to improving the products, and it also helps to inform the application of these data sets in practice.
The presented approach could be applied to a wide range of variables besides soil moisture, such as wind speed, land surface fluxes and leaf area index. The issue of nonconstant error sources, be they associated with environmental conditions or varying observational parameters, likely pertains to many such variables. By shedding light onto residual biases, our approach could in the future contribute to the development of improved retrieval approaches.
We developed a probabilistic approach for estimating complex error structures to study timevariable biases in the SMAP soil moisture product. We hypothesized that temporal changes in the error structure arise due to an inaccurate vegetation correction in the retrieval, so that the biases relative to the in situ data track the misspecification in the vegetation optical depth Δτ. Our conclusions are as follows.

Sizeable temporal changes in the offset and the sensitivity were detected, and they were particularly large over croplands (e.g. change in sensitivity ∼30 %).

While the estimated timevariable biases track the Δτ (with respect to contemporaneous SMOS or SMAP dualchannel estimates), they are not necessarily entirely caused by an inappropriate vegetation correction. The attribution to this source is complicated by the potential presence of confounding variables or errors in the reference τ. However, the timevariable multiplicative biases match the expected biases, as predicted by the τ−ω model, in direction and magnitude, suggesting that the vegetation correction is indeed an issue. Conversely, the additive biases only match in direction.

The timevariable biases impede the seasonal comparison of remotely sensed soil moisture values. In particular, extreme conditions like droughts may not be apparent in the SMAP data when the sensitivity happens to be small.

The presented estimation approach is widely applicable because it yields consistent estimates of error magnitude and biases even when no errorfree reference data set is available. Further, it is flexible in that a wide range of different kinds of error structures can be estimated purely from observations.
More generally, our findings illustrate the importance of recognizing timevariable biases in soil moisture products. These products are deemed central to a wide variety of tasks, including policyrelevant research on drought monitoring and the role of vegetation in weather, climate and the economy. The timevariable biases we detect can greatly reduce the extent to which the products can contribute to such tasks, unless they are accounted for. For instance, the distorted estimates of plant–water coupling we find for the SMAP product suggest that naive analyses of how water availability is coupled with vegetation growth and thus crop yields may give rise to misleading results. As similar phenologydependent biases may also be present in other remotely sensed products, it is necessary to quantify and account for such seasonal and interannual biases when using the products in ecohydrology, agronomy and related applications.
Timevariable biases should thus be considered in future uncertainty analyses. Previous mission requirements have largely focused on the RMSE error metric, which, however, cannot distinguish between such systematic errors and white noise. Because neglecting that distinction can easily give rise to misleading interpretations, it is important that timevariable biases be quantified. The robust estimation approaches like that developed here can help to quantify and mitigate these biases, and thus to exploit the full potential of observational data sets.
The remotely sensed and reanalysis data are available at the URLs provided in the references (registration generally required). The sparse in situ data can be found at http://ismn.geo.tuwien.ac.at/. The SMAP network in situ data are available from https://nsidc.org/data/nsidc0712 until October 2016; the 2017 data are expected to be uploaded in early 2018; until then, they are available from the authors upon request (Colliander, 2017). The Python code for Bayesian triple collocation has been made available at https://github.com/szwieback/BayesianTripleCollocation (Zwieback, 2018).
The supplement related to this article is available online at: https://doi.org/10.5194/hess2244732018supplement.
SZ, AB and AC devised the study; SZ developed the technique and performed the data analysis; MC, JMF, HM, PS, MT, AB provided data and expertise on the core validation sites; SZ was the lead author, with inputs from all coauthors.
The authors declare that they have no conflict of interest.
The authors are grateful to Kaighin McColl, Dara Entekhabi and his group, and
the SMAP team for comments and suggestions. They thank Alexandra Konings and
Wade Crow for insightful reviews. Simon Zwieback acknowledges support from the Swiss
National Science Foundation (P2EZP2_168789). The support of the Canadian
Space Agency, Environment and Climate Change Canada and the Natural Sciences
and Engineering Research Council of Canada is acknowledge for support of the
Kenaston network. A contribution to this work was made at the Jet Propulsion
Laboratory, California Institute of Technology, under a contract with the
National Aeronautics and Space Administration. The University of Salamanca
team's involvement in this study was supported by the Spanish Ministry of
Economy and Competitiveness with the project PROMISES: ESP201567549C3 and
the European Regional Development Fund (ERDF).
Edited by: Graham Jewitt
Reviewed by: Alexandra Konings and Wade Crow
Adegoke, J. O. and Carleton, A. M.: Relations between Soil Moisture and Satellite Vegetation Indices in the U.S. Corn Belt, J. Hydrometeorol., 3, 395–405, https://doi.org/10.1175/15257541(2002)003<0395:RBSMAS>2.0.CO;2, 2002. a
Al Bitar, A., Mialon, A., Kerr, Y. H., Cabot, F., Richaume, P., Jacquette, E., Quesney, A., Mahmoodi, A., Tarot, S., Parrens, M., AlYaari, A., Pellarin, T., RodriguezFernandez, N., and Wigneron, J.P.: The global SMOS Level 3 daily soil moisture and brightness temperature maps, Earth Syst. Sci. Data, 9, 293–315, https://doi.org/10.5194/essd92932017, 2017. a, b
Bell, J. E., Palecki, M. A., Baker, C. B., Collins, W. G., Lawrimore, J. H., Leeper, R. D., Hall, M. E., Kochendorfer, J., Meyers, T. P., Wilson, T., and Diamond, H. J.: U.S. Climate Reference Network Soil Moisture and Temperature Observations, J. Hydrometeorol., 14, 977–988, https://doi.org/10.1175/JHMD120146.1, 2013. a
Brooks, S. P. and Gelman, A.: General Methods for Monitoring Convergence of Iterative Simulations, J. Comput. Graph. Stat., 7, 434–455, https://doi.org/10.1080/10618600.1998.10474787, 1998. a
Chan, S., Bindlish, R., O'Neill, P., Jackson, T., Njoku, E., Dunbar, S., Chaubell, J., Piepmeier, J., Yueh, S., Entekhabi, D., Colliander, A., Chen, F., Cosh, M., Caldwell, T., Walker, J., Berg, A., McNairn, H., Thibeault, M., MartínezFernández, J., Uldall, F., Seyfried, M., Bosch, D., Starks, P., Collins, C. H., Prueger, J., van der Velde, R., Asanuma, J., Palecki, M., Small, E., Zreda, M., Calvet, J., Crow, W., and Kerr, Y.: Development and assessment of the SMAP enhanced passive soil moisture product, Remote Sens. Environ., 204, 931–941, https://doi.org/10.1016/j.rse.2017.08.025, 2017. a, b, c
Chan, S. K., Bindlish, R., O'Neill, P. E., Njoku, E., Jackson, T., Colliander, A., Chen, F., Burgin, M., Dunbar, S., Piepmeier, J., Yueh, S., Entekhabi, D., Cosh, M. H., Caldwell, T., Walker, J., Wu, X., Berg, A., Rowlandson, T., Pacheco, A., McNairn, H., Thibeault, M., MartínezFernändez, J., GonzálezZamora, Á., Seyfried, M., Bosch, D., Starks, P., Goodrich, D., Prueger, J., Palecki, M., Small, E. E., Zreda, M., Calvet, J. C., Crow, W. T., and Kerr, Y.: Assessment of the SMAP Passive Soil Moisture Product, IEEE T. Geosci. Remote, 54, 4994–5007, https://doi.org/10.1109/TGRS.2016.2561938, 2016. a
Colliander, A.: SMAP/In Situ Core Validation Site Land Surface Parameters MatchUp Data, Version 1, NASA National Snow and Ice Data Center Distributed Active Archive Center, https://doi.org/10.5067/DXAVIXLY18KM, 2017. a, b
Colliander, A., Jackson, T., Bindlish, R., Chan, S., Das, N., Kim, S., Cosh, M., Dunbar, R., Dang, L., Pashaian, L., Asanuma, J., Aida, K., Berg, A., Rowlandson, T., Bosch, D., Caldwell, T., Caylor, K., Goodrich, D., al Jassar, H., LopezBaeza, E., MartínezFernández, J., GonzálezZamora, A., Livingston, S., McNairn, H., Pacheco, A., Moghaddam, M., Montzka, C., Notarnicola, C., Niedrist, G., Pellarin, T., Prueger, J., Pulliainen, J., Rautiainen, K., Ramos, J., Seyfried, M., Starks, P., Su, Z., Zeng, Y., van der Velde, R., Thibeault, M., Dorigo, W., Vreugdenhil, M., Walker, J., Wu, X., Monerris, A., O'Neill, P., Entekhabi, D., Njoku, E., and Yueh, S.: Validation of SMAP surface soil moisture products with core validation sites, Remote Sens. Environ., 191, 215–231, https://doi.org/10.1016/j.rse.2017.01.021, 2017. a, b, c, d, e, f
Diamond, H. J., Karl, T. R., Palecki, M. A., Baker, C. B., Bell, J. E., Leeper, R. D., Easterling, D. R., Lawrimore, J. H., Meyers, T. P., Helfert, M. R., Goodge, G., and Thorne, P. W.: U.S. Climate Reference Network after One Decade of Operations: Status and Assessment, B. Am. Meteorol. Soc., 94, 485–498, https://doi.org/10.1175/BAMSD1200170.1, 2013. a
Didan, K.: MYD13A2: MODIS/Aqua Vegetation Indices 16Day L3 Global 1 km SIN Grid V006, https://doi.org/10.5067/MODIS/MYD13A2.006, 2017. a
Doherty, J. and Welter, D.: A short exploration of structural noise, Water Resources Research, 46, w05525, https://doi.org/10.1029/2009WR008377, , 2010. a, b
Dong, J., Crow, W. T., and Bindlish, R.: The Error Structure of the SMAP Single and Dual Channel Soil Moisture Retrievals, Geophys. Res. Lett., 45, 758–765, https://doi.org/10.1002/2017GL075656, 2018. a
Dorigo, W., Wagner, W., Albergel, C., Albrecht, F., Balsamo, G., Brocca, L., Chung, D., Ertl, M., Forkel, M., Gruber, A., Haas, E., Hamer, P. D., Hirschi, M., Ikonen, J., de Jeu, R., Kidd, R., Lahoz, W., Liu, Y. Y., Miralles, D., Mistelbauer, T., NicolaiShaw, N., Parinussa, R., Pratola, C., Reimer, C., van der Schalie, R., Seneviratne, S. I., Smolander, T., and Lecomte, P.: ESA CCI Soil Moisture for improved Earth system understanding: Stateofthe art and future directions, Remote Sens. Environ., 203, 185–215, https://doi.org/10.1016/j.rse.2017.07.001, 2017. a
Entekhabi, D., Reichle, R., Koster, R., and Crow, W.: Performance Metrics for Soil Moisture Retrievals and Application Requirements, J. Hydrometeorol., 11, 832–840, 2010. a
Friedl, M., MenasheSulla, D., and MOSAPS SIPS: MCD12C1 MODIS/Terra+Aqua Land Cover Type Yearly L3 Global 0.05Deg CMG, NASA LP DAAC, Sioux Falls, South Dakota, USA, https://doi.org/10.5067/MODIS/MCD12C1.006, 2017. a
Frost, C. and Thompson, S. G.: Correcting for regression dilution bias: comparison of methods for a single predictor variable, J. Roy. Stat. Soc. A, 163, 173–189, https://doi.org/10.1111/1467985X.00164, 2002. a
Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., Randles, C. A., Darmenov, A., Bosilovich, M. G., Reichle, R., Wargan, K., Coy, L., Cullather, R., Draper, C., Akella, S., Buchard, V., Conaty, A., da Silva, A. M., Gu, W., Kim, G.K., Koster, R., Lucchesi, R., Merkova, D., Nielsen, J. E., Partyka, G., Pawson, S., Putman, W., Rienecker, M., Schubert, S. D., Sienkiewicz, M., and Zhao, B.: The ModernEra Retrospective Analysis for Research and Applications, Version 2 (MERRA2), J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLID160758.1, 2017. a
Gelman, A., Jakulin, A., Pittau, M. G., and Su, Y.S.: A Weakly Informative Default Prior Distribution for Logistic and Other Regression Models, Ann. Appl. Stat., 2, 1360–1383, 2008. a, b, c
Global Modeling and Assimilation Office: MERRA2 tavg1_2d_lnd_Nx: 2d,1Hourly,TimeAveraged,SingleLevel,Assimilation,Land Surface Diagnostics V5.12.4, Goddard Earth Sciences Data and Information Services Center, Greenbelt, Maryland, USA, https://doi.org/10.5067/RKPHT8KC1Y1T, 2017. a
Gruber, A., Su, C.H., Zwieback, S., Crow, W., Dorigo, W., and Wagner, W.: Recent advances in (soil moisture) triple collocation analysis, Int. J. Appl. Earth Obs. Geoinf., 45, 200–211, https://doi.org/10.1016/j.jag.2015.09.002, 2016. a, b, c, d, e
Harvey, A. C.: Estimating regression models with multiplicative heteroscedasticity, Econometrica, 44, 461–465, 1976. a
Hoffman, M. and Gelman, A.: The NoUTurn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo, J. Mach. Learn. Res., 15, 1351–1381, 2014. a
Konings, A. G., Piles, M., Das, N., and Entekhabi, D.: Lband vegetation optical depth and effective scattering albedo estimation from SMAP, Remote Sens. Environ., 198, 460–470, https://doi.org/10.1016/j.rse.2017.06.037, 2017. a
Kornelsen, K. and Coulibaly, P.: Reducing multiplicative bias of satellite soil moisture retrievals, Remote Sens. Environ., 165, 109–122, https://doi.org/10.1016/j.rse.2015.04.031, 2015. a
Kurum, M., Lang, R. H., O'Neill, P. E., Joseph, A. T., Jackson, T. J., and Cosh, M. H.: A FirstOrder Radiative Transfer Model for Microwave Radiometry of Forest Canopies at LBand, IEEE T. Geosci. Remote, 49, 3167–3179, https://doi.org/10.1109/TGRS.2010.2091139, 2011. a, b, c
Lawrence, H., Wigneron, J.P., Richaume, P., Novello, N., Grant, J., Mialon, A., Bitar, A. A., Merlin, O., Guyon, D., Leroux, D., Bircher, S., and Kerr, Y.: Comparison between SMOS Vegetation Optical Depth products and MODIS vegetation indices over crop zones of the USA, Remote Sens. Environ., 140, 396–406, https://doi.org/10.1016/j.rse.2013.07.021, 2014. a
Loew, A. and Schlenz, F.: A dynamic approach for evaluating coarse scale satellite soil moisture products, Hydrol. Earth Syst. Sci., 15, 75–90, https://doi.org/10.5194/hess15752011, 2011. a, b, c
MacKay, D. J. C.: Information Theory, Inference, and Learning Algorithms, Cambridge University Press, Cambridge, 2003. a, b, c
Momen, M., Wood, J. D., Novick, K. A., Pangle, R., Pockman, W. T., McDowell, N. G., and Konings, A. G.: Interacting Effects of Leaf Water Potential and Biomass on Vegetation Optical Depth, J. Geophys. Res.Biogeo., 122, 3031–3046, https://doi.org/10.1002/2017JG004145, 2017. a
O'Neill, P., Chan, S., Njoku, E., Jackson, T., and Bindlish, R.: SMAP Enhanced L2 Radiometer HalfOrbit 9 km EASEGrid Soil Moisture, Version 1, NASA National Snow and Ice Data Center Distributed Active Archive Center, Boulder, Colorado, USA, https://doi.org/10.5067/CE0K6JS5WQMM, 2017. a, b
Palecki, M., Lawrimore, J., Leeper, R., Bell, J., Embler, S., and Casey, N.: U.S. Climate Reference Network Processed Data from USCRN Database Version 2, National Centers for Environmental Information, NESDIS, NOAA, US Department of Commerce, Asheville, North Carolina, USA, https://doi.org/10.7289/V5MS3QR9, 2017. a
Parrens, M., Wigneron, J.P., Richaume, P., Al Bitar, A., Mialon, A., FernandezMoran, R., AlYaari, A., O'Neill, P., and Kerr, Y.: Considering combined or separated roughness and vegetation effects in soil moisture retrievals, Int. J. Appl. Earth Obs. Geoinf., 55, 73–86, https://doi.org/10.1016/j.jag.2016.11.001, 2017. a, b
Patton, J. and Hornbuckle, B.: Initial Validation of SMOS Vegetation Optical Thickness in Iowa, IEEE Geosci. Remote Sens. Lett., 10, 647–651, https://doi.org/10.1109/LGRS.2012.2216498, 2013. a, b
Roddell, M. and Beaudoing, H.: GLDAS Noah Land Surface Model L4 3 hourly 0.25×0.25 degree V2.1, NASA GESDISC DATA ARCHIVE, Greenbelt, Maryland, USA, https://doi.org/10.5067/E7TYRXPJKWOQ, 2017. a
Salvatier, J., Wiecki, T. V., and Fonnesbeck, C.: Probabilistic programming in Python using PyMC3, PeerJ. Comput. Sci., 2, e55, https://doi.org/10.7717/peerjcs.55, 2016. a
Schaefer, G. L., Cosh, M. H., and Jackson, T. J.: The USDA Natural Resources Conservation Service Soil Climate Analysis Network (SCAN), J. Atmos. Ocean. Tech., 24, 2073–2077, https://doi.org/10.1175/2007JTECHA930.1, 2007. a
Su, C.H. and Ryu, D.: Multiscale analysis of bias correction of soil moisture, Hydrol. Earth Syst. Sci., 19, 17–31, https://doi.org/10.5194/hess19172015, 2015. a
Su, C.H., Ryu, D., Dorigo, W., Zwieback, S., Gruber, A., Albergel, C., Reichle, R. H., and Wagner, W.: Homogeneity of a global multisatellite soil moisture climate data record, Geophys. Res. Lett., 43, 11245–11252, https://doi.org/10.1002/2016GL070458, 2016. a
Tomer, M., Moorman, T., and Rossi, C.: Assessment of the Iowa River's South Fork watershed: Part 1. Water quality, J. Soil Water Conserv., 63, 360–370, 2008. a
Xu, T., Valocchi, A. J., Ye, M., and Liang, F.: Quantifying model structural error: Efficient Bayesian calibration of a regional groundwater flow model using surrogates and a datadriven error model, Water Resour. Res., 53, 4084–4105, https://doi.org/10.1002/2016WR019831, 2017. a
Yilmaz, M. T. and Crow, W. T.: The Optimality of Potential Rescaling Approaches in Land Data Assimilation, J. Hydrometeorol., 14, 650–660, https://doi.org/10.1175/JHMD12052.1, 2013. a, b, c, d
Zwieback, S., Scipal, K., Dorigo, W., and Wagner, W.: Structural and statistical properties of the collocation technique for error characterization, Nonlin. Processes Geophys., 19, 69–80, https://doi.org/10.5194/npg19692012, 2012. a
Zwieback, S., Su, C.H., Gruber, A., Dorigo, W. A., and Wagner, W.: The Impact of Quadratic Nonlinear Relations between Soil Moisture Products on Uncertainty Estimates from Triple Collocation Analysis and Two Quadratic Extensions, J. Hydrometeorol., 17, 1725–1743, https://doi.org/10.1175/JHMD150213.1, 2016. a, b
Zwieback, S.: Bayesian Triple Collocation, https://doi.org/10.5281/zenodo.1326856, Zenodo; Geneva, Switzerland, 2018. a