Probabilistic inference of ecohydrological parameters using observations from point to satellite scales

Vegetation controls on soil moisture dynamics are challenging to measure and translate into scale- and site-specific ecohydrological parameters for simple soil water balance models. We hypothesize that empirical probability density functions (pdfs) of relative soil moisture or soil saturation encode sufficient information to determine these ecohydrological parameters. Further, these parameters can be estimated through inverse modeling of the analytical equation for soil saturation pdfs, derived from the commonly used stochastic soil water balance framework. We developed a generalizable Bayesian inference framework to estimate ecohydrological parameters consistent with empirical soil saturation pdfs derived from observations at point, footprint, and satellite scales. We applied the inference method to four sites with different land cover and climate assuming (i) an annual rainfall pattern and (ii) a wet season rainfall pattern with a dry season of negligible rainfall. The Nash–Sutcliffe efficiencies of the analytical model’s fit to soil observations ranged from 0.89 to 0.99. The coefficient of variation of posterior parameter distributions ranged from < 1 to 15 %. The parameter identifiability was not significantly improved in the more complex seasonal model; however, small differences in parameter values indicate that the annual model may have absorbed dry season dynamics. Parameter estimates were most constrained for scales and locations at which soil water dynamics are more sensitive to the fitted ecohydrological parameters of interest. In these cases, model inversion converged more slowly but ultimately provided better goodness of fit and lower uncertainty. Results were robust using as few as 100 daily observations randomly sampled from the full records, demonstrating the advantage of analyzing soil saturation pdfs instead of time series to estimate ecohydrological parameters from sparse records.
Our work combines modeling and empirical approaches in ecohydrology and provides a simple framework to obtain scale- and site-specific analytical descriptions of soil moisture dynamics consistent with soil moisture observations.


1 Introduction
The movement of water from soils, through plants, and back to the atmosphere via transpiration is a critical component of local and global hydrologic cycles and is the largest surface-to-atmosphere water pathway (Good et al., 2015). A realistic analytical description of soil moisture dynamics is key to understanding ecohydrological processes that regulate the productivity of natural and managed ecosystems. Rodriguez-Iturbe et al. (1999) introduced a simple framework using a bucket model of soil-column hydrology forced with stochastic precipitation inputs where soil water losses are only a function of relative soil moisture or soil saturation. Given this ecohydrological framework, the analytical equation for the probability density function (pdf) of soil saturation depends on simple abiotic characteristics such as average climate and soil texture, and biotic characteristics including soil saturation thresholds at which vegetation can influence soil water losses. However, the shapes of analytical soil saturation pdfs are generally not consistent with observations when literature values for model parameters are used (Miller et al., 2007). Some parameters such as field capacity and wilting point do not correspond to conventional definitions, because of simplifications made to describe soil water loss processes in the model, and need to be calibrated (Dralle and Thompson, 2016). To our knowledge, parameters of the analytical soil saturation pdfs have not been directly calibrated to empirical pdfs derived from measurements beyond the point scale. Observation networks provide freely available point-scale, spatially integrated soil moisture observations, while remotely sensed soil moisture observations are available through satellite products. These data sources create an opportunity to (i) evaluate whether analytical soil saturation pdfs are consistent with observations across a range of scales, and (ii) determine average ecohydrological parameters relevant to each scale.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Estimates of ecohydrological parameters are used in a large range of applications for which the stochastic soil water balance framework has been used and adapted, including the effects of climate, soil, and vegetation on soil moisture dynamics (Laio et al., 2001a; Rodriguez-Iturbe et al., 2001; Porporato et al., 2004); ecohydrological factors driving spatial and structural characteristics of vegetation (Caylor et al., 2006; Manfreda et al., 2017); soil salinization dynamics (Suweis et al., 2010); biological soil crusts (Whitney et al., 2017); vegetation stress, optimum plant water use strategies, and plant hydraulic failure (Laio et al., 2001b; Manzoni et al., 2014; Feng et al., 2017); vertical root distributions (Laio et al., 2006); plant pathogen risk (Thompson et al., 2013); streamflow persistence in seasonally dry landscapes (Dralle et al., 2016); and soil water balance partitioning (Good et al., 2014, 2017). A survey of nearly 400 ecohydrology publications revealed that 40 % of studies relied heavily on simulation, rarely integrated empirical measurements, and were almost never coupled with experimental studies, suggesting a critical need to combine modeling and empirical approaches in ecohydrology (King and Caylor, 2011). Only a few studies have directly confronted the governing equations of the stochastic soil water balance model with observed soil moisture data, and even fewer studies have attempted to optimize model parameters to best fit soil moisture observations. Miller et al. (2007) calibrated soil saturation pdfs to project vegetation stress in a changing climate. Dralle and Thompson (2016) developed an analytical expression for annually integrated soil saturation pdfs under seasonal climates and then calibrated soil saturation thresholds between which evapotranspiration is maximum and zero to compare the model to soil moisture observations at a savanna site. Chen et al. (2008) related evapotranspiration observations at the stand scale to soil moisture values using a Bayesian inversion approach, and Volo et al. (2014) calibrated the soil moisture loss curve to investigate effects of irrigation scheduling and precipitation on soil moisture dynamics and plant stress. The functional form of the soil moisture losses was approximated using conditionally averaged precipitation (Salvucci, 2001; Saleem and Salvucci, 2002) and remotely sensed data (Tuttle and Salvucci, 2014). The timescale of soil moisture dry-downs, derived from the soil moisture loss equations, was parameterized using evapotranspiration measured at micro-meteorological stations (Teuling et al., 2006) and space-borne near-surface soil moisture observations (McColl et al., 2017). These studies indicate that the ecohydrological soil water balance framework is consistent with ground and larger-scale remotely sensed measurements.
Parameters representative of larger-scale observations are necessary to characterize ecohydrological processes at ecosystem scales and are more relevant to ecohydrological modeling. These larger-scale parameters integrate a range of ecohydrological interactions that are poorly understood and difficult to measure. Abiotic controlling factors of soil water balance including rainfall and soil texture can generally be assessed from readily available data, including site measurements, regionalized maps, and satellite observations, but vegetation controls on soil water dynamics are largely unknown and difficult to measure at hydrologically meaningful scales (Li et al., 2017). Vegetation water-use traits are generally observed at the species level and are not easily translated to the simple parameters necessary in soil water balance models. The rate of soil water losses from the near-surface soil layer, where soil moisture measurements are generally made, does not precisely correspond to evapotranspiration observed or calculated from meteorological stations. We thus focused on estimating parameters that are not directly observable, particularly the soil saturation thresholds at which vegetation controls soil water losses and the maximum rate of evapotranspiration from a near-surface soil layer. We use an inverse modeling approach and data that are commonly collected at environmental monitoring sites or measured from satellites. We present an inference framework that provides a means to quantify and compare the sensitivity of soil moisture dynamics at varying scales through estimates of simple ecohydrological parameters.
A number of studies have combined inverse modeling approaches with ground and remotely sensed soil moisture data to extract meaningful hydrologic information (Xu et al., 2006; Miller et al., 2007; Chen et al., 2008; Volo et al., 2014; Wang et al., 2016; Baldwin et al., 2017). Bayesian inference methods are effective in relating prior pdfs of observations to posterior estimates of model parameters (Xu et al., 2006; Chen et al., 2008; Baldwin et al., 2017). The soil water balance model provides a direct analytical equation for soil saturation pdfs that is convenient to use with the Bayesian paradigm because it is a low-parameter model with few data inputs. We selected a Bayesian inversion approach instead of a least-squares or maximum likelihood approach because it quantifies the inference uncertainty and improves upon the work of Miller et al. (2007), which used a least-squares approach to calibrate soil saturation pdfs. Measures of inference uncertainty and parameter convergence diagnostics provided by the Bayesian approach can be used to evaluate the validity of model inversion and develop criteria to generalize the presented framework.
We assume that if a sufficient range of soil moisture values is observed at a site, the shape of the empirical soil saturation pdf is constrained by the ecohydrological factors driving soil moisture dynamics. We hypothesize that key information required to determine these ecohydrological factors is encoded in empirical soil saturation pdfs and that this information can be extracted by calculating the inverse of the commonly used stochastic soil water balance. The analysis of soil saturation pdfs is a more robust and integrated approach to investigate ecohydrological factors of soil water dynamics than is time series analysis. Soil saturation pdfs are less sensitive to the many sources of uncertainty, sensor noise, and common gaps in soil moisture observations and do not require high-quality, co-located, and concurrent hydrologic measurements that are often lacking. We tested three key assumptions embedded in the proposed method. (i) The analytical soil saturation pdfs properly describe empirical soil saturation pdfs observed in annual data. Annual soil moisture records can be affected by transitional dynamics between wet and dry seasons, and the appropriate level of model complexity must be used. We compare parameter identifiability using an annual and a seasonal formulation of the analytical soil saturation pdfs. (ii) Parameter estimates and their uncertainty at point, footprint, and satellite scales are different and reflect variability in soil water dynamics. We determine whether the inference approach can be applied at point, footprint, and satellite scales to provide appropriate scale-specific parameters for ecohydrological modeling. (iii) The range of realizable soil moisture values is captured by the selected time series and the soil saturation pdf determined from these observations is not truncated. We determine whether the inference method based on soil saturation pdfs is robust against reduced data availability by repeating the model inversions on subsets of the soil moisture time series and show that the method can be applied to sparse datasets.
Our goal was to match empirical soil saturation pdfs derived from point-, footprint-, and satellite-scale observations to a commonly used analytical model. We demonstrate the use of a Bayesian inversion framework to calibrate the ecohydrological parameters of a simple stochastic soil water balance model that best fit empirical soil saturation pdfs. We first present data sources, define the analytical model for soil saturation pdfs including parameter assumptions, and detail the algorithm used in the Bayesian inversion. Then, we present a summary of the goodness of fit of optimal analytical soil saturation pdfs and estimated parameter uncertainty. We evaluated results to test key method assumptions including model complexity and data availability. Finally, we discuss the potential of the approach to provide a simple means to investigate variability in ecohydrological controlling factors at varying spatial scales. Our work combines modeling and empirical approaches in ecohydrology to provide more realistic analytical descriptions of soil moisture dynamics. Estimates of ecohydrological parameters consistent with observed soil saturation pdfs, from point to ecosystem scales, are needed to better characterize site-specific ecohydrological processes.
2 Data and methods

Data
We used daily soil moisture observations from three data products at three spatial scales. We used point-scale soil moisture data at a depth of 10 cm from the FLUXNET2015 data product (http://fluxnet.fluxdata.org/data/fluxnet2015-dataset/, last access: 22 October 2016). We used footprint-scale soil moisture data from the Cosmic-ray Soil Moisture Observing System (COSMOS) (http://cosmos.hwr.arizona.edu/Probes/probelist.html, last access: 4 August 2017). The COSMOS soil moisture footprint measures soil moisture at an average depth of 20 cm with a radius ranging from 130 to 240 m, depending on site characteristics (Köhli et al., 2015). Near-surface soil moisture observations at a spatial resolution of 0.25° were taken from the European Space Agency's (ESA) Climate Change Initiative (CCI) project. We used the combined soil moisture product (ECV-SM, version 0.2.2) that merges soil moisture retrievals from four passive (SMMR, SSM/I, TMI, and AMSR-E) and two active (AMI and ASCAT) coarse-resolution microwave sensors (Liu et al., 2011, 2012; Wagner et al., 2012). Although the ECV-SM sensing depth is < 5 cm, it has been shown to have a close relation to ground-based observations of soil moisture in the upper 10 cm (Dorigo et al., 2015). We compiled daily rainfall time series from the FLUXNET2015 dataset for the point- and footprint-scale analysis, and the National Aeronautics and Space Administration's (NASA) Tropical Rainfall Measuring Mission (TRMM) dataset (Huffman et al., 2007) for the satellite-scale analysis.
We selected four sites with soil moisture and rainfall data available for the 2012 calendar year (Fig. 1, Table 1). Selected sites spanned a range of land cover types, including crop and grasslands, oak savanna, deciduous forest, and pine forest. We determined the dominant soil texture of the upper soil layer from the Harmonized World Soil Database (HWSD) (version 1.2) (FAO/IIASA/ISRIC/ISS-CAS/JRC, 2012) for each site. We used soil porosity values, derived from the HWSD available as ancillary data through the ESA-CCI data product, for the satellite-scale analysis. We used the maximum soil moisture observation during the year 2012 as a site-specific soil porosity estimate for point- and footprint-scale data products. We used soil porosity for each site to calculate soil saturation s (0 ≤ s ≤ 1) from each observed soil moisture value. We do not expect the differences in data quality between data sources and sites to significantly affect empirical soil saturation pdfs and resulting parameter estimates. All sites had full records of daily point- and footprint-scale observations except for US-Me2, which had 55 missing footprint-scale observations during winter when the ground was saturated and frozen. The number of daily satellite-scale observations in the 2012 records ranged from 202 to 283.
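The conversion from soil moisture to soil saturation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `soil_saturation` and the clipping to [0, 1] are assumptions, and the fallback to the maximum observation mirrors the porosity choice stated for the point- and footprint-scale records.

```python
import numpy as np

def soil_saturation(theta, porosity=None):
    """Convert volumetric soil moisture to relative saturation s in [0, 1].

    If no porosity is supplied, the maximum finite observation is used as the
    porosity estimate (assumed here to match the point/footprint-scale choice).
    """
    theta = np.asarray(theta, dtype=float)
    if porosity is None:
        porosity = np.nanmax(theta)
    # Clip so that noise above the porosity estimate cannot give s > 1.
    return np.clip(theta / porosity, 0.0, 1.0)
```

For the satellite-scale record, an externally supplied porosity (for example an HWSD-derived value) would be passed as `porosity` instead of relying on the record maximum.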

Analytical model for soil saturation probability density functions (pdfs)

Model definition
Our framework is based on a standard bucket model of soil column hydrology at a point forced with stochastic precipitation inputs and in which soil water losses are a function of soil saturation. We followed the simple formulation of soil water losses in Laio et al. (2001a). We applied two associated analytical formulations for the soil saturation pdf detailed below and derived under the assumption of steady state, wherein parameters are constant for a given period of time. The annual model assumed an annual rainfall pattern and the seasonal model accounted for a wet season rainfall pattern and a dry season of negligible rainfall.
The soil water balance model is defined at a point and a daily time step, for a soil with porosity n, and assuming that soil saturation is uniform in the considered soil column of depth Z. Rainfall, the only input to the soil water balance, is treated as a Poisson distribution characterized by an average event frequency λ and average event intensity α. For simplicity, we assumed that the rainfall applied was equal to the amount that reached the ground surface and that interception by vegetation was negligible. Interception may be a significant component of the soil water balance at forested sites and may need to be considered in future extensions of this work.

Hydrol. Earth Syst. Sci., 22, 3229-3243, 2018, www.hydrol-earth-syst-sci.net/22/3229/2018/

Table 1 (recovered rows; one column per site).
(label missing) | 21.0 (p,f), 24.4 (s) | 9.04 (p,f), 11.8 (s) | 9.3 (p,f), 16.9 (s) | 8.1 (p,f), 11.6 (s)
α_w (mm day−1) | 21.4 (p,f), 26.8 (s) | 9.1 (p,f), 11.9 (s) | 8.7 (p,f), 16.7 (s) | 7.9 (p,f), 11.6 (s)
λ (day−1) | 0.05 (p,f), 0.08 (s) | 0.24 (p,f), 0.20 (s) | 0.22 (p,f), 0.10 (s) | 0.24 (p,f), 0.21 (s)
λ_w (day−1) | 0.07 (p,f), 0.08 (s) | 0.27 (p,f), 0.23 (s) | 0.39 (p,f), 0.17 (s) | 0.31 (p,f), 0.27
(label missing) | 0.81 (p), 0.75 (f), 0.44 (s) | 0.93 (p), 0.86 (f), 0.69 (s) | 0.75 (p), 0.83 (f), 0.69 (s) | 0.94 (p), 0.60 (f), 0.72 (s)
s_min (-) | 0.15 (p), 0.19 (f), 0.19 (s) | 0.28 (p), 0.44 (f), 0.30 (s) | 0.11 (p), 0.22 (f), 0.17 (s) | 0.27 (p), 0.14 (f), 0.23 (s)
s_max (-) | 0.44 (p), 0.42 (f), 0.33 (s) | 0.71 (p), 0.68 (f), 0.59 (s) | 0.38 (p), 0.49 (f), 0.38 (s) | 0.64 (p), 0.35 (f), 0.50 (s)
Standard deviation s (-) | 0.21 (p), 0.19 (f), 0.11 (s) | 0.21 (p), 0.11 (f), 0.12 (s) | 0.25 (p), 0.23 (f), 0.17 (s) | 0.25 (p), 0.16 (f), 0.18 (s)
Latitude and longitude in parentheses correspond to the centroid of the satellite area associated with the site location; MAT, mean annual temperature from long-term FLUXNET2015 data; MAP, mean annual precipitation from long-term FLUXNET2015 data; soil texture taken from the HWSD; n, porosity; K_s, saturated soil hydraulic conductivity; b, pore size distribution index; s_h, hygroscopic point; s_fc, field capacity; α, observed average daily rainfall depth in 2012; the subscript "w" indicates that α was computed for only the wet season months; λ, observed average daily rainfall frequency in 2012; the subscript "w" indicates that λ was computed for only the wet season months; t_d, number of days in the dry season; superscripts (p), (f), and (s) correspond to values used for the point-, footprint-, and satellite-scale analysis. Citations for each FLUXNET2015 site: Biraud (2002), Novick and Phillips (1999), Law (2002), and Baldocchi (2001).

The daily soil water balance is the difference between the rate of rainfall infiltration ϕ and the rate of soil moisture losses χ:

$$ nZ\,\frac{\mathrm{d}s(t)}{\mathrm{d}t} = \varphi[s(t); t] - \chi[s(t)]. \tag{1} $$

ϕ[s(t); t] is both a stochastic process controlled by rainfall and a state-dependent process because excess rainfall relative to available soil storage is converted to surface runoff. The soil moisture loss curve, χ[s(t)], includes leakage losses due to gravity and evapotranspiration and is described in stages determined by five soil saturation thresholds (Laio et al., 2001a). These thresholds are (i) the saturation point (s = 1), at which all pores are filled with water; (ii) the field capacity (s_fc), at which soil-gravity drainage becomes negligible compared to evaporation; (iii) the point of incipient stomata closure (s*), at which plants begin to reduce transpiration from water stress; (iv) the wilting point (s_w), at which plants cease to transpire; and (v) the hygroscopic point (s_h), at which water is bound to the soil matrix. Soil water losses are controlled by physical soil properties for saturation states above s_fc. The rate of leakage due to gravity is assumed maximum when soil is saturated (K_s) and decays exponentially to zero at s_fc (Brooks and Corey, 1964). Soil water losses are controlled by micro-meteorological conditions for saturation states between s_fc and s*. Evapotranspiration is assumed to occur at a maximum rate (E_max), independent of the saturation state. Soil water losses are controlled primarily by vegetation for saturation states between s* and s_w. Plants close their stomata in response to soil water deficits that drive leaf water potential gradients, as well as to atmospheric vapor pressure deficits, and evapotranspiration decreases linearly from E_max at s* to E_w at s_w. Soil water losses are controlled by soil diffusivity for soil saturation states below s_w, and soil evaporation decreases linearly from E_w to zero at s_h. Soil water losses are negligible for soil saturation states below s_h. For this simplified theoretical description of the soil water loss curve and stochastic rainfall forcing, the analytical solution of the steady-state probability distribution of soil saturation, p(s), was given by Laio et al. (2001a):

$$
p(s) =
\begin{cases}
\dfrac{C}{\eta_w}\left(\dfrac{s-s_h}{s_w-s_h}\right)^{\frac{\lambda(s_w-s_h)}{\eta_w}-1} e^{-\gamma s}, & s_h < s \le s_w,\\[2ex]
\dfrac{C}{\eta_w}\left[1+\left(\dfrac{\eta}{\eta_w}-1\right)\dfrac{s-s_w}{s^*-s_w}\right]^{\frac{\lambda(s^*-s_w)}{\eta-\eta_w}-1} e^{-\gamma s}, & s_w < s \le s^*,\\[2ex]
\dfrac{C}{\eta}\, e^{-\gamma s+\frac{\lambda}{\eta}(s-s^*)}\left(\dfrac{\eta}{\eta_w}\right)^{\frac{\lambda(s^*-s_w)}{\eta-\eta_w}}, & s^* < s \le s_{fc},\\[2ex]
\dfrac{C}{\eta}\, e^{-(\beta+\gamma)s+\beta s_{fc}}\left(\dfrac{\eta\, e^{\beta s}}{(\eta-m)\,e^{\beta s_{fc}}+m\,e^{\beta s}}\right)^{\frac{\lambda}{\beta(\eta-m)}+1}\left(\dfrac{\eta}{\eta_w}\right)^{\frac{\lambda(s^*-s_w)}{\eta-\eta_w}} e^{\frac{\lambda}{\eta}(s_{fc}-s^*)}, & s_{fc} < s \le 1,
\end{cases}
\tag{2}
$$

where

$$
\frac{1}{\gamma}=\frac{\alpha}{nZ},\qquad
\eta_w=\frac{E_w}{nZ},\qquad
\eta=\frac{E_{max}}{nZ},\qquad
m=\frac{K_s}{nZ\left[e^{\beta(1-s_{fc})}-1\right]},\qquad
\beta=2b+4,
$$

b is an experimentally determined parameter used in the Clapp and Hornberger (1978) soil water retention curve, and the constant C can be obtained numerically to ensure that the integral of p(s) equals 1. We used a simplifying relation E_w = 0.05 E_max to reduce the number of parameters.

We adopted Dralle and Thompson's (2016) framework to account for transient dynamics between wet and dry seasons. We defined the dry season as a period of duration t_d in which precipitation was negligible and assumed to not contribute to soil moisture. During that period, we assumed soil saturation decayed from an initial value s_0 to s(t_d, s_0) given by Laio et al. (2001a). For simplicity, we determined t_d using rainfall records at a monthly step (see Sect. 2.2.2) and s_0 was the soil saturation value on the last day of the wet season. Note that we did not define s_0 as the soil saturation following the last significant storm of the wet season as was done in prior studies (Dralle and Thompson, 2016). We then calculated the annual soil saturation pdf (p_wd(s)) as the weighted sum of the wet and dry season pdfs, p_w(s) and p_d(s), respectively.
The steady-state solution in Eq. (2) was used for the wet season pdf, and the dry season pdf was determined numerically by

$$ p_d(s) = \int p_{S_d|S_0}(s, s_0)\, p_0(s_0)\, \mathrm{d}s_0, $$

where p_0(s_0) is the pdf of the initial dry season soil saturation, equal to p_w(s), and p_{S_d|S_0}(s, s_0) is the pdf of dry season soil saturation given an initial condition s_0.
In the dry season expressions, η^d and η^d_w are equivalent to η and η_w relative to E^d_max, the maximum evapotranspiration rate during the dry season, and C_d is a normalization constant. We used the analytical expression for soil saturation decay, s(t, s_0), in the absence of rainfall given by Laio et al. (2001a) to derive p_{S_d|S_0}(s, s_0).
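The annual model of Eq. (2) can be sketched numerically as below. This is a hedged illustration rather than the authors' implementation: the function names, the parameter dictionary keys, and the trapezoidal normalization grid are assumptions, and the piecewise expressions follow the standard steady-state form attributed to Laio et al. (2001a), with γ = nZ/α, η = E_max/(nZ), η_w = E_w/(nZ), β = 2b + 4, and m = K_s / (nZ [exp(β(1 − s_fc)) − 1]). Depths and rates are assumed to be in mm and mm day−1.

```python
import numpy as np

def _rates(p):
    """Normalized loss parameters (day^-1) from a parameter dict p (hypothetical keys)."""
    nZ = p["n"] * p["Z"]                                  # active soil storage (mm)
    eta, eta_w = p["E_max"] / nZ, p["E_w"] / nZ
    beta = 2.0 * p["b"] + 4.0
    m = p["K_s"] / (nZ * (np.exp(beta * (1.0 - p["s_fc"])) - 1.0))
    return nZ, eta, eta_w, beta, m

def loss_rate(s, p):
    """Normalized loss curve chi(s) / (n Z) for the stages described in the text."""
    _, eta, eta_w, beta, m = _rates(p)
    s = np.asarray(s, dtype=float)
    sh, sw, ss, sfc = p["s_h"], p["s_w"], p["s_star"], p["s_fc"]
    return np.select(
        [s <= sh, s <= sw, s <= ss, s <= sfc, s > sfc],
        [0.0,
         eta_w * (s - sh) / (sw - sh),                    # soil evaporation
         eta_w + (eta - eta_w) * (s - sw) / (ss - sw),    # stressed transpiration
         eta,                                             # maximum evapotranspiration
         eta + m * (np.exp(beta * (s - sfc)) - 1.0)],     # plus leakage
        default=np.nan,
    )

def saturation_pdf(s, p, n_grid=4001):
    """Steady-state pdf of soil saturation, normalized numerically on (s_h, 1]."""
    nZ, eta, eta_w, beta, m = _rates(p)
    gamma, lam = nZ / p["alpha"], p["lam"]
    sh, sw, ss, sfc = p["s_h"], p["s_w"], p["s_star"], p["s_fc"]
    k = lam * (ss - sw) / (eta - eta_w)

    def unnormalized(x):
        x = np.asarray(x, dtype=float)
        # Branches are evaluated everywhere; invalid values outside a branch's
        # own interval are never selected, so their warnings are silenced.
        with np.errstate(invalid="ignore", divide="ignore"):
            dry = ((x - sh) / (sw - sh)) ** (lam * (sw - sh) / eta_w - 1.0) / eta_w
            stressed = (1.0 + (eta / eta_w - 1.0) * (x - sw) / (ss - sw)) ** (k - 1.0) / eta_w
            unstressed = np.exp(lam / eta * (x - ss)) * (eta / eta_w) ** k / eta
            ratio = eta * np.exp(beta * x) / ((eta - m) * np.exp(beta * sfc) + m * np.exp(beta * x))
            leaking = (np.exp(-beta * (x - sfc)) * ratio ** (lam / (beta * (eta - m)) + 1.0)
                       * (eta / eta_w) ** k * np.exp(lam / eta * (sfc - ss)) / eta)
        body = np.select([x <= sh, x <= sw, x <= ss, x <= sfc, x <= 1.0],
                         [0.0, dry, stressed, unstressed, leaking], default=0.0)
        return body * np.exp(-gamma * x)

    grid = np.linspace(sh, 1.0, n_grid)
    y = unnormalized(grid)
    C = 1.0 / np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(grid))   # trapezoidal rule
    return C * unnormalized(s)
```

The normalization constant is recomputed on every call, which keeps the sketch short; an inversion that evaluates the pdf many times would likely cache it per parameter set.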

Climate, soil, and vegetation parameter characterization
We chose readily available data for rainfall characteristics (λ and α), length of the dry period (t_d), and physical soil parameters (s_fc, s_h, K_s, and b) needed in the analytical models of soil saturation pdfs (Eqs. 2 and 3). We focused on estimating the ecohydrological parameters s*, s_w, and E_max, which describe vegetation control on soil water losses and are not easily observable. We calculated rainfall characteristics λ and α for the year and wet season months for each site from FLUXNET2015 and TRMM rainfall records following Rodriguez-Iturbe et al. (1984) (Table 1). We used FLUXNET2015 rainfall characteristics for point- and footprint-scale analyses, and we used TRMM rainfall characteristics for the satellite-scale analysis. TRMM rainfall records were generally consistent with ground-based measurements. For each location, we evaluated monthly FLUXNET2015 rainfall depth and categorized consecutive months contributing < 5 % of the site's annual rainfall as dry season months (Fig. 1). We then calculated the length of the dry period (t_d) as the number of days in those dry months. We used physical soil characteristics for soil textures at each site (s_h, K_s, and b) from Rawls et al. (1982) (Table 1). We estimated s_fc from each soil saturation record (Table 1) to be consistent with the assumption that drainage losses are insignificant compared to evapotranspiration losses the day following a rain event. We identified all days in the 2012 record following an observed decrease in soil saturation and estimated s_fc as the 95th percentile of the soil saturation values of the selected days. Daily soil saturation below s_w and above s_fc is rare (Laio et al., 2001a), so we did not expect the average soil texture values for s_h and K_s to significantly affect the results. Soil depths Z are 10, 20, and 5 cm for the point, footprint, and satellite scales, respectively. E_max is only a fraction of the atmospheric moisture demand (or potential evapotranspiration) contributed by that soil depth because we used a soil depth that is shallower than the rooting depth. Consequently, our framework includes four (or three if seasonality is ignored) unknown soil water balance parameters: s*, s_w, E_max, and E^d_max. We estimated these parameters over pre-defined intervals (Eq. 6), where 10 mm day−1 is the pre-defined upper possible boundary for potential evapotranspiration.
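The data-derived inputs described above (λ, α, t_d, and s_fc) could be computed along the following lines. This is a sketch under stated assumptions, not the authors' procedure: all function names are hypothetical, a rain day is taken as any day with rainfall above `wet_threshold`, each month is compared individually against the 5 % criterion without enforcing that dry months be consecutive, and "days following an observed decrease" is interpreted as days whose saturation is lower than that of the previous day.

```python
import calendar
import numpy as np

def rainfall_characteristics(rain_mm, wet_threshold=0.0):
    """Return (lam, alpha): rain-day frequency (day^-1) and mean rain-day depth (mm)."""
    rain = np.asarray(rain_mm, dtype=float)
    rain = rain[np.isfinite(rain)]
    wet = rain > wet_threshold
    lam = wet.mean()
    alpha = rain[wet].mean() if wet.any() else 0.0
    return lam, alpha

def dry_season_length(monthly_rain_mm, year=2012, fraction=0.05):
    """Flag months contributing < fraction of annual rainfall and sum their days."""
    monthly = np.asarray(monthly_rain_mm, dtype=float)
    dry = monthly < fraction * monthly.sum()
    days = np.array([calendar.monthrange(year, mth)[1] for mth in range(1, 13)])
    return dry, int(days[dry].sum())

def field_capacity_estimate(s, q=95):
    """Percentile q of saturation on days that follow a day-to-day decrease."""
    s = np.asarray(s, dtype=float)
    after_decrease = np.diff(s) < 0        # True where s[j + 1] < s[j]
    return np.percentile(s[1:][after_decrease], q)
```

Wet season values λ_w and α_w would follow from calling `rainfall_characteristics` on the daily record restricted to the months not flagged as dry.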

Application of the Bayes theorem
We related p(S), the empirical soil saturation pdf of the j = [1, ..., m] soil saturation observations (s_j), and the analytical soil saturation pdfs in Eqs. (2) or (3) derived from the simple soil water balance model in Eq. (1) with up to four unknown soil water balance parameters θ = [s*, s_w, E_max, E^d_max] using Bayes' theorem, defined as

$$ p(\theta|S) = \frac{p(S|\theta)\,p(\theta)}{p(S)}, $$

where the posterior distribution, p(θ|S), is the solution of the inverse problem and describes the probability of model parameters θ given the set S = [s_1, s_2, ..., s_m] of soil saturation observations. Assuming uninformed prior knowledge, the prior distribution of model parameters θ, p(θ), was defined by uniform distributions over the intervals in Eq. (6). The conditional probability of observations S given model parameters θ, p(S|θ), is the likelihood function of model parameters θ.

Parameter estimation
We used the Metropolis-Hastings Markov chain Monte Carlo (MH-MCMC) technique to estimate the posterior distribution p(θ|S) by drawing random model samples θ_i from p(θ) and evaluating p(S|θ_i) (Metropolis et al., 1953; Hastings, 1970; Xu et al., 2006). We defined the likelihood function of a model i, p(S|θ_i), as

$$ p(S|\theta_i) = \prod_{j=1}^{m} p(s_j|\theta_i), $$

where p(s_j|θ_i) is the probability of observation s_j given Eqs. (2) or (3).
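Because the likelihood is a product over observations, it is usually evaluated as a sum of logarithms. A minimal sketch, assuming a user-supplied callable `pdf(s, theta)` that returns the analytical density at each observation (the callable and its signature are hypothetical, not part of the original method description):

```python
import numpy as np

def log_likelihood(s_obs, pdf, theta):
    """Sum of log p(s_j | theta); returns -inf if any density is zero or invalid."""
    dens = np.asarray(pdf(np.asarray(s_obs, dtype=float), theta), dtype=float)
    if np.any(~np.isfinite(dens)) or np.any(dens <= 0.0):
        return -np.inf
    return float(np.sum(np.log(dens)))
```

Returning negative infinity for a zero density means that a parameter set under which any observation is impossible can never be accepted by the sampler.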
The MH-MCMC technique converges to a stationary distribution according to the ergodicity theorem in Markov chain theory. The sampling algorithm consisted of repeating two steps: (i) a proposing step, in which the algorithm generates a new model θ′ using a random function that is symmetric about the previously accepted model θ_i, and (ii) a moving step, to determine whether the model should be accepted or rejected, in which θ′ is tested against the Metropolis criterion (a) defined as

$$ a = \frac{p(S|\theta')}{p(S|\theta_i)}. $$

If a > 1, θ′ was accepted and θ_{i+1} = θ′ was used for the next sample. If a < 1, a random number p* ∈ [0, 1] was drawn from a uniform distribution and compared to a. If p* < a, θ′ was accepted and θ_{i+1} = θ′ was used for the next sample. If p* > a, θ′ was rejected and θ_{i+1} = θ_i was used for the next sample. If θ′ was an inconsistent model in which soil saturation thresholds (s_w, s*) were ranked incorrectly or any of the soil water balance parameters (s*, s_w, E_max, E^d_max) were outside of their defined physical bounds, the model likelihood was zero and θ′ was never accepted. The log-likelihood was more convenient to compute than the likelihood. The symmetric function used in the proposing step was a Gaussian distribution with a mean value equal to the accepted model θ_i and a standard deviation of 1 % of the interval range over which each parameter is defined in Eq. (6). We selected this value of the standard deviation of each model parameter after a number of test runs to generally ensure an acceptance rate between 20 and 50 % (Roberts and Rosenthal, 2001). We obtained statistics of the estimated parameters in θ from the union of three run samples of 20 000 simulations each. The burn-in period is the number of simulations after which the running mean and standard deviation are stabilized. We considered a burn-in period of 10 000 simulations, which were discarded for each run sample. If the acceptance rate of a run sample was < 1 % or > 90 % after the burn-in period, we discarded the run and concluded that the algorithm was stuck in a local minimum that might be physically impossible. We evaluated convergence by the Gelman-Rubin (GR) diagnostic (Gelman and Rubin, 1992) on the run samples. The GR diagnostic determines that the algorithm reaches convergence when the within-run variability (σ_w) is roughly equal to the between-run variability (σ_b), that is, when σ_w/σ_b approaches one. We verified that the GR diagnostic for each estimated parameter was < 1.1. If the GR diagnostic did not indicate that the three run samples converged, we discarded the run with the lowest likelihood and re-initiated a new run sample until convergence was attained. We counted the number of attempts to quantify how rapidly convergence occurred. We computed the mean and standard deviation for each parameter from a total of 30 000 simulations of θ resulting from the three converging run samples. A mean analytical model of soil saturation pdf was determined by applying Eqs. (2) or (3) with the mean values of the 30 000 posterior parameter estimates.
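The proposing and moving steps, together with a convergence check, can be sketched as follows. This is an illustrative sketch and not the authors' code: `metropolis` and `gelman_rubin` are hypothetical names, the uniform prior is represented by rejecting proposals outside `lower`/`upper`, any ordering constraint such as s_w < s* is assumed to be enforced inside `log_like` by returning negative infinity, and the diagnostic is the standard potential scale reduction factor rather than the simplified ratio described in the text.

```python
import numpy as np

def metropolis(log_like, lower, upper, n_iter=20000, step_frac=0.01, seed=0):
    """Random-walk Metropolis sampler with a uniform prior on [lower, upper]."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    step = step_frac * (upper - lower)        # proposal std: fraction of each range
    theta = rng.uniform(lower, upper)         # initial model drawn from the prior
    ll = log_like(theta)
    chain = np.empty((n_iter, lower.size))
    accepted = 0
    for i in range(n_iter):
        proposal = rng.normal(theta, step)    # symmetric Gaussian proposal
        inside = np.all((proposal >= lower) & (proposal <= upper))
        ll_new = log_like(proposal) if inside else -np.inf
        # Metropolis criterion in log form: accept if log(p*) < log(a).
        if np.log(rng.uniform()) < ll_new - ll:
            theta, ll = proposal, ll_new
            accepted += 1
        chain[i] = theta
    return chain, accepted / n_iter

def gelman_rubin(chains):
    """Potential scale reduction factor per parameter.

    chains: array-like of shape (n_chains, n_samples, n_params), burn-in removed.
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    within = chains.var(axis=1, ddof=1).mean(axis=0)
    between = n * chains.mean(axis=1).var(axis=0, ddof=1)
    pooled = (n - 1) / n * within + between / n
    return np.sqrt(pooled / within)
```

With the settings reported in the text, one would run three chains of 20 000 iterations, drop the first 10 000 of each, and accept the result when every element of `gelman_rubin` is below 1.1.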
M. Bassiouni et al.: Probabilistic inference of ecohydrological parameters

Model evaluation criteria
We did not have direct measurements to validate the parameters s*, s_w, and E_max estimated through the Bayesian inversion methods. We therefore analyzed convergence and uncertainty metrics of the model inversion and goodness of fit between empirical and analytical soil saturation pdfs to evaluate the identifiability of the ecohydrological parameters. We compared the optimum analytical pdf derived from the mean parameter estimates and the empirical pdfs derived from observations. We evaluated the model inversion using the following criteria:
i. Convergence of the Bayesian inversion: a GR diagnostic < 1.1 for all unknown parameters is obtained from the union of three run samples and within ≤ 10 run samples.
ii. Low uncertainty in parameter estimates: the posterior distributions of parameter estimates are physically plausible and have coefficients of variation < 20 %.
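The uncertainty and goodness-of-fit metrics referred to in this section can be sketched as below. The function names, the histogram binning used for the empirical pdf, and the choice to compute the Nash–Sutcliffe efficiency on binned densities are assumptions made for illustration; the source does not specify how the empirical pdf was discretized.

```python
import numpy as np

def empirical_pdf(s_obs, bins=20):
    """Histogram-based density of soil saturation on [0, 1]: (bin centers, density)."""
    density, edges = np.histogram(s_obs, bins=bins, range=(0.0, 1.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

def nash_sutcliffe(observed, modeled):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    observed = np.asarray(observed, dtype=float)
    modeled = np.asarray(modeled, dtype=float)
    residual = np.sum((observed - modeled) ** 2)
    return 1.0 - residual / np.sum((observed - observed.mean()) ** 2)

def coefficient_of_variation(samples):
    """Posterior coefficient of variation (%) per parameter (columns of samples)."""
    samples = np.asarray(samples, dtype=float)
    return 100.0 * samples.std(axis=0, ddof=1) / np.abs(samples.mean(axis=0))
```

A typical use would evaluate the analytical pdf at the bin centers returned by `empirical_pdf` and pass both density vectors to `nash_sutcliffe`.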

Method assessment
Major assumptions and limitations embedded in the proposed inference framework were tested through the analysis detailed below. We assume, for each scale and location, that the shape of the empirical soil saturation pdfs is controlled by the physical constraints used to parameterize the analytical model of soil saturation pdfs, and that these parameters can be determined with some certainty and reflect variability in soil water dynamics. We expect that estimated soil saturation thresholds have greater certainty when the empirical soil saturation pdf is defined around those values and greater uncertainty when fewer soil saturation values are observed around the thresholds. We acknowledge that pre-defined rainfall characteristics and physical soil parameters based on observations or literature values may not be exactly representative of the processes at each location or scale and could also create biases and uncertainties in the fitted parameters of interest. We used model evaluation criteria (Sect. 2.4) to investigate the applicability of the inference framework with varying model complexities, scales, locations, and data availability.
i. Analytical expressions for soil saturation pdfs were derived under the assumption of steady state. Annual soil moisture records can be affected by transitional dynamics between wet and dry seasons, and the appropriate level of model complexity must be used. We applied the inversion framework to annual soil saturation using variations of the analytical model for soil saturation pdfs of increasing complexity: (i) the annual model in Eq. (2) and (ii) the seasonal model in Eq. (3). We determined whether the added complexity of the dry season pdf increases the identifiability of ecohydrological parameters or whether the simpler annual model is sufficiently consistent with annual empirical soil saturation pdfs.
ii. We compared co-located parameter estimates and their uncertainty at point, footprint, and satellite scales for each site. We determined whether the inference approach can provide appropriate scale-specific parameters for ecohydrological modeling at each location.
iii. We assumed that the whole range of realizable soil saturation values was captured within the selected time series at each scale and that the resulting soil saturation pdf was not truncated. If the range of observed values is not representative of the soil saturation pdf because it is truncated or affected by noise in the data, parameter estimates may be biased. Minimum and maximum observed soil saturation values during 2012 (Table 1) indicate the range of observed soil saturation values we used to estimate ecohydrological parameters. To test whether the inference method based on soil saturation pdfs is robust against reduced data availability, we repeated the model inversions on subsets of each soil saturation record, randomly resampling fractions of the data down to 10 % of the annual time series, and computed goodness-of-fit statistics between the resulting analytical models and the empirical models based on the full annual record. We determined the number of data points necessary to infer converging model parameters that best match observations and whether the proposed inference method can be reliably used to identify ecohydrological parameters from sparse datasets.
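The resampling test in item iii can be outlined as a simple loop. The sketch below is illustrative only: `subsample_experiment`, `mean_error`, and the synthetic record are hypothetical, sampling without replacement is an assumption, and the subset mean is a placeholder for the actual model inversion, which is not reproduced here. The 10 repeats per fraction follow the description of Fig. 6.

```python
import random
import statistics


def subsample_experiment(record, fractions, fit_and_score, repeats=10, seed=0):
    """Median score of a fit repeated on random subsets of a record.

    fit_and_score(subset, record) is a user-supplied (hypothetical) callable
    that fits a model to `subset` and scores it against the full `record`.
    """
    rng = random.Random(seed)
    results = {}
    for fraction in fractions:
        size = max(1, round(fraction * len(record)))
        scores = []
        for _ in range(repeats):
            # Draw days at random, without replacement (an assumption).
            subset = rng.sample(record, size)
            scores.append(fit_and_score(subset, record))
        results[fraction] = statistics.median(scores)
    return results


# Placeholder daily record of soil saturation in [0, 1] (synthetic values).
rng = random.Random(1)
record = [rng.betavariate(2.0, 3.0) for _ in range(366)]


def mean_error(subset, full):
    """Placeholder 'fit': subset mean scored against the full-record mean."""
    return abs(statistics.fmean(subset) - statistics.fmean(full))


medians = subsample_experiment(record, [1.0, 0.5, 0.1], mean_error)
```

Replacing `mean_error` with a function that runs the Bayesian inversion on the subset and returns a goodness-of-fit statistic against the full-record empirical pdf would reproduce the structure of the experiment described above.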
3 Results and discussion

Level of model complexity
For each of the four locations (Table 1), we obtained optimal analytical soil saturation pdfs consistent with the empirical pdfs derived from soil saturation observations using the Bayesian inversion framework and an MH-MCMC algorithm. Model inversions for each site and scale and for both annual and seasonal models met the evaluation criteria (see Sect. 2.4). Our results indicated that the framework of Dralle and Thompson (2016) can be applied to sites with low (US-MMS) and high (US-Ton) seasonality in rainfall patterns.
Posterior probability distributions of soil water balance parameters (s_w, s*, E_max) were well constrained overall. The parameter estimates and their coefficients of variation as well as the model goodness-of-fit statistics are summarized in Table 2. Figures 2 through 5 compare empirical and analytical pdfs, with associated quantile-quantile plots, for point, footprint, and satellite scales at the four study sites and for both annual and seasonal models. The goodness of fit between empirical pdfs and analytical models was only slightly better for the seasonal model than for the annual model. However, the coefficient of variation of the posterior parameter distributions was smaller for the annual model, and the annual model converged more rapidly. The Bayesian inversion of the annual model is therefore more computationally efficient. The parameter identifiability was not greatly improved by the more complex seasonal model. The estimated soil saturation threshold s_w was consistently smaller for the annual model than for the seasonal model and s* was often higher, which may indicate that s_w and s* in the annual model could be biased and may have absorbed dry season dynamics. Previous studies calibrating soil saturation pdf models found ecohydrological parameter values comparable to ours (Table 2). For example, using point-scale observations at US-Ton, best-fit values of s_w and s_fc were 0.26 and 0.82, respectively (Dralle and Thompson, 2016), and best-fit values of s* and E_max were 0.3 and 1.9 mm day^-1, respectively (Miller et al., 2007). We did not compare soil saturation thresholds s* and s_w with literature values of soil water potential at which stomata are fully open or closed because the conversion of soil saturation to soil matric potential is non-linear (Clapp and Hornberger, 1978) and site- and scale-specific soil water retention parameters were unknown. Average parameters derived from soil texture (Rawls et al., 1982) were not compatible with soil moisture data from each scale and site.
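The goodness-of-fit statistics reported in Table 2 can be illustrated with a short sketch. It assumes that the quantile-level NSE is the standard Nash-Sutcliffe efficiency applied to paired empirical and modelled quantiles, which this section does not spell out. The uniform distribution below is a placeholder for the analytical soil saturation pdf, and all names and values are hypothetical.

```python
import random


def ks_statistic(sample, model_cdf):
    """One-sample Kolmogorov-Smirnov distance between a sample and a CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = model_cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def quantile_nse(empirical_q, model_q):
    """Nash-Sutcliffe efficiency between paired empirical and model quantiles."""
    mean_q = sum(empirical_q) / len(empirical_q)
    sse = sum((e - m) ** 2 for e, m in zip(empirical_q, model_q))
    sst = sum((e - mean_q) ** 2 for e in empirical_q)
    return 1.0 - sse / sst


# Placeholder model: saturation uniform on [0.2, 0.8]. This is NOT the
# stochastic soil water balance pdf; it only makes the example runnable.
def model_cdf(s):
    return min(1.0, max(0.0, (s - 0.2) / 0.6))


def model_quantile(p):
    return 0.2 + 0.6 * p


rng = random.Random(0)
observed = sorted(rng.uniform(0.2, 0.8) for _ in range(366))
probs = [(i + 0.5) / len(observed) for i in range(len(observed))]

ks = ks_statistic(observed, model_cdf)
nse = quantile_nse(observed, [model_quantile(p) for p in probs])
```

A KS statistic near 0 and an NSE near 1 both indicate close agreement between the empirical and modelled distributions.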

Site and scale considerations
Parameter estimates were most constrained for scales and locations at which soil water dynamics are more sensitive to the fitted ecohydrological parameters of interest. In these cases, the model inversion converged less rapidly but ultimately provided better goodness of fit. Soil saturation states at drier sites may be more controlled by soil water loss parameters, while soil saturation states at wetter sites may also be controlled by rainfall characteristics. Estimated soil saturation thresholds had greater certainty if the empirical soil saturation pdfs were defined around those values and had greater uncertainty if there were fewer soil saturation values observed around the thresholds. For example, uncertainty of s_w was greater for the humid subtropical deciduous forest site (US-MMS) than for the Mediterranean savanna site (US-Ton), and uncertainty of s* was greater for US-Ton than for US-MMS. Similarly, soil saturation states representing larger spatial scales were less sensitive to specific site characteristics.
Parameter uncertainty for satellite and footprint scales was greater than for the point scale. Estimates of larger-scale soil water balance parameters are more relevant to regional ecohydrological dynamics. Differences in parameter estimates among scales within a site may be associated with differences in soil texture properties, such as porosity and field capacity, that were determined separately for each record. Co-located and concurrent soil saturation pdfs are different at each scale (Figs. 2-5) and suggest variability in observed soil water dynamics at each scale. Differences in driving processes among scales were specifically determined from the model inversion for each scale and provided robust scale-specific parameters for ecohydrological modeling.

Data availability
For each spatial scale and site, the annual model was inverted using random subsamples of 100 to 10 % of the 2012 time series (Fig. 6). For all sites and scales, the number of observations did not significantly impact model inference. The mean and standard deviation of the randomly selected subsets of annual data were representative of the full record. There was no correlation between the small differences in the mean and standard deviation of the subsamples and the model goodness of fit.
The proposed inference method based on soil saturation pdfs can therefore reliably be used to identify ecohydrological parameters from sparse datasets. Inference methods that do not require continuous data are particularly relevant to large-scale soil moisture measurements, such as satellite products, that are not continuous.

Figure 6. Goodness of fit and ecohydrological parameters inferred with decreasing number of soil saturation observations (annual model; x axis: number of observations [days]). For each subsample category, the median results of 10 repeats are plotted and results between the 90th and 10th percentiles are shaded. Colors correspond to the four sites in the legend. KS, Kolmogorov-Smirnov statistic; NSE, quantile-level Nash-Sutcliffe efficiency; E_max, maximum evapotranspiration in mm day^-1; s*, point of incipient stomatal closure; s_w, wilting point.

Conclusions
We document a generalizable Bayesian inversion framework to infer parameter values of the stochastic soil water balance model and their associated uncertainty using freely available rainfall and soil moisture observations at point, footprint, and satellite scales. Empirical pdfs derived from soil saturation observations provided key information to determine unknown ecohydrological parameters s*, s_w, and E_max. Model assumptions were appropriately met, and optimal analytical soil saturation pdfs were consistent with empirical pdfs. Uncertainties in parameter estimates were small and reflected the sensitivity of the soil water balance model to ecohydrological parameters at varying scales and locations. We demonstrate that the form of the simple ecohydrological model for soil saturation pdfs was consistent with observations from point, footprint, and satellite scales. However, optimal parameters were different at each scale because co-located and concurrent soil saturation pdfs are different at each scale, which may result from spatial heterogeneity in soil water dynamics. We demonstrate the advantage of analyzing soil saturation pdfs instead of time series. We obtained stable results using sparse subsets of the datasets, indicating that the proposed framework is robust and can be used with non-continuous data.
Although the seasonal model was conceptually more consistent with our physical understanding of annual soil water dynamics, the annual model provided satisfactory results matching annual empirical pdfs at the sites we analysed. We were not able to determine whether some differences in parameters estimated using the seasonal model are physically meaningful, that is, whether they arise because wet and dry season dynamics were better characterized in this more complex model. Our methodology can be customized to characterize site-specific parameters and to test consistency between observed and analytical soil saturation pdfs for any other adaptation of the stochastic ecohydrological framework with more or less complexity depending on the study objectives. We provide a method based on a parsimonious soil water balance model, requiring a minimum level of data inputs to estimate ecohydrological characteristics that are not directly observable and for which established estimation methods are not available. Our methods can be applied in future studies to better understand differences in soil water dynamics at different scales and to improve scaling of ecohydrological processes. Results demonstrate the value of large-scale near-surface soil moisture observations to improve characterization of soil water dynamics at ecosystem scales. Relations between the soil saturation threshold values inferred from the near-surface soil moisture data and dynamics in the full active rooting zone are unknown. The datasets we used are freely available from sensor networks and global satellite products, and the methods can therefore be applied to a large range of sites or to global analyses to improve understanding of spatial variability in ecohydrological parameters relevant for local and global water cycle analyses.
Competing interests. The authors declare that they have no conflict of interest.

Figure 2. Empirical versus modeled cumulative density functions (CDFs) and soil saturation probability distribution (p(s)) for US-ARM. The mean values of the posterior parameter distributions were used with Eq. (2) in the annual model and Eq. (3) in the seasonal model. The grey shaded areas correspond to the soil saturation thresholds (s_h, s_w, s*, s_fc) in the water balance model.

Figure 3. Empirical versus modeled CDFs and soil saturation probability distribution (p(s)) for US-MMS. The mean values of the posterior parameter distributions were used with Eq. (2) in the annual model and Eq. (3) in the seasonal model. The grey shaded areas correspond to the soil saturation thresholds (s_h, s_w, s*, s_fc) in the water balance model.

Figure 5. Empirical versus modeled CDFs and soil saturation probability distribution (p(s)) for US-Me2. The mean values of the posterior parameter distributions were used with Eq. (2) in the annual model and Eq. (3) in the seasonal model. The grey shaded areas correspond to the soil saturation thresholds (s_h, s_w, s*, s_fc) in the water balance model.

Table 1. Selected study sites.

Table 2. Estimated ecohydrological parameters and goodness of fit of analytical soil saturation pdfs. Values in parentheses correspond to the coefficient of variation of the posterior parameter estimates in percentage. p, analytical model for the soil saturation pdf without seasons; p_wd, analytical model for the soil saturation pdf including wet and dry seasons; N, number of 20 000 simulation runs needed to obtain three converging results (see Sect. 2.3.2); NSE, quantile-level Nash-Sutcliffe efficiency; KS, Kolmogorov-Smirnov statistic; E_max, maximum evapotranspiration in mm day^-1 (the weighted average wet and dry season E_max is reported for the p_wd model); s*, point of incipient stomatal closure; s_w, wilting point.