Using satellite observations of precipitation and soil moisture to constrain the water budget of a land surface model

Early warning of agricultural drought can enable decision makers to act to improve food security. Land-surface models are useful tools to inform such monitoring systems, but model errors are problematic. We show that satellite-derived estimates of shallow soil moisture can be used to calibrate a land-surface model at the regional scale in Ghana, using data assimilation techniques. The modified calibration significantly improves model estimation of soil moisture. Specifically, we find a 44% reduction in root-mean-squared error for a 5-year hindcast after assimilating a single year of soil moisture observations to update model parameters. The use of an improved remotely-sensed rainfall dataset contributes a 10% reduction in error. Improved rainfall data has the greatest impact on model estimates during the seasonal wetting-up of soil, with the assimilation of remotely sensed soil moisture having the greatest impact during drying-down. The significant reduction in root-mean-squared error we find after assimilating a single year of observations bodes well for the production of improved soil moisture forecasts over sub-Saharan Africa, where subsistence farming remains prevalent.


Introduction
In regions where the population relies on subsistence farming it is soil moisture, rather than precipitation per se, that is the critical factor in growing crops. The production of improved soil moisture forecasts should therefore enhance the drought resilience of these regions through improved capacity for early warning of agricultural drought (Brown et al., 2017). Soil moisture is also an important variable for weather and climate prediction (Seneviratne et al., 2010), playing a key role in controlling land surface energy partitioning (Beljaars et al., 1996; Bateni and Entekhabi, 2012) and in the carbon cycle (McDowell, 2011).
However, modelling soil moisture is complex, and modelled estimates are highly sensitive to meteorological forcing data and land surface model parameterisations (Pitman et al., 1999).
Globally, precipitation is the most influential meteorological driver in the estimation of soil moisture (Guo et al., 2006).
However, there is considerable variability in available precipitation data, which in turn has impacts on modelled predictions of soil moisture. When forcing a global land data assimilation system with different precipitation products, Gottschalck et al. (2005) showed that the percentage difference in estimates of volumetric soil water content ranged from -75% to +100%.
Similarly Liu et al. (2011) showed that driving a catchment land surface model with an improved precipitation product (merged gauge and satellite observations vs. a reanalysis product) increased the model soil moisture skill by 14%, when compared to in-situ observations.
There are now a variety of remotely sensed surface soil moisture observational products from both active and passive microwave sensors. Data assimilation (DA) has been used to combine information from these observations with land surface models to improve surface soil moisture estimates (Liu et al., 2011; De Lannoy and Reichle, 2016; Yang et al., 2016). DA refers to the suite of mathematical techniques used to combine models and observations, taking into account available knowledge about their respective uncertainties. These techniques are typically derived from a Bayesian standpoint and can be broadly classified as sequential and variational. Sequential methods adjust the model state and/or parameters at the time when observations are available, whereas variational methods adjust the state and/or parameters at the beginning of some time window, considering all observations within that window.
It has been shown that assimilation of remotely sensed surface soil moisture can significantly improve the prediction of root-zone soil moisture and drought modelling (Bolten et al., 2010). Many recent studies use sequential assimilation methods to update the model soil moisture state at each time step when an observation is available (Kolassa et al., 2017; De Lannoy and Reichle, 2016; Draper et al., 2012; Liu et al., 2011). In addition, some studies employing sequential methods estimate the model parameters as well as the state (Montzka et al., 2011; Moradkhani et al., 2005; Qin et al., 2009). Using sequential methods in this way will likely result in parameters that vary over time, which will not be optimal when using land surface models to run forecasts because the time-varying nature of the parameters will not be carried forward. An alternative is to use variational assimilation methods for parameter estimation (Navon, 1998). Variational methods will yield time-invariant parameter estimates over the assimilation time window. For a suitably chosen length of assimilation window (i.e. over one or more whole years) this allows us to avoid seasonally varying parameters. Using variational methods to assimilate remotely-sensed observations for land surface model parameter estimation has previously been shown to improve soil moisture estimates in several studies (Rasmy et al., 2011; Yang et al., 2016; Sawada and Koike, 2014). These studies all optimise both model parameters and state. Here we propose an alternative, which is to include the model spin-up within the data assimilation routine so that the initial soil moisture state is consistent with the updated parameters at each optimisation step.
The work in this paper forms part of the Enhancing Resilience to Agricultural Drought in Africa through Improved Communication of Seasonal Forecasts (ERADACS) project. Part of ERADACS is the development of a light-weight system for prediction of agricultural drought in Northern Ghana (TAMSAT-ALERT). Previous work (Brown et al., 2017) has shown that TAMSAT-ALERT's skill for predicting root-zone soil moisture in Ghana stems largely from accurate knowledge of antecedent soil moisture conditions. In this paper we describe a method for improving soil moisture estimates for the Joint UK Land Environment Simulator (JULES, see section 2.1) over Ghana through the assimilation of remotely-sensed soil moisture and use of improved satellite-observed rainfall. Ultimately, we expect that the improved soil moisture estimates will increase the prediction skill of TAMSAT-ALERT, and hence the quality of drought early warning issued to farmers. We use the technique of Four-Dimensional Variational (4D-Var) data assimilation to estimate the soil thermal and hydraulic parameters of JULES by assimilating European Space Agency Climate Change Initiative (ESA-CCI) merged active and passive microwave surface soil moisture observations (Dorigo et al., 2015). We also drive the JULES model with two successive versions of the TAMSAT rainfall dataset (see section 2.2) to investigate the effect of improved precipitation on soil moisture estimates. We assimilate a single year of soil moisture observations (2009), then perform a 5-year hindcast (2010-2014), driving the model with observed and reanalysis meteorology, to judge the impact of both the precipitation products and data assimilation on the model's representation of soil moisture when compared to independent observations.

Method

JULES land surface model
The Joint UK Land Environment Simulator (JULES) is a process-based land surface model developed at the UK Met Office (Best et al., 2011; Clark et al., 2011). We used the global land configuration 4.0 of JULES, designed for use across weather and climate modelling timescales and systems (Walters et al., 2014). JULES is typically run with 4 soil layers, with the top layer being 10 cm deep. We use this layer to compare with the satellite observations. The model is forced with WFDEI data (WATCH Forcing Data methodology applied to ERA-Interim reanalysis data), described by Weedon et al. (2014), for radiation, wind, temperature, pressure and humidity values. The WFDEI data has a 0.5° spatial resolution and a 3-hourly temporal resolution.
The JULES model was run at a half-hourly timestep, with a soil map taken from the Harmonised World Soil Database (Nachtergaele et al., 2008). Previously JULES has been used in sequential DA experiments (Ghent et al., 2010), and has been implemented in a variational framework with a focus on the carbon cycle (Raoult et al., 2016).

TAMSAT rainfall observations
We replaced the precipitation in the WFDEI data with Tropical Applications of Meteorology using SATellite data and ground-based observations (TAMSAT) rainfall monitoring products (Maidment et al., 2014; Tarnavsky et al., 2014). TAMSAT produces daily rainfall estimates over Africa at a 4 km resolution with data ranging back to 1983. The rainfall estimates are derived from Meteosat thermal infrared images calibrated against an extensive network of African rain gauges. When aggregated over time and space, TAMSAT has been shown to have good skill over much of Africa, in comparison to ground-based observations (Maidment et al., 2013, 2017). On daily time scales, occurrence is better represented than amount (Greatrex et al., 2014), with the magnitude of high intensity rainfall events not captured. For these reasons, TAMSAT tends to be used to monitor drought rather than to provide real-time early warning of floods. Data are available from https://www.tamsat.org.uk.
We ran JULES with WFDEI 3-hourly meteorological forcing data (Weedon et al., 2014) and TAMSAT daily rainfall estimates. We therefore had to disaggregate the TAMSAT daily estimates to 3-hourly estimates, which we did by merging the TAMSAT data with the WFDEI precipitation data. We divided the WFDEI 3-hourly precipitation values by the corresponding WFDEI daily precipitation and then multiplied by the corresponding TAMSAT daily precipitation values. This spreads the daily TAMSAT estimates over the diurnal cycle.
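The disaggregation step can be sketched as below. This is a minimal illustration rather than the code used in the study: the function name is hypothetical, and the handling of days on which WFDEI is dry but TAMSAT reports rain (an even spread over the day) is an assumption, since the text does not say how that case is treated.

```python
def disaggregate_daily_rain(wfdei_3h, tamsat_daily):
    """Spread one TAMSAT daily total over eight 3-hourly steps.

    wfdei_3h     : eight WFDEI 3-hourly precipitation values for the day
    tamsat_daily : TAMSAT daily precipitation total for the same day
    The result is in the units of tamsat_daily, per 3-hour step.
    """
    if len(wfdei_3h) != 8:
        raise ValueError("expected eight 3-hourly values per day")
    wfdei_daily = sum(wfdei_3h)
    if wfdei_daily > 0.0:
        # Fraction of the WFDEI daily total in each step, rescaled so
        # that the steps sum to the TAMSAT daily total.
        return [tamsat_daily * v / wfdei_daily for v in wfdei_3h]
    # ASSUMPTION (not from the paper): WFDEI gives no diurnal shape on
    # this day, so spread the TAMSAT total evenly.
    return [tamsat_daily / 8.0] * 8


# Example: WFDEI puts most rain in the afternoon; TAMSAT total is 5 mm.
wfdei = [0.0, 0.0, 1.0, 3.0, 4.0, 2.0, 0.0, 0.0]
steps = disaggregate_daily_rain(wfdei, 5.0)
```

By construction the 3-hourly values keep the WFDEI diurnal shape while their sum matches the TAMSAT daily total.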
In this study, we drive the JULES model with two different TAMSAT products (v2.0 and v3.0). The difference between JULES model outputs when forced with these two distinct products will help us to understand the impact of improved precipitation forcing on our estimation of soil moisture. It has been shown that TAMSAT v3.0 has greatly reduced the dry bias present in TAMSAT v2.0 (Maidment et al., 2017) and has eliminated the spatial artefacts. Despite this, there are still areas where both products struggle, for example coastal regions subject to large amounts of warm rain and areas with sharp topographic contrasts. For this reason, interannual rainfall variability is less well represented over the south of Ghana than the north.

ESA CCI soil moisture observations
In this study we use the ESA CCI level 3 combined active and passive soil moisture observations. This product merges data from 11 different sensors, using an algorithm described in Dorigo et al. (2017), to give an estimate of surface soil moisture together with its associated uncertainty. These estimates are assumed to represent the top 2-5 cm of soil. However, observations based on different microwave frequencies and soil moisture conditions may be representative of deeper layers (Ulaby et al., 1982).
It has been previously shown that it is best to use both active and passive retrievals together (Draper et al., 2012) and that the ESA CCI merged product performs better than either the active or passive product alone (Dorigo et al., 2015). Figure 1 shows the number of available daily soil moisture observations in the experiment period (2009-2014) over Ghana, with the maximum number of possible observations being 2190. We can see that there is higher data availability in the north of Ghana than in the south. There are some pixels in the south for which we have no data; this is due to high vegetation cover.

4D-Variational data assimilation
We use the method of Four-Dimensional Variational data assimilation (4D-Var) to estimate the soil thermal and hydraulic parameters of the JULES land surface model for each grid cell over Ghana. 4D-Var aims to find the initial state that minimises the weighted least squares distance to the prior guess while minimising the weighted least squares distance of the model trajectory to the observations over the time window. This is done by minimising a cost function (equation 1) at each grid cell, where x is the vector of model parameters, with x_b being a prior guess and x_0 the current update, B is the prior error covariance matrix, y_i is the observation at time t_i, h_i is the observation operator (here the JULES model) mapping the current model parameters (x_0) to the observation y_i at time t_i, and R_i is the observation error covariance matrix. We chose a variational DA method for parameter estimation over a sequential method because variational methods ensure that the retrieved model parameters are time invariant over the assimilation window and will hence fit seasonal model dynamics when the window is sufficiently large.
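The cost function itself does not appear in this text. Assuming it takes the standard strong-constraint 4D-Var form implied by the symbol definitions above, it would read as follows; the factor of 1/2 and the summation limits (observations indexed i = 0, ..., N) are assumptions.

```latex
\begin{equation}
J(\mathbf{x}_0) =
  \frac{1}{2}\,(\mathbf{x}_0 - \mathbf{x}_b)^{T}\,\mathbf{B}^{-1}\,(\mathbf{x}_0 - \mathbf{x}_b)
  + \frac{1}{2}\sum_{i=0}^{N}
    \bigl(\mathbf{y}_i - h_i(\mathbf{x}_0)\bigr)^{T}\,
    \mathbf{R}_i^{-1}\,
    \bigl(\mathbf{y}_i - h_i(\mathbf{x}_0)\bigr)
\end{equation}
```

The first term penalises departure from the prior guess x_b, weighted by B; the second penalises the misfit between the model trajectory and each observation y_i, weighted by R_i.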
In this study, we updated the percentage of sand, silt and clay in the soil at each minimisation step and then used a set of pedo-transfer functions (Cosby et al., 1984) to relate the new sand, silt and clay proportions to the 8 soil parameters in JULES. We included a model spin-up at each step to ensure that the initial soil moisture state is consistent with the updated parameters. We used the Nelder-Mead simplex algorithm (Nelder and Mead, 1965) to minimise the cost function in equation (1).
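The structure of this procedure (a spin-up inside every cost-function evaluation, minimised without gradients) can be sketched as follows. This is an illustration under strong assumptions, not the study's implementation: a toy two-parameter linear bucket stands in for JULES, the parameters are optimised directly rather than through sand, silt and clay fractions and pedo-transfer functions, B and R are taken to be diagonal, the observations are synthetic, and the minimiser is a compact hand-written Nelder-Mead variant. All names and numbers are hypothetical.

```python
def run_model(params, rain, theta0):
    """Toy linear bucket standing in for JULES (assumption, not the paper's model)."""
    gain, loss = params
    theta, series = theta0, []
    for r in rain:
        theta = theta + gain * r - loss * theta
        series.append(theta)
    return series


def spin_up(params, rain, n_cycles=5, theta_start=0.2):
    """Recycle the forcing so the initial state is consistent with params."""
    theta = theta_start
    for _ in range(n_cycles):
        theta = run_model(params, rain, theta)[-1]
    return theta


def cost(params, prior, prior_sd, obs, obs_sd, rain):
    """4D-Var-style cost with diagonal B and R (assumed); spin-up is done inside."""
    background = sum(((p - b) / s) ** 2 for p, b, s in zip(params, prior, prior_sd))
    modelled = run_model(params, rain, spin_up(params, rain))
    misfit = sum(((y - m) / obs_sd) ** 2 for y, m in zip(obs, modelled))
    return 0.5 * background + 0.5 * misfit


def nelder_mead(f, x0, steps, max_iter=500, tol=1e-12):
    """Compact derivative-free Nelder-Mead simplex minimiser."""
    n = len(x0)
    # Initial simplex: x0 plus one vertex perturbed along each parameter.
    simplex = [list(x0)] + [
        [x + (steps[j] if i == j else 0.0) for i, x in enumerate(x0)]
        for j in range(n)
    ]
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        worst = simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid, away from the worst vertex.
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        xr = along(1.0)                      # reflection
        fr = f(xr)
        if fr < values[0]:
            xe = along(2.0)                  # expansion
            fe = f(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = along(0.5 if fr < values[-1] else -0.5)  # outside/inside contraction
            fc = f(xc)
            if fc < min(fr, values[-1]):
                simplex[-1], values[-1] = xc, fc
            else:                            # shrink towards the best vertex
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


# Twin experiment with synthetic forcing and observations (all values hypothetical).
rain = [5.0 if t % 7 == 0 else 0.0 for t in range(60)]
truth, prior, prior_sd = [0.01, 0.10], [0.02, 0.05], [0.02, 0.10]
obs = run_model(truth, rain, spin_up(truth, rain))
j = lambda p: cost(p, prior, prior_sd, obs, 0.04, rain)
j_prior = j(prior)
posterior, j_post = nelder_mead(j, prior, steps=[0.005, 0.02])
```

Because the spin-up is repeated inside `cost`, every candidate parameter set is evaluated from an initial state it has itself produced, which is the design choice the text describes. The price is one extra set of model integrations per function call, which matters for a derivative-free method that needs many calls.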

Experimental design
For each data assimilation experiment with JULES (driven with TAMSAT v2.0 or v3.0 rainfall) we assimilate a single year of ESA CCI soil moisture observations (2009) and then run a 5-year hindcast (2010-2014). The hindcast allows us to evaluate the performance of each experiment against independent soil moisture observations. In our results, we consider 4 different model runs: (1) TAMSAT v2.0 without DA, (2) TAMSAT v2.0 with DA, (3) TAMSAT v3.0 without DA and (4) TAMSAT v3.0 with DA. From these 4 distinct experiments we can interrogate the impact of both the DA and use of the updated rainfall product.

Results
We split our analysis over northern and southern Ghana (above and below 9° N respectively) due to the difference in data quality between the two regions. The data quality of both precipitation and soil moisture is higher in the north than the south; in addition, much of the subsistence agriculture in Ghana takes place in the northern regions, with a higher percentage of cash crops grown in the south (Martey et al., 2013). In Figure 2 we show the results of a data assimilation and forecast for a single grid cell in the north of Ghana; here both the prior (light grey line) and posterior (dark grey line) are forced with TAMSAT v3.0 precipitation (experiments 3 and 4 respectively, described in section 2.5). From Figure 2 we can see that the data assimilation has greatly improved the fit to the observations in the assimilation window (2009), which is to be expected, since these observations are what the model is calibrated against. However, the improved fit continues into the forecast (2010-2014) when comparing against the unassimilated observations. We can see a distinct seasonal pattern for soil moisture in northern Ghana, where there exists a rainy season and corresponding "wetting-up" of soil moisture from approximately March-May and a dry season with "drying-down" of soil moisture from approximately November-January. The model skill for predicting this seasonal cycle is markedly improved after data assimilation. In Figure 3 we show the same model runs for a grid cell in the south of Ghana. The seasonal cycle in the south of Ghana is much less pronounced and this is seen in both the model runs and the observations. However, the observations are of poorer quality in the south due to the higher vegetation cover and cloud cover, adding to the noise seen in Figure 3. Although we do improve the fit to the observations after data assimilation in Figure 3, we do not see the same scale of improvement as for the northern grid cell in Figure 2. This is most likely due to the higher error in both the precipitation and soil moisture observations. In addition, the less pronounced seasonal cycle is more difficult to forecast after assimilating just a single year of data.

[...] outperforms experiment 2 (TAMSAT v2.0 with DA) for the majority of years, suggesting that it is precipitation, as opposed to the assimilation of soil moisture, that is most important for improving soil moisture estimates during wetting-up. For southern Ghana (bottom row) the most accurate model run is again experiment 4 (TAMSAT v3.0 with DA), although experiment 2 (TAMSAT v2.0 with DA) is much closer in accuracy than for the north. This suggests that both rainfall products are poor in the south compared to the north. We also note that experiment 1 (TAMSAT v2.0 no DA) is markedly more accurate than experiment 3 (TAMSAT v3.0 no DA) in the south. However, considering the results after DA (experiment 4 outperforming experiment 2), this can be explained by an incorrect specification of the prior soil map in the south rather than TAMSAT v2.0 rainfall outperforming TAMSAT v3.0 (it is expected that both products perform poorly in coastal regions (Maidment et al., [...]

[...] where soils are often much more sandy/silty in texture (Braimoh and Vlek, 2004), it is possible that for other grid cells we are overfitting to the data. Future work should investigate whether adding additional parameters to the optimisation could alleviate this problem.
We show summary statistics calculated over the whole of Ghana for the 4 experiments in Table 1. In every case we find the lowest RMSE for experiment 4 (TAMSAT v3.0 with DA), with the RMSE being reduced by 44% after data assimilation from experiment 3 (0.0753 m³ m⁻³) to experiment 4 (0.0420 m³ m⁻³). From experiment 2 to 4 we can see that, after data assimilation, using TAMSAT v3.0 rainfall over v2.0 has contributed a 10% reduction in RMSE when calculating statistics over the whole period. These RMSE reductions are similar in both the wetting-up and drying-down periods of the hindcast (2010-2014). However, we see that using the improved rainfall product has the largest effect (judged by comparing experiment 2 to 4) on bias reduction during wetting-up (94% reduction in bias) in comparison to drying-down (27% reduction in bias) and over the whole period (39% reduction in bias). This is similar to the result discussed in the analysis of Figure 5.

Table 1. Experiment statistics calculated over the whole of Ghana in the hindcast period (2010-2014), for the whole period, wetting-up (Mar-May) and drying-down (Nov-Jan). The units for both RMSE and bias are m³ m⁻³. Columns: 1) TAMSAT 2 no DA, 2) TAMSAT 2 DA, 3) TAMSAT 3 no DA, 4) TAMSAT 3 DA.

For northern Ghana there exists a prominent seasonal cycle for soil moisture, with observations of higher quality than in the south for both TAMSAT rainfall and ESA CCI soil moisture. We find that soil moisture estimates based on TAMSAT v3.0 outperform v2.0, especially during the wetting-up phase of the seasonal cycle, with the effect of the rainfall dataset less marked during the drying-down phase. This is to be expected, as little or no rain occurs during drying-down, so that it is model dynamics that are the dominant factor in the estimation of soil moisture. Therefore, it is the updating of soil parameters via data assimilation and not improved precipitation that has the greatest impact on soil moisture estimates during drying-down.
Conversely, improved rainfall data has the greatest impact for estimating wetting-up and constraining the start of the growing season.
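The percentage reductions quoted from Table 1 are simple ratios between experiments. The sketch below shows how RMSE, a seasonal subset and a percentage reduction could be computed; the function and constant names are hypothetical, and only the two RMSE values for experiments 3 and 4 are taken from the text. The same `percent_reduction` helper would apply to bias magnitudes.

```python
import math

WETTING_UP = {3, 4, 5}       # Mar-May
DRYING_DOWN = {11, 12, 1}    # Nov-Jan


def rmse(model, obs):
    """Root-mean-squared error between paired model and observed values."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))


def select_months(months, values, wanted):
    """Keep only the values whose calendar month is in `wanted`."""
    return [v for mth, v in zip(months, values) if mth in wanted]


def percent_reduction(before, after):
    """Reduction from `before` to `after`, as a percentage of `before`."""
    return 100.0 * (before - after) / before


# RMSE for experiment 3 (no DA) and experiment 4 (with DA), in m3 m-3.
rmse_reduction = percent_reduction(0.0753, 0.0420)
print(f"RMSE reduction after DA: {rmse_reduction:.0f}%")
```

With the values reported for experiments 3 and 4, this gives the 44% reduction quoted above.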
For southern Ghana, there exists a much less prominent seasonal cycle than in the north, with poorer quality observations for both TAMSAT rainfall and ESA CCI soil moisture. This is due to large amounts of coastal convective cloud and higher vegetation cover. We find that, after assimilating soil moisture data, runs forced with TAMSAT v3.0 very slightly outperform those forced with TAMSAT v2.0, but due to the relatively poor data quality the improvement is much less marked. Although we do not have reliable precipitation observations in the south, we can still greatly improve our forecast skill for soil moisture through DA. This bodes well for other regions with unreliable or sparse precipitation observations (Crow, 2003).
Overall our DA scheme performs well and allows us to retrieve soil thermal and hydraulic parameters for JULES that improve soil moisture estimates in our hindcast experiments. In this study, we have used a simplex algorithm to minimise the 4D-Var cost function without the use of a model adjoint. Whilst an adjoint facilitates efficient calculation of gradients of the cost function given in equation 1, it is costly to maintain and keep up-to-date with the latest model version. The only example of an adjoint of JULES of which we are aware is provided by Raoult et al. (2016) and is implemented for version 2.2 of the model, several major versions behind the current release. In future work a 4D-Ensemble-Var (Liu et al., 2008, 2009) approach could prove a useful compromise, as it allows for the use of a gradient-based descent algorithm, reducing the total number of function calls required to reach a solution without the use of an adjoint.
There is likely an issue of representativity between the satellite-derived soil moisture observations and the JULES modelled soil moisture in our DA system. We make the pragmatic assumption that satellite soil moisture is representative of the top 10 cm layer of soil in JULES. However, during intense dry periods the satellite will become more sensitive to greater depths and hence less representative of the JULES top level soil moisture. This can be seen in Figure 2, where the model fails to capture the satellite observations during the driest periods, with the JULES model predicting a lower soil moisture than the ESA CCI observations. This same phenomenon appears at a number of grid cells during dry periods. We can also see this consistent dry bias in the bottom row of Figure 4b. More work is needed to understand how best to address this issue between satellite and modelled soil moisture. One option could be to create a multi-layer observation operator for land surface models. Previous DA studies have opted to assimilate satellite-retrieved brightness temperature and then use a radiative transfer model on top of their chosen land surface model (Moradkhani et al., 2005; Qin et al., 2009; Montzka et al., 2011; Rasmy et al., 2011; Sawada and Koike, 2014; Yang et al., 2016). The core observations that make up the daily product are, in effect, instantaneous but are then merged into a harmonised product.
The ideal situation would be to have precipitation measurements and soil moisture observations that are representative of the same time periods and spatial domains, but there are no such current missions.

Conclusions
Previous studies at the grid cell level have shown that calibrating land surface models with satellite observations improves performance when judged against in-situ observations (Moradkhani et al., 2005; Qin et al., 2009; Montzka et al., 2011; Rasmy et al., 2011; Sawada and Koike, 2014; Yang et al., 2016). In this study we calibrated the JULES land surface model at the regional scale (over Ghana) and show that this reduces both bias and RMSE when judged against independent observations in a set of hindcast experiments. From the results, it is clear that both improved rainfall estimates and the implementation of data assimilation are required in order to improve modelled estimates and forecasts of soil moisture. In the north of Ghana, where the observations are of highest quality, we find that improved precipitation estimates are of greatest importance for accurate

Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2017-705. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 22 December 2017. © Author(s) 2017. CC BY 4.0 License.

Figure 1. Number of available days of ESA CCI soil moisture observations in the experiment period (2009-2014) out of a maximum of 2190 days.

Figures 2 and 3 show results from experiments 3 and 4 when forcing the JULES model with TAMSAT v3.0 rainfall. In Figure 4 we show model bias (judged against ESA CCI observations in the forecast period, 2010-2014, and calculated as the mean absolute deviation) for wet and dry seasons and experiments 1 to 4. Without DA (top row) we can see that for both wet and dry seasons there is a larger dry bias in soil moisture in northern Ghana for TAMSAT v2.0 than v3.0 and a larger wet bias in southern Ghana for TAMSAT v3.0 than v2.0, a finding consistent with the comparisons of precipitation between v3.0 and v2.0 presented by Maidment et al. (2017). After DA (bottom row) we can see that the wet bias in southern Ghana is largely removed for both TAMSAT v2.0 and v3.0. However, in northern Ghana a slightly larger dry bias still remains for TAMSAT v2.0, compared to v3.0.

Figure 5 shows the yearly root-mean-square error for each experiment model run. For northern Ghana (top row) this shows that the most accurate model run is experiment 4 (TAMSAT v3.0 with DA); this is especially true for wetting-up (left) and

Figure 6 compares the prior soil map used as the initial guess in the DA (i.e. from the Harmonised World Soil Database) with the posterior soil map retrieved by DA. The posterior soil map shown is the soil map retrieved when forcing JULES with TAMSAT v3.0 rainfall. It can be seen that after DA, the percentage of clay is greatly reduced, with increased percentages of silt and sand for the majority of grid cells. Although this change is reasonable for some grid cells, particularly in northern Ghana

Figure 4. Soil moisture model minus observations for the 5-year JULES forecast (2010-2014) driven with TAMSAT v2.0 and v3.0 precipitation, before and after data assimilation. Subplot a) statistics calculated over March to May for the wet period; subplot b) statistics calculated over November to January for the dry period. White pixels indicate areas where there are no data to calculate statistics (mainly due to high vegetation cover in the south).

Figure 6. Prior and posterior soil maps over Ghana showing percentage of sand, silt and clay.
Our results highlight the importance of having quality observations of both precipitation and soil moisture. TAMSAT rainfall observations and the ESA CCI soil moisture data are available as daily products but at different spatial resolutions and different observation times. TAMSAT data are produced at 4 km spatial resolution by calculating cold cloud duration over a 5-day period of 15-minute thermal infrared observations. The ESA CCI soil moisture data, on the other hand, are merged from various passive and active microwave observations and are available at various spatial resolutions, typically of the order of 0.25°.
representation of the start-of-season soil moisture. In contrast, the assimilation of relevant soil moisture observations with our land surface model gives the largest benefit for improving estimates during drying-down. After assimilation of a single year of soil moisture observations (2009) we reduce the RMSE of a 5-year model hindcast (2010-2014) by 44%, with the improved rainfall product contributing a 10% reduction in error.