Ensemble-based data assimilation of atmospheric boundary layer observations improves the soil moisture analysis

We revisit the potential of assimilating atmospheric boundary layer observations into the soil moisture. Previous studies often reported a negative assimilation impact of boundary layer observations on the soil moisture analysis, but recent developments in physically-consistent hydrological model systems and ensemble-based data assimilation lead to an emerging potential of boundary layer observations for land surface data assimilation. To explore this potential, we perform idealized twin experiments for a seven-day period in summer 2015 with a coupled atmosphere-land modelling platform. We use TerrSysMP for these limited-area simulations with a horizontal resolution of 1.0 km in the land surface component. We assimilate sparse synthetic 2-metre-temperature observations into the land surface component and update the soil moisture with a localized Ensemble Kalman filter. We show a positive assimilation impact of these observations on the soil moisture analysis during day-time and a neutral impact during night. Furthermore, we find that hourly filtering with a three-dimensional Ensemble Kalman filter results in smaller errors than daily smoothing with a one-dimensional Simplified Extended Kalman filter. In addition, the Ensemble Kalman filter allows us to directly assimilate boundary layer observations without an intermediate optimal interpolation step. We increase the physical consistency in the analysis for the land surface and boundary layer by updating the atmospheric temperature together with the soil moisture, which as a consequence further reduces errors in the soil moisture analysis. Based on these results, we conclude that we can merge the decoupled data assimilation cycles for the land surface and the atmosphere into one single cycle with hourly-like update steps.


Data assimilation
We combine a background forecast $x^b$ with observations $y^o$ in data assimilation to get an analysed state $x^a$, which should be as close as possible to an unknown true state $x^t$. Both the background forecast and the observations are disturbed representations of the true state, with the identity mapping and a possibly non-linear observation operator $H(x)$ as mapping operators, respectively.
Both have presumably additive Gaussian errors $\epsilon^b$ and $\epsilon^o$ with error covariances $B$ and $R$, respectively. Based on this Gaussian assumption, we can represent the analysed state as a linear combination of the observations and the background forecast, with $H^T$ as linearized and transposed observation operator (Kalman, 1960; Kalnay, 2003). We parametrize the observational error covariance $R = (\sigma^o)^2 I$ as a diagonal matrix with an observational standard deviation $\sigma^o = 0.1$ K, which is static in time and constant across all observations. To solve Eq. (3), we need a background forecast $x^b$, the background error covariance $B$, and a linearized observation operator $H$. We compare in this study two different ways to define these matrices, the Simplified Extended Kalman filter (SEKF) and an Ensemble Kalman filter (EnKF). In the following, we will briefly describe these schemes.
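Written out, the relations described above take the following standard form. This is a sketch in common Kalman filter notation, not a verbatim copy of the paper's equations: the symbols $\epsilon^b$, $\epsilon^o$ and the bold $\mathbf{H}$ for the linearized operator are assumptions, and the numbering is chosen so that the analysis update matches the Eq. (3) referred to in the text.

```latex
\begin{align}
x^b &= x^t + \epsilon^b, & \epsilon^b &\sim \mathcal{N}(0, B), \tag{1}\\
y^o &= H(x^t) + \epsilon^o, & \epsilon^o &\sim \mathcal{N}(0, R), \tag{2}\\
x^a &= x^b + B \mathbf{H}^T \left(\mathbf{H} B \mathbf{H}^T + R\right)^{-1} \left(y^o - H(x^b)\right). \tag{3}
\end{align}
```

The SEKF and the EnKF differ only in how they supply $x^b$, $B$, and $\mathbf{H}$ to the last relation.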

Simplified Extended Kalman filter
We use a deterministic background forecast for the SEKF. This background forecast is updated at 00:00 UTC based on gridpoint observations at 12:00 UTC and Eq. (3). In the SEKF, we only update the soil moisture. To derive $H$, which describes the sensitivity of 2-metre-temperature observations to perturbations in soil moisture, we perturb the soil moisture for a finite-difference approximation as described in Hess (2001) and de Rosnay et al. (2013). To create the finite-difference approximation, we need only one additional perturbed model run for every soil layer because CLM has a column-based soil model. The background error covariance $B = (\sigma^b)^2 I$ is assumed to be diagonal and static in time with $\sigma^b = 0.01\,\mathrm{m^3\,m^{-3}}$ as standard deviation.
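The SEKF update for a single soil layer and a single co-located observation can be sketched as follows. This is a minimal illustration under strong assumptions: `toy_t2m` is a hypothetical linear stand-in for the coupled model run from 00:00 UTC to 12:00 UTC, and the perturbation size `delta` is chosen arbitrarily; only the two standard deviations are taken from the text.

```python
SIGMA_B = 0.01  # background std of soil moisture in m3 m-3 (from the text)
SIGMA_O = 0.1   # observation std of 2-metre-temperature in K (from the text)


def toy_t2m(soil_moisture):
    """Hypothetical stand-in for the model: wetter soil gives a cooler T2m at noon."""
    return 300.0 - 20.0 * soil_moisture


def sekf_update(sm_background, t2m_obs, model=toy_t2m, delta=0.001):
    """Scalar SEKF analysis for one soil layer and one observation."""
    t2m_background = model(sm_background)
    # One additional perturbed run yields the finite-difference Jacobian H.
    jacobian = (model(sm_background + delta) - t2m_background) / delta
    # Scalar Kalman gain: B H^T (H B H^T + R)^-1
    gain = SIGMA_B**2 * jacobian / (jacobian**2 * SIGMA_B**2 + SIGMA_O**2)
    return sm_background + gain * (t2m_obs - t2m_background)


# Background is too wet (0.30) compared to a hypothetical truth (0.28).
analysis = sekf_update(sm_background=0.30, t2m_obs=toy_t2m(0.28))
```

With these assumed numbers the analysis moves most of the way from the background towards the truth, because the observation error is small relative to the background error mapped into observation space.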

Ensemble Kalman filter
With this ensemble approximation, we can update the background state by Eq. (3) and obtain a mean analysis state $x^a$, which represents the most probable analysis state at time $t$. This analysis state is then a weighted linear combination of the background ensemble members $x^b_i$. We use an implementation of the Localized Ensemble Transform Kalman Filter (LETKF; Bishop et al., 2001; Hunt et al., 2007) in this study, where this weighting is explicitly represented with a column-wise matrix of all background perturbations $X^b$ and the mean ensemble weights $w$. These mean ensemble weights are then estimated with Eq. (3) in the space spanned by the ensemble perturbations $X^b$. To derive a linearized observation operator for Eq. (3), each ensemble member is independently transformed into observational space, and afterwards, the observation operator is linearized around the ensemble mean in observational space. The increments for every ensemble member are obtained based on the mean ensemble weights and a deterministic and symmetric square-root filter with $w_i$ as additional perturbative member weights for the i-th ensemble member.
In our experiments, we use the same model configuration for our nature run as for our data assimilation experiments. These experiments are thus model-error-free compared to the nature run. Nevertheless, the ensemble approximations from Eq. (4) and Eq. (5) induce sampling errors. Because of these sampling errors, we include in our EnKF multiplicative prior covariance inflation as described in Hunt et al. (2007).
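The weight-space update described above can be sketched in a few lines of NumPy. This follows the general ensemble transform formulation of Hunt et al. (2007) for one local domain without localization; it is not the authors' PyTorch implementation, and the function name, argument layout, and the way the inflation factor enters are assumptions made for illustration.

```python
import numpy as np


def letkf_analysis(ens_state, ens_obs, obs, obs_var, inflation=1.0):
    """Ensemble transform Kalman filter update for one local domain.

    ens_state: (k, n) background ensemble in state space
    ens_obs:   (k, p) ensemble members mapped into observation space
    obs:       (p,)   observations
    obs_var:   (p,)   diagonal of the observation error covariance R
    """
    k = ens_state.shape[0]
    state_mean = ens_state.mean(axis=0)
    obs_mean = ens_obs.mean(axis=0)
    state_pert = (ens_state - state_mean).T  # X^b, shape (n, k)
    obs_pert = (ens_obs - obs_mean).T        # perturbations in obs space, (p, k)

    # Y^T R^-1 for diagonal R, then the inverse analysis covariance in weight space.
    c = obs_pert.T / obs_var
    inv_cov = (k - 1) / inflation * np.eye(k) + c @ obs_pert
    evals, evecs = np.linalg.eigh(inv_cov)
    cov = (evecs / evals) @ evecs.T

    # Mean weights w and symmetric square-root perturbation weights w_i.
    w_mean = cov @ c @ (obs - obs_mean)
    w_pert = (evecs * np.sqrt((k - 1) / evals)) @ evecs.T
    weights = w_pert + w_mean[:, None]  # column i holds the weights of member i

    return (state_mean[:, None] + state_pert @ weights).T
```

For a single directly observed variable, the analysis mean and variance of this sketch reduce to the scalar Kalman filter result with the ensemble sample variance as background variance, which is a convenient sanity check.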
The number of ensemble members is low compared to the state dimensions, especially for data assimilation across compartments. This discrepancy in the dimensionality introduces spurious correlations into the background covariances (Miyoshi et al., 2014), which can degrade the analysis. To reduce spurious correlations, the LETKF utilizes observational localization. We base our horizontal and vertical localization scheme on Gaspari-Cohn weighting functions (Gaspari and Cohn, 1999), which are also used for operational data assimilation in the atmosphere (Schraff et al., 2016).
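For illustration, the fifth-order piecewise rational function of Gaspari and Cohn (1999) can be written as below. The parameter `half_width` is the scale at which the weight drops to about 0.21, with the weight reaching zero at twice that distance; how the localization radii quoted in this study map onto this parameter is not stated here, so the example value is an assumption.

```python
import numpy as np


def gaspari_cohn(distance, half_width):
    """Compactly supported Gaspari-Cohn weights; zero beyond 2 * half_width."""
    r = np.abs(np.atleast_1d(np.asarray(distance, dtype=float))) / half_width
    weights = np.zeros_like(r)

    inner = r <= 1.0
    ri = r[inner]
    weights[inner] = (
        -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - 5.0 / 3.0 * ri**2 + 1.0
    )

    outer = (r > 1.0) & (r < 2.0)
    ro = r[outer]
    weights[outer] = (
        ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3 + 5.0 / 3.0 * ro**2
        - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro)
    )
    return weights


# Weights for observations at 0, 15, and 30 km with an assumed half-width of 15 km.
example_weights = gaspari_cohn([0.0, 15.0, 30.0], half_width=15.0)
```

In observational localization these weights typically scale the inverse observation error variance, so distant observations are treated as less accurate rather than being cut off abruptly.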

Experiments
In this section, we describe our experiments. First, we explain our experimentation strategy. Secondly, we briefly characterize the weather and soil conditions within our nature run.

Experimentation strategy
In our experiments, we create an ensemble to investigate interactions between temperature in the atmospheric boundary layer and soil moisture. Our experiments are based on a perfect model assumption such that we use the same model configuration for every run, as listed in Table 1. The model configuration for the atmosphere is almost the same as for the COSMO-DE setup that was in operational use at the DWD until 2018, except that we use a smaller area. The model area is based on the Neckar catchment in Baden-Württemberg and spans a region of ∼ 300 km in latitudinal direction and ∼ 280 km in longitudinal direction, as shown in Figure 1.
Our data assimilation framework (Finn, 2020a, b) is developed in Python (Van Rossum, 1995), PyTorch (Paszke et al., 2019), Xarray (Hoyer and Hamman, 2017), and Dask (Dask Development Team, 2016; Rocklin, 2015). This framework is coupled to TerrSysMP by files such that the background and first guess are read in as output files from the models. The analysis is then written as input files, which are used to restart the forecast models.
We define a nature run (NATURE) as our truth in this study and use it to generate our 2-metre-temperature observations. We generate hourly 2-metre-temperature fields by using the diagnostic COSMO 2-metre-temperature output. For the observations, we select 99 grid points (marked as black dots in Figure 1), given by the nearest horizontal neighbour grid point to real measurement sites. To introduce an observational error, we add to these selected observations independent and identically distributed (i.i.d.) unbiased Gaussian noise with a standard deviation of $\sigma^o = 0.1$ K.
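The observation generation described above can be sketched as follows. The grid shape, the placeholder temperature field, and the randomly drawn station indices are hypothetical stand-ins; only the number of sites (99) and the noise standard deviation (0.1 K) come from the text.

```python
import numpy as np


def draw_observations(t2m_field, site_rows, site_cols, sigma_o=0.1, rng=None):
    """Pick the nature-run T2m at the observation sites and add i.i.d. Gaussian noise."""
    rng = np.random.default_rng() if rng is None else rng
    truth_at_sites = t2m_field[site_rows, site_cols]
    noise = rng.normal(loc=0.0, scale=sigma_o, size=truth_at_sites.shape)
    return truth_at_sites + noise


rng = np.random.default_rng(42)
# Placeholder for one hourly NATURE field; the real field comes from COSMO output.
nature_t2m = 290.0 + rng.normal(size=(100, 110))
# Hypothetical nearest-neighbour grid indices of the 99 measurement sites.
site_rows = rng.integers(0, nature_t2m.shape[0], size=99)
site_cols = rng.integers(0, nature_t2m.shape[1], size=99)
observations = draw_observations(nature_t2m, site_rows, site_cols, rng=rng)
```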
Since COSMO is a limited-area model, we have to define lateral boundary conditions for the experiments, besides initial conditions. In all experiments, including the nature run, we use the same lateral boundary conditions in the atmosphere. The lateral boundary conditions are generated based on the 18th member of the COSMO-DE EPS ensemble from the DWD with the same horizontal and vertical resolution as our model setup. Accordingly, we do not induce any model perturbations through the lateral boundary conditions.
A single run with a similar model configuration and a spin-up of 6 years builds the foundation for our initial conditions in the atmosphere and soil. Every run has the same initial conditions in the atmosphere, whereas we perturb the initial soil conditions by correlated Gaussian perturbations similar to Schraff et al. (2016). As horizontal correlation function, we use a truncated Gaussian kernel with a standard deviation of 14 grid points (≈ 14 km) and a truncation radius of 42 grid points. The same type of truncated Gaussian correlation is used in the vertical dimension with a standard deviation of 0.5 m and a truncation after 1 m.
For soil moisture perturbations, we perturb the soil moisture saturation, which is the volumetric soil moisture scaled by the saturation point, and we further restrict that the resulting saturation lies between 0 and 1. We use unbiased Gaussian noise with a standard deviation of 0.06 for the soil moisture saturation and 1 K for the soil temperature across all layers.
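One way to draw horizontally correlated perturbations of this kind is to filter white noise with a truncated Gaussian kernel and rescale the result. The sketch below is an assumption about the procedure, not the authors' code: it covers the horizontal direction only, uses a filter width of $14/\sqrt{2}$ grid points with half the truncation radius so that the resulting correlation has roughly the stated width and support, and rescales to the empirical standard deviation of the field.

```python
import numpy as np


def truncated_gaussian_kernel(std, radius):
    """Normalized 1-D Gaussian weights on the offsets -radius..radius."""
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / std) ** 2)
    return kernel / kernel.sum()


def correlated_perturbation(shape, corr_std=14.0, corr_radius=42,
                            target_std=0.06, rng=None):
    """Horizontally correlated Gaussian noise with a prescribed standard deviation."""
    rng = np.random.default_rng() if rng is None else rng
    # Filtering white noise with a Gaussian of std s gives a correlation
    # function of std sqrt(2) * s and doubles the support of the kernel.
    kernel = truncated_gaussian_kernel(corr_std / np.sqrt(2.0), corr_radius // 2)
    field = rng.standard_normal(shape)
    for axis in range(2):
        field = np.apply_along_axis(np.convolve, axis, field, kernel, mode="same")
    return field * (target_std / field.std())


rng = np.random.default_rng(0)
saturation = np.full((128, 128), 0.5)  # hypothetical unperturbed saturation
noise = correlated_perturbation(saturation.shape, rng=rng)
# Restrict the perturbed saturation to the physical range between 0 and 1.
perturbed = np.clip(saturation + noise, 0.0, 1.0)
```

The zero padding of `mode="same"` weakens the perturbations near the domain edges, which a production setup would need to handle explicitly.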
Based on these initial soil perturbations, we generate 40 ensemble members for our ensemble experiments. We initialize our nature run with the initial conditions of a hypothetical 41st ensemble member to make sure that the ensemble spread within an open-loop run is representative for the error of its ensemble mean to the nature run. The deterministic run for the SEKF experiment is initialized with the ensemble mean of the 40 different ensemble members to get comparable results between the LETKF and SEKF.
We start the model runs for all of our experiments at 2015-07-30 00:00 UTC. The first 36 hours of simulation are used as spin-up such that perturbations can propagate from the soil into the atmosphere. After this spin-up phase, starting at 2015-07-31 12:00 UTC, we start with our six different experiments. The models are restarted because of our file-based data assimilation, and processes in the turbulent kinetic energy scheme are reset, which can be seen as some kind of model error. To mitigate this possible model error source, we restart all model runs hourly after 2015-07-31 12:00 UTC. We simulate a period of one week (seven days) and finish our experiments at 2015-08-07 18:00 UTC. We will briefly describe the experiments in the following; their abbreviations are given in Table 2.
This setting can be seen as weakly-coupled data assimilation experiment and is mainly used as comparison to the SEKF. We enforce the non-negativity of the soil moisture analysis by clipping negative values after the assimilation to zero. We expect that most of the soil-generated perturbations in the atmospheric boundary layer can be found in the atmospheric boundary layer temperature. Based on this expectation, we additionally update the atmospheric temperature together with the soil moisture in the LETKF Soil+Temp experiment. We cast this experiment as baseline experiment for strongly-coupled data assimilation, and we will extensively evaluate this experiment in the second part of the results.
In another experiment, we run an open-loop deterministic forecast without data assimilation (DET). This deterministic forecast is initialized with the same initial values as the ensemble mean. We expect that the errors of this deterministic forecast are comparable to the open-loop ensemble mean. This deterministic run then acts as baseline experiment for the SEKF experiment, where we assimilate the grid-point-based 2-metre-temperature at 12:00 UTC into the soil moisture at 00:00 UTC of the preceding night. Normally, an optimal interpolation scheme is used to generate the grid-point-based screen-level observations, which would introduce additional errors into the comparison between LETKF and SEKF. We simplify in this study our grid-point [...] similar to the observational errors for the LETKF. Because the SEKF is a smoothing algorithm, we already make use of the pseudo-observations at 2015-07-31 12:00 and start our SEKF experiment at 2015-07-31 00:00.
Figure 2. Potential observational equivalents [...] (Table 1), representative for 0.21 m depth. Blue colours characterize fewer equivalents per grid point compared to gridpoint-based assimilation (e.g. SEKF), whereas red colours indicate more equivalents.

In data assimilation experiments, the direct and accumulated assimilation impact is difficult to quantify as each experiment evolves along its own trajectory. To disentangle effects of different approximations on the direct assimilation impact without accumulated increments, we make additional offline data assimilation experiments. In these offline data assimilation experiments, we generate analyses based on the trajectories of the SEKF and the LETKF Soil+Temp experiment without restarting the model system; the background forecasts of these offline experiments are therefore the same as for the SEKF and the LETKF Soil+Temp experiment, respectively. Into these existing background trajectories, we assimilate observations from the NATURE run with different approximations.
For our ensemble-based LETKF experiment, we have to define additional localization radii and an inflation factor. We choose here a horizontal localization radius of 15 km, which is quite small in comparison to operationally used values in the atmosphere (between 50 km and 100 km; Schraff et al., 2016). This small radius represents smaller error-covariance length-scales for the land surface and atmospheric boundary layer compared to typical error length-scales in the free atmosphere.
In the atmosphere, we localize vertically in terms of logarithmic pressure and use a typical value (0.3 ln hPa), which is also used in operational settings (Schraff et al., 2016). Observations in the atmospheric boundary layer have their largest impact on soil moisture analysis at root-depth (Muñoz-Sabater et al., 2019), while afterwards the physically explainable impact is negligible. Our vertical localization radius in soil is therefore chosen (0.7 m) such that the innovations for soil levels below the root-depth (5th layer) are dampened. Because the impact of observations is additive, we can sum up the localization weights and estimate the here so-called potential observational equivalents (Figure 2). These equivalents tell us how many observations are potentially available per grid point, if we neglect involved ensemble sensitivities, in comparison to a fully-observed field with a one-dimensional assimilation scheme. The mean observational equivalent for the 5th soil layer (0.21 m depth) is 0.566, indicating that we have roughly half the number of potential observations per grid point relative to a fully-observed field. We have thus only a limited observability within the LETKF experiments, and in some areas, we would expect no assimilation impact at all. We additionally set the multiplicative prior covariance inflation factor to 1.006, which reflects that we have only sampling errors and no model error.
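The summation of localization weights can be made explicit as follows. This is a sketch of one plausible definition, assuming that the horizontal and vertical Gaspari-Cohn weights $\rho_h$ and $\rho_v$ are multiplied; the symbols $N^{\mathrm{eq}}$, $d_h$, $d_v$, $r_h$, and $r_v$ are introduced here for illustration only.

```latex
\begin{equation*}
N^{\mathrm{eq}}(g) = \sum_{j=1}^{N_{\mathrm{obs}}}
  \rho_h\!\left(\frac{d_h(g, j)}{r_h}\right)
  \rho_v\!\left(\frac{d_v(g, j)}{r_v}\right),
  \qquad N_{\mathrm{obs}} = 99,
\end{equation*}
```

Here $d_h(g, j)$ and $d_v(g, j)$ are the horizontal and vertical distances between grid point $g$ and observation $j$. Under this definition, a one-dimensional scheme with one co-located observation per column has $N^{\mathrm{eq}} = 1$ at the observed level, so the reported mean of 0.566 corresponds to roughly half an observation per grid point.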

Weather and soil conditions
The daily mean 2-metre-temperature increases with time in our seven-day simulation period, whereas the soil moisture in root-depth decreases (Figure 3). The soil is in a mixed soil moisture regime, as indicated by saturation values around 0.5.
The amplitude of the diurnal cycle in the 2-metre-temperature is further quite large. Together with the soil moisture drying and the temperature increase, this amplitude indicates a period with strong solar irradiance and without large precipitation events. The only larger precipitation event is on 2015-08-01, while a smaller event on 2015-08-04 has the largest impact on the 2-metre-temperature. Based on this weather overview only, we would expect a coupling between atmospheric boundary layer temperature and soil moisture in root-depth during day-time. We can therefore say that we expect some data assimilation impact on the soil moisture by assimilating the 2-metre-temperature.

Results
We structure this section into two general parts. In the first subsection, we will compare our experiments and show what we can learn from this comparison. Afterwards, we analyse the LETKF Soil+Temp experiment in more detail with regard to driving factors in the assimilation.
We can expect that assimilating the 2-metre-temperature into soil moisture improves the forecast of the atmospheric boundary layer (e.g. Carrera et al., 2019). In a first step, we analyse the impact of data assimilation into soil moisture on the prognostic boundary layer temperature (Figure 4) at 10 m height above ground, the lowest prognostic model level. Every data assimilation experiment (SEKF; LETKF Soil; LETKF Soil+Temp) has a substantially lower Root-Mean-Squared-Error (RMSE) to NATURE than its counterpart without data assimilation (DET; ENS, Table 3). Because this result is found throughout the experiments, this improvement is independent of additional updates in the atmospheric boundary layer. This result confirms previous studies, which found that updating the soil moisture with 2-metre-temperature observations has a positive assimilation impact on the forecast of the atmospheric boundary layer.

All experiments have a clearly defined diurnal cycle in the RMSE with the highest errors during day-time. We find the same diurnal cycle in the data assimilation impacts with the highest impacts during day-time and only small impacts during night. The collapse of the atmospheric boundary layer in the evening leads to a strong decrease of the atmospheric perturbations. Due to this process, information collected by data assimilation during the previous day is also partially lost.
The LETKF Soil+Temp experiment has the smallest error of all experiments, indicating a small positive impact of additionally updating the atmospheric temperature. Nudging the atmospheric temperature to the observations helps us to reduce error components related to a drift of trajectories compared to the NATURE run. By construction of the experiment, the largest part of the errors is nevertheless soil-induced, which limits the additional impact of updating the atmospheric temperature. We additionally have a loss of information due to the collapsing boundary layer, as discussed before, and the differences between the LETKF Soil and the LETKF Soil+Temp experiment remain small over the simulation window.
The assimilation impact of the SEKF experiment is similar to the impact of the LETKF Soil experiment, despite the fact that the latter experiment has a smaller absolute magnitude of error. Because the same decreased error can be noticed between the DET and ENS experiment, the smaller errors of the LETKF Soil experiment are mainly attributable to the difference in the type of experiment. Based on this result, both data assimilation methods, the SEKF and LETKF, are similarly effective in reducing errors in the atmospheric boundary layer temperature by updating the soil moisture only.
We can expect that the assimilation of 2-metre-temperature observations into soil moisture has also a positive impact on the soil moisture analysis.
The LETKF Soil experiment has a smaller error than the SEKF experiment, but they have similar impacts on the boundary layer temperature. This increased impact in soil moisture is a result of filtering instead of the smoothing used in the SEKF experiment. The SEKF can correct foreseeable errors at noon in advance, whereas we only correct instantaneous errors in the filtering framework. Smoothing has thus an advantage compared to filtering for correcting errors in the atmospheric boundary layer based on updates of the soil moisture. For soil moisture, the information content of a single update step is limited by the coupling strength. Hence, hourly updating the soil moisture with the LETKF is able to extract more information from limited observations than the SEKF with a fully-observed field and a single update step per day.

Additional nudging of the simulated boundary layer temperature towards the observed temperature results in a positive impact in the LETKF Soil+Temp experiment compared to the LETKF Soil experiment. By updating the boundary layer temperature, we increase the consistency in the analysis errors, which has also a positive assimilation impact on later cycles.
Up to this point, we only looked into the error development of either the temperature at the lowest atmosphere layer or the soil moisture in root-depth as spatial mean. In the following, we will analyse how the assimilation impact is spatially distributed (Figure 6) in the LETKF Soil and SEKF experiment.
These increments also influence the remaining error of the LETKF Soil experiment compared to the NATURE run (Figure 6, c). Errors are especially dampened in this experiment if observational position and initial condition errors match. The construction of the ensemble perturbations and spatial localization in the LETKF lead to a spatially smoother error field than for the SEKF experiment (Figure 6, d). The SEKF experiment has also higher error amplitudes than the LETKF Soil experiment, showing the effectiveness of the LETKF in this case. Furthermore, the one-dimensional approximation in the SEKF results in error fluctuations across a small area, which are not apparent in the errors of the LETKF Soil experiment.

The LETKF Soil experiment has thus a spatially more balanced and higher impact than the SEKF experiment, especially in the eastern part of the domain.
In the following (Figure 7), we will show the RMSE for soil moisture in root-depth of the offline experiments based on the SEKF trajectory (Figure 7, a) and on the LETKF Soil+Temp trajectory (Figure 7, b). The comparison between an experiment with perfect observations, extracted from the NATURE run, and disturbed observations allows us to estimate the impact of the random observational error. For the SEKF base trajectory (Figure 7, a), the difference between an offline experiment with observations from the NATURE run, denoted SEKF-nature, and the original analyses is small. Based on these marginal differences, random observational errors have only a negligible impact on the errors of the SEKF trajectory.

In the SEKF-ENS experiment, we replace the finite-difference approximation for the Jacobians in Eq. (3) by an ensemble approximation from the ENS experiment. We make here the assumption that the ENS experiment is like an external ensemble data assimilation cycle with constrained perturbations in the atmosphere, since we use the same lateral boundary conditions in all experiments. This offline experiment thus resembles the current SEKF implementation at the ECMWF (ECMWF, 2019), except that we do not restart the trajectory within this offline experiment. The error compared to SEKF-nature is reduced, indicating that the ensemble approximation stabilizes the Jacobians in comparison to the finite-difference approximation.
We take dynamic background covariances from the ENS experiment into account in the SEKF-1D-EnKF experiment, where we use an EnKF instead of a SEKF. In this experiment, we further reduce the error compared to the SEKF-ENS experiment.
This error reduction has two reasons: On the one hand, we have dynamic covariances, which resemble the flow-dependent uncertainties. On the other hand, the ensemble spread of the ENS experiment is larger than the analysis error of the SEKF experiment and the static background covariances for the SEKF. We thus overestimate the assimilation impact in the SEKF-1D-EnKF experiment, which is then a lower bound for the SEKF error.
We replace the column-based data assimilation with a LETKF-based assimilation of 99 discrete observation points in the SEKF-3D-EnKF experiment. Here, we assimilate with a LETKF, based on the perturbations from the ENS experiment, observations from the NATURE run at 12:00Z into the background trajectory of the SEKF experiment at 00:00Z. This increases the analysis error compared to the SEKF-1D-EnKF experiment, because we have only limited observations compared to a fully observed field. Nevertheless, the error of the SEKF-3D-EnKF experiment is smaller than the SEKF-nature, showing that the ensemble-based assimilation is preferable to a finite-difference-based SEKF.
Similar results can be seen in the offline data assimilation experiments based on the LETKF Soil+Temp experiment (Figure 7, b).
We directly assimilate the soil moisture in root-depth in the LETKF-1D-H2O experiment. With this direct assimilation, we deactivate the source of uncertainty within the vertical covariances, translating from 2-metre-temperature to soil moisture in root-depth. The margin between LETKF-1D-H2O and LETKF-1D-nature is thus representative for the assimilation impact associated with the coupling between atmosphere and land. Based on this margin, the coupling between the 2-metre-temperature and soil moisture dominantly controls the assimilation impact on soil moisture, also during day-time. The sensible heat flux acts as main coupler between soil moisture and 2-metre-temperature, whereas the evapotranspiration has a bigger impact on humidity in the atmosphere. Based on these physical considerations, we will now show the dependency of the sensible heat flux on the soil moisture (Figure 8).
The same insensitivity can be found in the moist regime, where the ensemble mean saturation is above 0.5. Plants have in this regime enough water for transpiration and the sensible heat flux is almost insensitive to changes in soil moisture. Hence, the sensible heat flux value is more influenced by other factors, as indicated by higher variances across a soil moisture bin, and we expect here the smallest assimilation impact. In the mixed regime, where the saturation is between 0.2 and 0.5, plants regulate their transpiration based on the soil moisture, leading to higher sensitivities in the sensible heat flux to changes in the soil moisture. We would therefore expect that most available information from the 2-metre-temperature for the soil moisture is encoded within this mixed regime.
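The regime classification used in this discussion can be expressed compactly. The sketch below assumes that the dry regime lies below a saturation of 0.2, which is inferred from the stated bounds of the mixed regime; the treatment of values exactly on a threshold is likewise an assumption.

```python
import numpy as np

REGIME_NAMES = np.array(["dry", "mixed", "moist"])


def classify_regime(saturation, dry_limit=0.2, moist_limit=0.5):
    """Assign each ensemble-mean saturation value to a soil moisture regime."""
    # np.digitize returns 0 below dry_limit, 1 in between, 2 at or above moist_limit.
    index = np.digitize(saturation, [dry_limit, moist_limit])
    return REGIME_NAMES[index]


regimes = classify_regime(np.array([0.1, 0.3, 0.7]))
```

Grouping the grid points by these labels before computing the RMSE difference gives the regime-wise assimilation impact.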

In Figure 9, we classify the soil moisture with these three regimes to show its influence on the potential assimilation impact in soil moisture itself. Based on the LETKF-1D-nature experiment from Figure 7, we use a potential assimilation impact, which would be the assimilation impact on the soil moisture in root-depth, if we would observe the whole 2-metre-temperature run from the analysis of the LETKF-1D-nature experiment to the background of the LETKF Soil+Temp experiment for soil moisture. The soil moisture saturation clearly determines the potential assimilation impact (Figure 9), as previously expected. We find the highest potential impact in grid points with mixed regime, where the sensible heat flux has the highest sensitivity to changes in the soil moisture. The assimilation has its lowest impact in the moist regime, because the sensible heat flux has here its least sensitivity to changes in soil moisture and is mostly influenced by other factors. For our seven-day simulation, we conclude that the soil moisture itself is a main factor to explain variabilities in the assimilation impact across grid points.
In all regimes, we have a positive assimilation impact during day-time and a negligible impact during night. The solar irradiance is the main driver for the coupling between atmospheric boundary layer and land surface and shapes also the diurnal cycle of the assimilation impact. Nevertheless, in the late afternoon the potential impact deviates from its expected diurnal cycle, which cannot be explained by solar irradiance alone. This potential impact deviation indicates a mechanism that reinforces the positive assimilation impact in the late afternoon.
In the following, we will reveal that the coupling is additionally controlled by the temporal development of the atmospheric boundary layer (Figure 10), leading to the deviation in the late afternoon. We analyse this temporal development within the LETKF Soil+Temp experiment.
As main driver for the assimilation impact, the coupling between land surface and atmosphere correlates the soil moisture to  (Harrison, 1981), which results in a positive correlation to the 2-metre-temperature. After sunrise, and before perturbations in the boundary layer are accumulated, the sensible heat flux has a direct impact on the 2-metre-temperature.
The same reinforcement mechanism, as in the potential assimilation impact, can be found in the correlations of the sensible heat flux and soil moisture to the 2-metre-temperature. The sensible heat flux follows nevertheless a diurnal cycle without any additional peak (Figure 10, b). In contrast to this diurnal cycle, the reinforcement mechanism also heavily influences the diurnal cycle of the water vapour content. Based on this fact, we can trace the reason of the reinforcement mechanism back to the growth and collapse of the boundary layer. The land surface heats up with increasing solar irradiance in the morning.
With time, the sensible heat flux and evapotranspiration transport the heat into the boundary layer (Stull, 1988), causing an increase in the heat content of the boundary layer. In the afternoon, the solar irradiance decreases again with time such that also differences between boundary layer and land surface decrease, resulting in lower heat fluxes. Together with a growth of the mixed boundary layer, these lower heat fluxes cause a decrease in the heat content a few metres above the surface, as seen in the water vapour content. As a consequence of the strong decrease in solar irradiance, the near-surface boundary layer collapses into a thin strongly-stratified boundary layer. Propagated heat is now stored within this thin layer, leading to a rapid increase in the heat content. This rapidly increased heat content then also strengthens the atmosphere-land coupling above the land surface in the late afternoon.
The atmosphere-land coupling controls the information content encoded within the vertical covariances. In the following, we will also take horizontal covariances and the impact of localization into account and take a deeper look into the dependence of the diagonal covariance on the horizontal distance between 2-metre-temperature and soil moisture (Figure 11). As previously stated, we have negative error covariances during day-time, whereas we have slightly positive error covariances in the evening and night. The ensemble covariances at 2015-08-01 12:00Z and 2015-08-03 06:00Z resemble the error covariances for local areas. Nevertheless, the ensemble covariances show in both cases too wide horizontal covariances compared to the error covariances, and here, horizontal localization helps to reduce the impact of these spurious correlations. The chosen localization radius of 15 km is too small for 2015-08-01 12:00Z and reduces the impact of horizontal covariances too strongly compared to the error covariances, whereas the radius is well-tuned for 2015-08-03 06:00Z. At 2015-08-01 19:00Z, the ensemble cannot represent the positive error covariances, and we would expect a negative assimilation impact. In this case, the best localization would be 0 km, indicating that a deactivation of the assimilation would be the best choice. The correct localization radius for cross-compartmental data assimilation is therefore highly dependent on the governing processes.
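The ensemble covariances discussed here can be estimated directly from the ensemble perturbations. The following is a minimal sketch with hypothetical array layouts; the optional `weights` argument stands for localization weights evaluated at the distance of each grid point to the observation and is an illustrative addition.

```python
import numpy as np


def ensemble_cross_covariance(t2m_ens, sm_ens, weights=None):
    """Sample covariance between T2m at one site and soil moisture at n grid points.

    t2m_ens: (k,)   2-metre-temperature of the k members at the observation site
    sm_ens:  (k, n) soil moisture of the k members at n grid points
    weights: (n,)   optional localization weights applied to the covariances
    """
    k = t2m_ens.shape[0]
    t2m_pert = t2m_ens - t2m_ens.mean()
    sm_pert = sm_ens - sm_ens.mean(axis=0)
    covariance = t2m_pert @ sm_pert / (k - 1)
    return covariance if weights is None else covariance * weights
```

Plotting this quantity against horizontal distance, with and without weights, gives the kind of comparison between raw and localized ensemble covariances described above.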
In this study, we investigate how we can use an Ensemble Kalman filter (EnKF) to assimilate sparse 2-metre-temperature observations across the atmosphere-land interface. Because we focus on the coupling between temperature in the atmospheric boundary layer and soil moisture, we perturb only initial soil conditions to generate an ensemble of forecasts. All resulting deviations within the ensemble and between different experiments are therefore only a consequence of these initial soil conditions or due to data assimilation. With this idealized experimentation framework, we are able to show that the soil moisture analysis can be improved by assimilating boundary layer observations.

The coupling of the land surface to the boundary layer drives this positive assimilation impact during day-time, whereas we have a neutral impact at night. An EnKF with hourly filtering can exploit this coupling if the ensemble covariances are representative of the error covariances. To shape the ensemble covariances, a well-tuned horizontal localization is crucial for the cross-compartmental assimilation. In the case of representative ensemble covariances, additional updates of the boundary layer temperature increase the consistency of the analysis increments, which has an additional positive assimilation impact on subsequent soil moisture analyses. This additional assimilation impact hints at a positive consequence of strongly-coupled data assimilation at the atmosphere-land interface.
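How the ensemble covariance carries a temperature observation into the soil moisture can be sketched for a single observation and a single grid point. The sketch below updates only the ensemble mean and uses no localization. All numbers and the helper names `covariance`, `kalman_gain`, and `analysis_mean` are hypothetical and chosen only to mimic the negative day-time covariance between 2-metre-temperature and soil moisture.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Unbiased sample covariance between two ensembles of equal size."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def kalman_gain(state_ens, obs_equiv_ens, obs_error_var):
    """Scalar gain: cov(state, H(x)) / (var(H(x)) + observation error variance)."""
    return covariance(state_ens, obs_equiv_ens) / (
        covariance(obs_equiv_ens, obs_equiv_ens) + obs_error_var
    )


def analysis_mean(state_ens, obs_equiv_ens, obs_value, obs_error_var):
    """Update the ensemble mean of `state_ens` with one observation."""
    innovation = obs_value - mean(obs_equiv_ens)
    gain = kalman_gain(state_ens, obs_equiv_ens, obs_error_var)
    return mean(state_ens) + gain * innovation


# Hypothetical day-time ensemble: wetter soils go along with a cooler 2 m air.
soil_moisture = [0.20, 0.25, 0.30, 0.35, 0.40]   # saturation, dimensionless
t2m = [299.0, 298.5, 298.0, 297.5, 297.0]        # 2-metre-temperature in K
t2m_obs = 298.6                                  # assumed observation in K
obs_error_var = 0.1 ** 2                         # assumed error variance in K^2

# Cross-compartmental update of the soil moisture ...
sm_analysis = analysis_mean(soil_moisture, t2m, t2m_obs, obs_error_var)
# ... and the additional update of the boundary layer temperature itself.
t2m_analysis = analysis_mean(t2m, t2m, t2m_obs, obs_error_var)
print(sm_analysis, t2m_analysis)
```

Because the covariance is negative in this example, an observation that is warmer than the forecast dries the soil moisture analysis, while the temperature analysis is pulled towards the observation with the same innovation.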
The EnKF has smaller errors with respect to our NATURE run than the Simplified Extended Kalman filter (SEKF) in both the soil moisture and the boundary layer temperature. Compared to the SEKF, the EnKF improves the soil moisture analysis by a larger amount than the boundary layer forecast. Our offline data assimilation experiments reveal that this is related to the finite-difference approximation within the SEKF, which can be stabilized by using ensemble-based covariances. We further improve the soil moisture analysis with hourly filtering, as it is commonly used for data assimilation in the atmosphere. This improvement by filtering indicates that we can include land surface variables in the ensemble-based analysis cycles of the atmosphere.
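The difference between a finite-difference Jacobian and an ensemble-based Jacobian can be illustrated with a toy example. The forward model `toy_t2m` below is purely hypothetical and is not TerrSysMP: it has a large-scale slope of -10 K per unit soil moisture plus a small-scale non-linear wiggle. The sketch only shows one possible reason why a regression over many ensemble members can be less sensitive to such small-scale structure than a single perturbed run, and it does not reproduce our offline experiments.

```python
import math


def toy_t2m(soil_moisture):
    """Hypothetical forward model: 2-metre-temperature (K) from soil moisture."""
    return 300.0 - 10.0 * soil_moisture + 0.05 * math.sin(200.0 * soil_moisture)


def finite_difference_jacobian(model, state, perturbation):
    """One-sided finite difference, as in a perturbed-run Jacobian estimate."""
    return (model(state + perturbation) - model(state)) / perturbation


def ensemble_jacobian(model, states):
    """Regression slope cov(H(x), x) / var(x) estimated from an ensemble."""
    outputs = [model(s) for s in states]
    mean_s = sum(states) / len(states)
    mean_o = sum(outputs) / len(outputs)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(states, outputs))
    var = sum((s - mean_s) ** 2 for s in states)
    return cov / var


LARGE_SCALE_SLOPE = -10.0  # slope of the toy model without the wiggle

# Hypothetical ensemble of 21 soil moisture states between 0.20 and 0.40.
ensemble_states = [0.20 + 0.01 * i for i in range(21)]

fd_jacobian = finite_difference_jacobian(toy_t2m, 0.30, 0.01)
ens_jacobian = ensemble_jacobian(toy_t2m, ensemble_states)

print("finite difference:", fd_jacobian)
print("ensemble-based:   ", ens_jacobian)
```

In this toy setting, the finite-difference estimate depends strongly on where and how large the perturbation is, whereas the ensemble-based estimate averages over the ensemble spread and stays close to the large-scale slope.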
With a localized EnKF, we can skip the optimal interpolation step to create a 2-metre-temperature analysis. We find with our offline data assimilation experiments that the additional assimilation impact of a fully-observed 2-metre-temperature field is small compared to the general assimilation impact with coarsely-distributed observations. Furthermore, the additional optimal interpolation step creates uncertainties in the temperature observations, which we have not taken into account in our offline data assimilation experiment. Three-dimensional ensemble-based data assimilation of boundary layer observations for the soil moisture is thus possible with localization.
We have a non-linear coupling between the atmospheric boundary layer and the land surface, because the strength of the coupling depends on the soil moisture itself. We only make a local linear assumption around the ensemble mean in the Ensemble Kalman filter, and these non-linearities do not have a large impact on the results. The global non-linear structure nevertheless constrains the coupling between the atmosphere and land: above very dry and very wet soils, the observations encode only limited information that can be extracted by direct assimilation.

Besides this dependence of the assimilation on the coupling and on the soil moisture, we also show that the temporal development of the boundary layer has an impact. This impact leads to a peak in information content around noon, whereas we have a decrease in the afternoon. A partial collapse of the boundary layer into a thin layer above the land surface initiates a reinforcement of the atmosphere-land coupling. We can more easily use the temporal development with hourly-filtering, whereas we might have problems with daily-smoothing as done within the SEKF, because we would have to select representative observation times.
We can further exploit the temporal development of the boundary layer with hourly-smoothing instead of hourly-filtering. Because land surface perturbations need some time to propagate into the atmosphere, one possibility would be to assimilate future observations within a given assimilation window with a 4D-LETKF (Harlim and Hunt, 2007; Kalnay et al., 2007), which would be similar to an Iterative Ensemble Kalman smoothing scheme (Kalnay and Yang, 2010; Sakov et al., 2012; Bocquet and Sakov, 2014). Together with smoothing, we could additionally introduce time-dependent localization to tackle problems related to errors by the ensemble approximation of the covariances.
All in all, our results support the view that assimilation of boundary layer observations has a positive impact on the soil moisture if the model system can adequately represent the governing processes in the boundary layer and land surface. We can therefore see this study as a first step towards the goal of assimilating a unified set of observations across the atmosphere-land interface to improve the analysis for both compartments.

Conclusions
In this study, we assimilate synthetic 2-metre-temperature observations into soil moisture in a fully-coupled limited-area model system for a seven-day period in Summer 2015. Based on our results in idealized twin experiments, we conclude the following:
1. Assimilation of boundary layer observations improves the soil moisture analysis during day-time and has no impact during night; boundary layer observations yield the highest information content for land surface data assimilation above soil moisture saturations between 0.2 and 0.5.
2. Hourly-updating the soil moisture with a Localized Ensemble Transform Kalman filter results in a smaller error for the soil moisture analysis than daily-smoothing with a Simplified Extended Kalman filter, and in addition, we can directly assimilate sparse boundary layer observations across the atmosphere-land interface without an intermediate optimal interpolation step.
3. Ensemble-based approximations of the background covariances and Jacobians stabilize the analysis increments in a Simplified Extended Kalman filter.
4. Updating the atmospheric temperature together with the soil moisture increases the physical consistency in the analysis for the boundary layer and land surface, which in turn further reduces errors in the soil moisture analysis.