Data assimilation for continuous global assessment of severe conditions over terrestrial surfaces

LDAS-Monde is a global offline Land Data Assimilation System (LDAS) that jointly assimilates satellite-derived observations of Surface Soil Moisture (SSM) and Leaf Area Index (LAI) into the ISBA (Interaction between Soil Biosphere and Atmosphere) Land Surface Model (LSM). This study demonstrates that LDAS-Monde is able to detect, monitor and forecast the impact of extreme weather on land surface states. Firstly, LDAS-Monde is run globally at 0.25° spatial resolution over 2010-2018. It is forced by the state-of-the-art ERA5 reanalysis (LDAS_ERA5) from the European Centre for Medium Range Weather Forecast (ECMWF). The behaviour of the assimilation system is evaluated by comparing the analysis with the assimilated observations. Then the Land Surface Variables (LSVs) are validated with independent satellite datasets of evapotranspiration, Gross Primary Production, Sun Induced Fluorescence and snow cover. Furthermore, in situ measurements of SSM, evapotranspiration and river discharge are employed for the validation. Secondly, the global analysis is used to (i) detect regions exposed to extreme weather such as droughts and heatwave events and (ii) address specific monitoring and forecasting requirements of LSVs for those regions. This is performed by computing anomalies of the land surface states. They display strong negative values for LAI and SSM in 2018 for two regions: North Western Europe and the Murray-Darling basin in South Eastern Australia. For those regions, LDAS-Monde is forced with the ECMWF Integrated Forecasting System (IFS) high resolution operational analysis (LDAS_HRES, 0.10° spatial resolution) over 2017-2018. Monitoring capacities are studied by comparing open-loop and analysis experiments again against the assimilated observations. Forecasting abilities are assessed by initializing 4and 8-day LDAS_HRES forecasts of the LSVs with the LDAS_HRES assimilation run compared to the openloop experiment. The positive impact of initialization from an analysis in forecast mode is

particularly visible for LAI that evolves at a slower pace than SSM and is more sensitive to initial conditions than to atmospheric forcing, even at an 8-day lead time. This highlights the impact of initial conditions on LSV forecasts and the value of jointly analysing soil moisture and vegetation states.

Introduction
Extreme events are likely to increase in frequency and/or magnitude as a result of anthropogenic climate change (IPCC, 2012, Ionita et al., 2017. Amongst all the natural disasters, droughts are arguably the most detrimental (Bruce, 1994;Obasi, 1994;Cook et al., 2007;Mishra and Singh, 2010;WMO 2017) as about one-fifth of damages caused by natural hazards can be attributed to droughts (Wilhite, 2000). They cost society billions of dollars every year (WMO, 2017). It is therefore important for communities to develop tools that can monitor and predict drought conditions (Svoboda, 2002;Luo and Wood, 2007;Blyverket et al., 2019) as well as their impact on land surface variables (LSVs) and society (Di Napoli et al., 2019). A major scientific challenge in relation to the adaptation to climate change is to observe and simulate how land biophysical variables respond to those extreme events (IPCC, 2012).
Droughts are generally caused by a lack of precipitation. However, different drought types are classified according to the part of the hydrological cycle that suffers from a water deficit (IPCC, 2014;Barella-Ortiz and Quintana-Seguí, 2018). They include meteorological droughts (lack of precipitation), agricultural droughts (deficit of water in the soil), hydrological droughts (deficit of streamflow or water level in rivers) and environmental droughts (a combination of the previous droughts types). Because of the effect of precipitation deficit on the whole hydrological system, all drought types are related (Wilhite, 2000). Complex interactions between continental surface and atmospheric processes have to be combined with human action in order to fully understand the wide ranging impacts of droughts on land surface conditions (Van Loon, 2015). As a consequence, Land Surface Models (LSMs) driven by high-quality gridded atmospheric variables and coupled to riverrouting system, are key tools to address these challenges (Dirmeyer et al., 2006;Schellekens et al., 2017). Initially developed to provide boundary conditions to atmospheric models, LSMs can now be used to monitor and forecast land surface conditions (Balsamo et al., 2015;Balsamo et al., 2018;Schellekens et al., 2017). Additionally, the representation of LSVs by LSMs can be improved by coupling LSM's with other models of the Earth system like atmosphere, oceans and river routing (e.g., de Rosnay et al., 2013Rosnay et al., , 2014Kumar et al., 2018, Balsamo et al., 2018Rodríguez-Fernández et al., 2019;Muñoz-Sabater et al., 2019). Earth Observations (EOs) provide long-term records, which can complement LSMs. Satellite products are particularly relevant for the monitoring of LSVs. Satellite EOs related to the terrestrial hydrological, vegetation and energy cycles are now available globally, at kilometric scales and below (e.g., Lettenmaier et al., 2015, Balsamo et al., 2018. Combining EOs and LSMs through Land Data Assimilation Systems (LDASs) can lead to enhanced initial land surface conditions (e.g. Reichle et al., 2007;Lahoz and De Lannoy, 2014;Kumar et al., 2018;Albergel et al., 2017Albergel et al., , 2018aAlbergel et al., , 2019Balsamo et al., 2018). Subsequently, this can benefit weather forecasts, including temperature and precipitation. It can also indirectly benefit agricultural and vegetation productivity prediction, streamflow prediction, warning systems for floods and droughts and the representation of the carbon cycle (Bamzai and Shukla, 1999;Schlosser and Dirmeyer, 2001;Bierkens and van Beek, 2009;Koster et al., 2010;Bauer et al., 2015;Massari et al, 2018;Albergel et al., 2018a, 2019, Rodríguez-Fernández et al., 2019Muñoz-Sabater et al., 2019). Amongst the current land-only LDAS activities several are led by NASA (National Aeronautics and Space Administration) projects. Examples of such activities are the Global Land Data Assimilation System (GLDAS, detect, monitor and forecast the impact of extreme events on LSVs. The following items are presented and discussed: • An evaluation of LDAS-Monde at a global scale is carried out. This assessment involves the assimilated observations to demonstrate that the system is working as intended. Importantly, LDAS-Monde is then validated using diverse, independent and complementary satellite-derived datasets of evapotranspiration (EVAP) from the GLEAM project (Miralles et al., 2011, Martens et al., 2017, Gross Primary Production (GPP) from the FLUXCOM project (Tramontana et al., 2016, Jung et al., 2017, Solar Induced Fluorescence (SIF) from the GOME-2 (Global Ozone Monitoring Experiment-2) scanning spectrometer (Munro et al., 2006, Joiner et al., 2016 and snow cover data The paper is organised in five sections: section 2 details the various components constituting LDAS-Monde (the ISBA LSM, the data assimilation scheme, the EOs assimilated as well as the different atmospheric forcing datasets used), followed by the experimental and evaluation setup.
Section 3 describes and discusses the impact of the analysis on the representation of the LSVs.
Section 4 details the identification of 2 case studies over regions particularly affected by extreme heatwave events during 2018. Furthermore the detailed monitoring and land surface forecasts of these events are presented at higher spatial resolution. Finally section 5 provides conclusions and prospects for future work.

Material and methods
The following subsections briefly describe the main components of LDAS-Monde: the ISBA LSM, its data assimilation scheme and two other key elements of the setup: atmospheric forcing and assimilated satellite derived observations. The experimental setup and the evaluation datasets used in this study are also presented.  (Calvet, et al., 1998, 2004, Gibelin et al., 2006, multilayer diffusion scheme (Boone et al., 2000, Decharme et al., 2011 version of the ISBA LSM Planton, 1989, Noilhan andMahfouf, 1996)   provides these parameters for each patch and each grid cell of the ISBA model.

CTRIP river routing system
The ISBA-CTRIP river routing system is able to simulate continental scale hydrological variables based on a set of three prognostic equations. They correspond to (i) the groundwater, (ii) the surface stream water and (iii) the seasonal floodplains. It converts the runoff simulated by ISBA into river discharge. The ISBA-CTRIP river-routing network has a spatial resolution of 0.5°g lobally and is coupled daily with ISBA through the OASIS3-LCT coupler (Voldoire et al., 2017). The SEKF used in LDAS-Monde is a 2-step sequential approach in which a prior forecast step is followed by an analysis step. The prior forecast propagates the initial states to the next time step with the ISBA LSM and the analysis step then corrects this forecast by assimilating observations.
The flow-dependency (dynamic link) between the prognostic variables and the observations is ensured in the SEKF through the observation operator and its Jacobians, which propagate information from the observations to the analysis via finite-difference computations (de Rosnay et al., 2013). The Jacobian matrix has as many rows as assimilated observation types (two in our case: SSM and LAI) and as many columns as model control variables requested (8 in our case, soil moisture from layers 2 to 8 and LAI). In addition to a control run (i.e. the forecast step), computing the Jacobian matrix requires perturbed runs, one for each control variable. The eight control variables are directly updated using their sensitivity to observed variables (i.e. defined by the Jacobian). Other variables are indirectly modified through biophysical processes and feedback from the model. Several studies (e.g. Draper et al., 2009;Rüdiger et al., 2010) have demonstrated that small perturbations lead to a good linear approximation of the model behaviour, provided that computational round-off error is not significant. Typically, for those runs, the initial state of the control variable is perturbed by about 0.1% (see Albergel et al., 2017;Rüdiger et al., 2010). The length of the LDAS-Monde assimilation window is 24 hours. A mean volumetric standard deviation error of 0.04 m 3 m −3 is prescribed for soil moisture in the second layer of soil (i.e. the model equivalent of the observations, between 1 and 4 cm), it is 0.02 m 3 m −3 for soil moisture in deeper layers (soil layers 3 to 8, 4-100cm). Both are then scaled by the dynamic range of soil moisture (the difference between the volumetric field capacity and the wilting point, calculated as a function of the soil type, as given by Noilhan et Mahfouf, 1996). The observational SSM error follows the same approach and a value of 0.05 m 3 m -3 is used. This is consistent with errors typically expected for remotely sensed SSM (e.g., de Jeu et al., 2008, Gruber et al., 2016. Based on previous results from Jarlan et al., 2008, Rüdiger et al., 2010and Barbu et al., 2011

Atmospheric forcing
The lowest level of the atmospheric model (about 10 metres above ground level) of air temperature, wind speed, specific humidity and pressure, the downwelling fluxes of shortwave, longwave radiations as well as precipitation (partitioned in solid and liquid phases) are needed to force LDAS-Monde. In this study, LDAS-Monde is driven by several near-surface meteorological fields from ECMWF: • its most recent atmospheric reanalysis (ERA5) to produce an LDAS-Monde global reanalysis • its high resolution Integrated Forecast System (IFS HRES) to monitor and predict the evolution of LSVs for regions under severe droughts and heatwaves.
ERA5 (Hersbach et al., 2018(Hersbach et al., , 2020  ASCAT stands for Advanced Scatterometer, which is an active C-band microwave sensor that is onboard the European MetOp polar orbiting satellites (METOP-A, from 2006, B from 2012 and also C from 2019). From ASCAT radar backscatter coefficients, it is possible to derive information on SSM following a change detection approach (Wagner et al., 1999, Bartalis et al., 2007. The recursive form of an exponential filter (Albergel et al., 2008) is then applied to estimate the SWI using a timescale parameter, T (varying between 1 day and 100 days). T is a surrogate parameter for all the processes potentially affecting the temporal dynamics of soil moisture including soil hydraulic properties, soil layer thickness, evaporation, runoff and vertical gradient of soil properties.
The obtained SWI then ranges between 0 (dry) and 100 (wet). In this study, CGLS SWI-001 (produced with a T-value of 1 day) is used as a proxy for SSM (Kidd et al., 2013). Grid points with an average altitude exceeding 1500 m above sea level as well as those with more than 15 % of urban land cover are rejected as those conditions are known to inhibit the retrieval of SSM from space. Prior to the assimilation, SSM has to be converted from the observation space to the model space. This is done through a linear rescaling as proposed by Scipal et al. (2007), where the mean and variance of observations are matched to the mean and variance of the modelled soil moisture from the second layer of soil (1-4 cm depth). In practice, the rescaling gives similar results to CDF (cumulative distribution function) matching. The linear rescaling is performed on a seasonal basis (with a 3-month moving window) as suggested by Draper et al., (2011) and Barbu et al., (2014).
The LAI GEOV1 observations are based on data from both SPOT-VGT (up to 2014) and PROBA-V (from 2014) satellites. They span from 1999 to present, have 1 km spatial resolution and are produced according to the methodology developed by Baret et al. (2013). LAI GEOV1 observations have a temporal frequency of 10 days at best and no observations are available during cloudy conditions. LAI data are masked in the presence of modelled snow by the ISBA LSM.
As in previous studies (e.g, Barbu et al., 2014, Albergel et  This 9-yr global reanalysis is then used to provide a monthly climatology for estimating anomalies of the land surface conditions. For each month (and variable considered) of 2018 we have removed the monthly mean and scaled by the monthly standard deviation of the 2010-2018 period.
Significant anomalies are used to trigger more detailed monitoring and forecasting activities for a region of interest. A total of 19 regions across the globe have been selected, which are known for being potential hot spots for droughts and heatwaves. They are listed in Table I Table II.

Evaluation datasets and metrics
Both satellite-derived estimates of EOs and in situ measurements are used as reference datasets in this study. The LDAS_ERA5 analysis performance is assessed with respect to the open-loop model run (i.e. no assimilation). The two assimilated datasets, CGLS SSM and LAI, are firstly used to verify that the data assimilation is behaving as expected. Then several independent datasets are used for the validation, namely evapotranspiration from the GLEAM project (Miralles et al., 2011, Martens et al., 2017, version 3b entirely satellite driven), GPP from the FLUXCOM project (Tramontana et al., 2016, Jung et al., 2017, SIF from the GOME-2 (Global Ozone Monitoring Experiment-2) scanning spectrometer (Munro et al., 2006, Joiner et al., 2016  https://www.natice.noaa.gov/ims/). The IMS snow cover product combines ground observations and satellite data from microwave and visible sensors (using geostationary and polar orbiting satellites) to provide snow cover information in all weather conditions. The IMS product is available daily for the northern hemisphere.
Soil moisture is validated using in situ measurements of soil moisture from the ISMN, a pool of stations which consists of 19 networks across 14 countries (see Table S3). In total, 782 stations are represented with at least 2 years of daily data over 2010-2018. In situ measurements at 5 cm depth (SSM) are compared with soil moisture from the third layer of soil (4-10 cm) in LDAS_ERA5. In situ measurements at 20 cm depth are compared with LDAS_ERA5 soil moisture from the fourth layer of soil (10-20 cm, 685 stations from 10 networks). Besides 11 stations located in 4 countries of Western Africa (Benin, Mali, Sénégal and Niger) and 21 stations in Australia, most of the stations are located in North America and Europe (see Table S3).
Evaluation datasets are listed in Table III 1)).
Eq. (1) Regarding the SIF satellite dataset, fluorescence is not simulated directly in the ISBA LSM.
However, photosynthesis activity is simulated through the calculation of the GPP, which is driven by plant growth and mortality in the model. Modelled GPP values are expressed in g(C)·m −2 ·day −1 , while SIF is an energy flux emitted by the vegetation (mW·m −2 ·sr −1 ·nm −1 ). Hence, GPP and SIF cannot be directly compared as they do not represent the same physical quantities. However, several studies (e.g, Zhang et al., 2016, Sun et al., 2017, Leroux et al., 2018 have found a high correspondence in both time and space between those two variables, highlighting the potential of SIF products to support the validation of modelled GPP. Therefore, the correlation between modelled GPP and observed SIF is used as an evaluation metric. Concerning the snow cover dataset, differences between observed and modelled snow cover is considered for the evaluation.
For in situ datasets of soil moisture and evapotranspiration, the standard metrics are considered, namely the correlation coefficient, RMSD, unbiased RMSD and bias. Moreover, a Normalized Information Contribution (NIC, Eq.(2)) measure is applied to the correlation values to quantify the improvement or degradation due to the specific configuration. In addition, for surface soil moisture, the correlation is calculated for both absolute (R) and anomaly (R anomaly ) time-series in order to remove the strong impact from the SSM seasonal cycle (see e.g. Albergel et al., 2018aAlbergel et al., , 2018b. The Nash-Sutcliffe Efficiency score (NSE, Nash and Sutcliffe, 1970, Eq.( 3)) is used to evaluate LDAS_ERA5 experiments ability to represent the monthly discharge dynamics.
Eq. (3) where Q s mt is the monthly river discharge from LDAS_ERA5 ( to the observations. Improvements in the analysis fit are visible between nearly 80° North to about 55° South and areas around the equator are impacted the most from the assimilation. This demonstrates that the data assimilation system is working as intended. A smaller impact is obtained for SSM, GPP and EVAP relative to LAI, which is hardly visible at this scale. The mean latitudinal results show a consistent difference in terms of GPP and EVAP between LDAS_ERA5 and the observational products. These differences are systematic with higher values in tropical regions.  , and open water (e.g. lakes). In ISBA, the fraction of the 12 land cover types over some areas departs from prevalent land cover products such as CLC2000 (Corine Land Cover) and GLC2000 (Global Land Cover). It could potentially impact the distribution of the terrestrial evaporation between GLEAM and ISBA. Further work at CNRM will focus on understanding the differences between ISBA and GLEAM, in particular investigating the sub-components of terrestrial evaporation.
Finally, Figure S1 and Figure S2 illustrate snow cover evaluation. LDAS_ERA5 snow cover is evaluated against the IMS snow cover. Figure S1 shows the averaged northern hemisphere snow cover fraction for the 2010-2018 period. It is complemented by Figure S2   Results on river discharge are illustrated by Figure 8 (panels a and b). neutral impact from the analysis (with a NIC ranging between -3% and +3%) while 26% (254 stations) present a significant impact (with a NIC above +3% or below -3%). When the analysis significantly impacts the representation of river discharge, this impact tends to be positive. Indeed, 74% of this subset of stations (189 stations) have a NIC score greater than 3% while only 26% (65 stations) show NIC score smaller than -3%.
The statistical scores for soil moisture from LDAS_ERA5 open-loop and analysis are presented for the third and fourth layers of soil, corresponding to 4-10 cm depth and 10-20 cm depth respectively.
The soil moisture at layers 3 and 4 is compared with ground measurements over 2010-2018 from the ISMN at depths of 5 cm and 20 cm respectively. The results are displayed in Table S3   LDAS_fc8. It is worth pointing out that for the MUDA area there is a small positive impact of the initialisation on the 4-d and 8-d forecast of surface soil moisture (Figure 13a, b). These results suggest that the fast evolving SSM model variable is more sensitive to the atmospheric forcing than to the initial conditions (at least within the forecast range presented in this study). Results for LAI are different from SSM (lower panels of Figure 12 and Figure 13). Firstly, there is a large Finally, the top panels of Figure 17 illustrate the impact of the analysis on drainage monitoring and forecasts over WEUR. Fig. 17 a) represents drainage from the LDAS_HRES open-loop with values ranging between 0 and 1 kg.m -2 .day -1 . Fig.17 b) shows the drainage difference between LDAS_HRES analysis and open-loop. The analysis impact on drainage is rather small (within ±3%) and more pronounced in areas where the analysis has largely affected LAI (see panels f, g and h of Figure 14). As seen in Figure 17 (c) and (d), the forecasts are also sensitive to the initialisation in areas where the analysis effectively corrected LAI. The bottom panels of Figure 17 illustrate a similar impact on runoff. Although we did not validate drainage and runoff in this study, previous findings suggest a neutral to positive impact of the analysis on river discharge through modifications to drainage and runoff (Albergel et al., , 2018a.

Discussion and conclusions
This study has demonstrated the potential of LDAS-Monde for assimilating Earth Observations shown in this study that if the snow accumulation seems to be represented correctly in the system, the onset of snow-melt is too early in the spring. To overcome this issue, two possibilities will be explored. Firstly, a recently developed ISBA parametrisation, MEB (Multiple Energy Budget), is known to lead to a better representation of the snowpack (Boone et al., 2017). This could be particularly useful in the densely forested areas of the Northern Hemisphere where large differences between LDAS-Monde and the IMS snow cover were found in spring ( Figure S2(i), Aaron Boone CNRM, personal communication June 2019). Another enhancement of LDAS-Monde will be to adapt the current data assimilation scheme to permit the assimilation the IMS snow cover data, which is implemented at NWP centres such as ECMWF (de Rosnay et al., 2014). The current SEKF data assimilation scheme is also being revisited. Even though it has provided good results, one of its limitations is the computational cost of the Jacobian matrix, which needs one model run for each 22 695 700 705 710 715 720 control variable. As the number of control variables is expected to increase, this approach would require significant computational resources. Therefore, more flexible ensemble based data assimilation approaches have recently been implemented in LDAS-Monde, such as the Ensemble Square Root Filter (EnSRF, Fairbain et al., 2015, Bonan et al., 2020. Bonan et al., 2020 have evaluated performances from the EnSRF and the SEKF over the Euro-Mediterranean area. Both data assimilation schemes have a similar behaviour for LAI while for SSM, the EnSRF estimates tend to be closer to observations than those from the SEKF. They have also conducted an independent evaluation of both assimilation approaches using satellite estimates of evapotranspiration and GPP together with river discharge observations from gauging stations. They have found that the EnSRF gives a systematic (moderate) improvement for evapotranspiration and GPP and a highly positive impact on river discharges, while the SEKF lead to more contrasting performance. As for applications in hydrology, the 0.5° spatial resolution TRIP river network is currently being improved to 1/12° globally.
CNRM is also investigating the direct assimilation of ASCAT radar backscatter (Shamambo et al., 2019). This has the potential to improve the way vegetation is accounted for in the change detection approach used to retrieve SSM with an improved representation of its effect. Assimilating ASCAT radar backscatter also raises the question of how to properly specify SSM observation, background, and model error covariance matrices, which are currently based on soil properties (see section 2.1.3 on data assimilation). The last decade has seen the development of techniques to estimate those matrices. Approaches based on Desroziers diagnostics (Desroziers et al., 2005) are computationally affordable for land data assimilation systems and could provide insightful information on the various sources of the data assimilation system.          Table S1 over 2010-2016. (b) Table 1 and Figure 2).   Table I