Satellite soil moisture data assimilation for improved operational continental water balance prediction

A simple and effective two-step data assimilation framework was developed to improve soil moisture representation in an operational large-scale water balance model. The first step is the sequential state updating process that exploits temporal covariance statistics between modelled and satellite-derived soil moisture to produce analysed estimates. The second step is to use analysed surface moisture estimates to impart mass conservation constraints (mass redistribution) on related states and 10 fluxes of the model in a post-analysis adjustment after the state updating at each time step. In this study, we apply the data assimilation framework to the Australian Water Resources Assessment Landscape model (AWRA-L) and evaluate its impact on the model's accuracy against in-situ observations across water balance components. We show that the correlation between simulated surface soil moisture and in-situ observation increases from 0.54 (open-loop) to 0.77 (data assimilation). Furthermore, indirect verification of root-zone soil moisture using remotely sensed vegetation time series across cropland areas 15 results in significant improvements of 0.11 correlation units. The improvements gained from data assimilation can persist for more than one week in surface soil moisture estimates and one month in root-zone soil moisture estimates, thus demonstrating the efficacy of this data assimilation framework.

assimilation merges models and observations in a way that take advantage of their respective strength (e.g. uncertainty, coverage), resulting in improved accuracy, coverage, and ultimately forecasting capability.
These commonalities are such, that for any time step, the time integrated first guess (the forecast) of soil moisture states are adjusted by an amount determined by the difference between observed and modelled soil moisture (the innovation), which is 35 weighted by the respective error variances of modelled and observed quantities (the gain), to generate revised soil moisture states (the analysis). At the end of this process, the revised model soil moisture states are out of balance with the other stores and fluxes, until the model integrates forward to the next time step, whereupon water balance discontinuity is progressively removed through model physics. Soil moisture is the linchpin between atmospheric fluxes, surface-and ground-water hydrology, thus it is important that any changes in modelled state variables are not detrimental to other components of the 40 water balance. As the assimilation of remotely sensed soil moisture or total water storage data may lead to undesired impacts on groundwater or evapotranspiration simulations due to the mass imbalance or random error covariances (Girotto et al., 2017;Tangdamrongsub et al., 2020;Tian et al., 2017). However, studies considering mass conservation in data assimilation often requires extra data sources such as evapotranspiration and runoff as constraints or only redistributing water into states without considering the fluxes (Li et al., 2012;Pan and Wood, 2006). 45 From an operational water balance perspective, is it important that the method of data assimilation be: i) computationally efficient for routine, automated simulation over the whole model domain; ii) robust to data gaps; and iii) make lasting positive improvements to future predictions of soil water stores and fluxes. An additional constraint is that if a DA method is applied to an existing operational system, then it ought to require minimal modification to the system framework, and be as least disruptive as possible to the model performance. Currently, there are few operational continental water balance modelling 50 systems that provide near-real time soil moisture estimates that have been constrained through the assimilation of satellite observations, and mainly at a relatively coarse resolution. Some recent examples include surface soil wetness observations from Advanced Scatterometer (ASCAT) active radar system, on the meteorological operational satellite (MetOp), being assimilated into Unified Model (Davies et al., 2005) through nudging to provide soil moisture analysis at 40 km globally (Dharssi et al., 2011). Additionally, ASCAT data are used in the ECMWF (European Centre for Medium-Range Weather 55 Forecasts) Land Data Assimilation System through a simplified Extended Kalman Filter approach (de Rosnay et al., 2013) to provide near-real time surface soil moisture and root-zone soil moisture at 25-km resolution globally. SMOS (Soil Moisture and Ocean Salinity) brightness temperatures have been assimilated in ECMWF's global NWP (Numerical Weather Prediction) system through the Surface Data Assimilation System, based on the Extended Kalman filter, to produce soil moisture reanalysis https://doi.org/10.5194/hess-2020-485 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License. as evapotranspiration, runoff and deep drainage. The soil water column is partitioned into three layers (surface: 0-10 cm, shallow: 10-100 cm, and deep: 1-6 m) and simulated separately for deep-and shallow-rooted vegetation. The water storage for the surface-layer soil is termed ! , while " is used for the shallow-layer and # for the deeper-layer. In addition to the modelling of soil columns, the model includes a surface water and a groundwater storage that are simulated at each grid cell and conceptualized as a small unimpaired catchment. In this study, we used forcing inputs from the AWAP (Australian Water 95 Availability Project) gridded climate data including daily precipitation, air temperature and solar exposure (Jones et al., 2009), and interpolated site-based wind speed (McVicar et al., 2008). It is acknowledged that the accuracy of the model estimates is limited in regions with insufficient coverage in the ground-based observation network (e.g. rain gauges) which is the raw source of AWAP gridded data used to force the model. This is limited to very remote and mostly uninhabited arid regions in Australia. 100

Satellite soil moisture (SSM)
To maximize daily spatial coverage, we used two satellite soil moisture products derived from passive L-band systems: the Soil Moisture Active-Passive (SMAP) product from NASA (Entekhabi et al., 2010); and the product from the European Space Agency's (ESA's) Soil Moisture and Ocean Salinity (SMOS) mission (Kerr et al., 2001). The SMAP product is the level-2 enhanced radiometer half-orbit 9-km EASE-grid soil moisture (Chan et al., 2018). The SMOS product is the level-2 soil 105 moisture product on ~ 25-km grid (Rahmoune et al., 2013). Both SMAP and SMOS produce volumetric soil moisture estimates (units m 3 /m 3 ) of approximately the upper 5 cm of soil. Available swath data for each product covering Australia were sourced and collated for each 24-hour period approximating the AWRA-CMS operational time steps and interpolated to a regular 0.05degree grid across the modelling domain from 2015 to 2019. The volumetric soil moisture retrievals from both SMAP and SMOS were converted into water storage units (mm) to be consistent in units and soil depths with model estimates, using mean 110 and variance matching to remove the systematic bias. The result of mean and variance matching in these gauge-sparse areas will flatten the dynamics of SSM time series to zero. To 115 resolve this problem, and fully leverage the information available in the SSM products in order to fill the gaps in modelled outputs for gauge-sparse regions of the continent, we derived a set of coefficients for the rescaling by sampling modelled and SSM data from cells surrounding the gaps. This ensures the assimilation can effectively impart a more realistic spatial pattern of soil moisture over the sparsely gauged regions. 120 https://doi.org/10.5194/hess-2020-485 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License.

In-situ measurements
Evaluation of the modelled soil water storages was made against measurements from three soil moisture monitoring networks in Australia from 2016 to 2018, namely OzNet (Smith et al., 2012), CosmOz (Hawdon et al., 2014) and OzFlux (Fig.1d). 125 AWRA-L model estimates of water storage in surface soil layer were compared against in situ measurements from the top 10 cm of soil across all three networks. The depths of in situ measurements of root-zone moisture varied across networks from 0-30 cm to 0-1 m. As such, AWRA-L soil water storages over the root-zone were constructed by combining surface-and shallowlayer soil water storage in the appropriate proportions to be consistent with in situ measurement depth. OzFlux sites are also used for the evaluation of AWRA-L evapotranspiration estimates, which were calculated from accumulated latent heat flux 130 measurements at each location. In total, there are 45 sites for soil moisture validation and 14 sites for evapotranspiration validation. Streamflow observations for 110 catchments across Australia have been used in the validation based on the quality and data availability (Fig. 1d).

Vegetation index
In water-limited regions like Australia, shallow-rooted vegetation normally responds quickly to soil water availability, 135 typically within a month. Consistency between root-zone soil water storage and vegetation greenness may be considered as an indirect independent verification of the simulation of root-zone soil water dynamics (Tian et al., 2019a;Tian et al., 2019b). The 0.05-degree monthly Normalized Difference Vegetation Index (NDVI) from Moderate Resolution Imaging Spectroradiometer (MODIS, MYD13C2 v6) were used to evaluate estimates of monthly root-zone soil water storage (the sum of water storage in surface-layer ( ! ) and shallow-layer ( " ) within the AWRA-L soil column) over cropland regions of the continent. The 250-140 m land cover classification map from Geoscience Australia (Lymburner, 2015) was resampled to 0.05 degree over the model domain and used in the identification of cropland areas.

Triple collocation-based error characteristics
Triple collocation (TC) was developed as a method of quantifying error characteristics in geophysical variables when the 145 true error structure is elusive. It was first applied to near-surface wind data (Stoffelen, 1998) and later extensively applied to soil moisture (Chen et al., 2018;Crow and Yilmaz, 2014;Dorigo et al., 2017;McColl et al., 2014;Scipal et al., 2008;Su et al., 2014b;Yilmaz and Crow, 2014;Zwieback et al., 2013) and rainfall (Alemohammad et al., 2015;Massari et al., 2017). The assumption of this approach is that three independent data sets of the same geophysical variable can be used to infer the error variances in each. Here we use TC as a way of inferring error variances from our three independent estimates of surface soil 150 moisture, AWRA-L ! , SMAP, and SMOS from 2015 to 2019. Those three collocated measurements were assumed to be linearly related to the true value with additive random errors. To ensure the errors from the three independent sources were unbiased relative to each other, SMAP and SMOS soil moisture retrievals were rescaled to the reference model estimates (AWRA-L ! ) using temporal mean and variance matching. McColl et al. (2014) shows that the error variances of each data set can be calculated from the temporal variance and covariance between data sets respectively as: 155 (1) where x, y and z denote AWRA-L, SMAP and SMOS soil moisture estimates respectively and Q denotes temporal variance and covariance between the data sets. These estimates of error variance are used in the determination of the weighting of each data source in the data assimilation (Section 3.2).

Sequential state updating 160
The data assimilation method used here is a time sequential updating of model state(s) given observations of relevant model variables (Reichle, 2008). There are two key modelling components in data assimilation: the dynamics operator, which describes the time integration of the system states and fluxes, which in this study is the AWRA-CMS; and the observation operator, which provides the mathematical mapping from state to observation space. The role of the observation operator is to perform a mapping between observation and state space, as often observations are not directly comparable to model states. 165 The common state updating equation for sequential data assimilation is written as: which says that the best estimate of model state, known as analysis ( + , ), is equal to the first guess or forecast ( + -) plus a weighted difference between observations, + , and the model equivalent to the observation, 4 + -5, for that time step. In this study, the AWRA-L model soil water storage in S0 for shallow-rooted vegetation and deep-rooted vegetation at surface layer 170 are updated directly through the sequential data assimilation. Satellite surface soil moisture (SSM) products from both SMOS and SMAP are used as the observations to update the model simulation. The observation operator here is the aggregation of soil water storage in S0 for shallow-rooted vegetation and deep-rooted vegetation, denoting the soil water content for each grid cell. The multiplier, + , is known as the gain factor which contains uncertainty expressed as error variance for both model estimates ( . & ) and observations ( / & ). For a unity observation operator and assuming independence between model estimates 175 and observations, the gain factor typically assumes the form: (3) https://doi.org/10.5194/hess-2020-485 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License.
The gain factor, , contains information on the error variances of the model and observations. Observation error variance is often estimated through field campaigns (Draper et al., 2009;Panciera et al., 2013), but these rarely represent the spatial and temporal variability of errors in gridded satellite products. Alternatively, data providers often specify error estimates, but their 180 magnitude can be overly optimistic. Here, we applied the triple collocation approach (Section 3.1) to characterise the errors in model estimates and satellite observations (Crow and Van den Berg, 2010). Therefore, the gain factor is temporally constant but spatially varying. The analysis receives higher contribution from observation with smaller error variance (Eq. 2).

Analysis increment redistribution (AIR)
The assimilation of satellite soil moisture temporarily violates mass conservation in the model through the analysis update. 185 The difference between the analysis, 2 3 , and the forecast, 2 4 , (known as the analysis increment) represents an amount of water that has been added or subtracted from the system that was not present at the start of model integration for the given time step. In this study, we use the concept of tangent linear modelling (Errico, 1997;Giering, 2000) to redistribute the analysis increment of surface soil water storage, ! , to all the relevant model states and fluxes as a way of maintaining mass (i.e. water) balance within each model time step. This adjustment is applied after the sequential state updating as the second-190 step in the assimilation framework, which we refer to as analysis increment redistribution (AIR).
The adjoint and tangent linear models were originally used in variational data assimilation (Bouttier and Courtier, 2002) and have been used to estimate the sensitivity of model outputs with respect to input (Errico, 1997).We assume the input perturbation here is the analysis increment after the data assimilation (i.e. 2 3 -2 4 from Eq. 2), then the change in other model outputs due to the change in inputs can be determined through tangent linear modelling. Assuming model variable b is 195 related to the state variable x, the relationship between them can be simply described as: where M denotes the model operator. The change in output variable ∆ at time step t due to the input change ∆ can be determined by In this study, we applied the tangent linear modelling approach to correct the model forecast of soil water storage for shallow-layer ( " ) , and deep-layer soil water storage ( # ), evapotranspiration ( +$+ ) , and total streamflow ( +$+ ) after the state updating of surface soil moisture ( ! ) at each time step. Note that this process ensures that the correction is affecting all model states in proportion to their sensitivity against changes in the ! . 205 https://doi.org/10.5194/hess-2020-485 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License.

Impact on surface soil water storage estimates
Error variances were derived using TC for AWRA-L model estimates and the SSM products, and showed that for the majority of the grid cells over the continent SMAP soil moisture had smaller error variance than SMOS and the model estimates. This is consistent with other studies that have shown SMAP provides the best-performing satellite soil moisture 210 product over the majority of applicable global land pixels (Chen et al., 2018). Figure 2 shows the relative weightings (derived from the TC error variances) of model estimates, SMOS and SMAP soil moisture in the data assimilation. The analysed surface soil water storage estimates ( ! ) receive a greater contribution from SSM products, in particular SMAP observations, compared to model simulations (Fig. 2). Figure 3 gives an example of the temporal change in modelled ! estimates before and after the assimilation for 2017. The temporal dynamics of ! estimates after the assimilation has been 215 highly adjusted towards SSM retrievals. AWRA-L model simulations are driven by gauge-based rainfall analyses. As such the model has difficulty in adequately simulating soil moisture patterns over regions lacking in rain gauge coverage, such as Western Australia and central Australia (Fig. 1c). Water storage simulations over these regions default to zero, thus very little or no weight was given to the AWRA-L estimates in these regions (Fig. 2a). Figure 4 shows different spatial patterns of daily average ! estimates for 220 December 2019 from model open-loop (OL) without data assimilation and with data assimilation through TC-derived weighting (DA-TC). Data assimilation has the effect of adding moisture to AWRA-L ! simulations over most of gaugesparse areas as shown in Figure 4c. Analysed AWRA-L simulations of ! are dominated by the satellite SSM data as a result of TC weighting in the region which largely eliminates the erroneous artefacts associated with deficient rainfall data forcing.
Reduced water storage in the surface layer of the soil column was found over southeast of Australia, particularly within the 225 Murray-Darling Basin. This suggests that AWRA-L OL simulations underestimated the severity of the drought experienced in the region in December 2019. The analysis increments of AWRA-L ! ( − ) were compared with the difference between in-situ rainfall observations from OzFlux network, :*;<=% and AWAP rainfall forcing, >?>@ , (Fig. 5). The increasing ! simulations align with missing or underestimated rainfall events in the AWAP rainfall forcing ( :*;<=% − >?>@ > 0) and vice versa (Fig. 5). This supports the hypothesis that data assimilation correctly distributes 230 water into the system and mitigates the impact of uncertainty in rainfall forcing.

Impact on root-zone soil water storage and fluxes estimates
If the analysis increment redistribution (AIR) is not applied, the soil water storage in the surface layer ( ! ) is the only state variable directly updated with SSM (DA-TC). Other variables such as root-zone soil water storage, evapotranspiration and streamflow are adjusted with model integration to the next time step using the analysed ! as the surface layer initial condition. 235 Therefore, the observed changes in those variables following DA-TC (Fig.6, centre column) are relatively small when compared to model open-loop estimates (Fig.6, left column). For example, the OL soil water storage of shallow-layer ( " ) estimates in those gauge-sparse regions of Australia remain zero or very low due to the AWAP rainfall forcing. The predictions of " receive relatively small contribution from the analysed ! since the analysis increment of ! is small compared to the field compacity of " . 240 One known issue of sequential state updating is the temporary break of water balance at each time step until the next model integration. The proposed AIR approach (Section 3.2) adjusts variables coupled with surface soil moisture after the state updating at each time step. Significant difference in the spatial patterns of " , +$+ and +$+ after the mass redistribution (DA-TCAIR) can be seen in Fig. 6 (right column) compared to model open-loop or forecasts after only ! updating. The changes in estimates of " and +$+ over coastal regions are relatively small due to more accurate rainfall forcing data with the dense 245 network of rain-gauges. Finally, the +$+ estimates after AIR are lower than the DA-TC and OL. This reduction in streamflow over south-eastern Australia and northern Australia is consistent with the reduced surface soil moisture in those regions (Fig.4c).

Quantitative evaluation
Estimates of surface soil moisture, root-zone soil moisture, evapotranspiration and streamflow after data assimilation (DA-TC) and data assimilation with mass redistribution (DA-TCAIR) were compared with time series of in-situ observations. We 250 compared the model outputs after DA-TC and DA-TCAIR separately to investigate the benefits of maintaining mass balance in data assimilation. Pearson's correlation coefficients were computed from time series of model estimates and observations between January 2016 to December 2018 for each site. The distribution of correlation coefficients for OL, DA-TC and DA-TCAIR are displayed as boxplots in Figure 7. Consistent, significant improvement in modelled surface layer soil water storage estimates (S ! ) were observed across all sites (Fig. 7a) with the single exception of an OzFlux site located in a tropical rainforest, 255 where microwave SSM retrievals are known to be typically poor (Njoku and Entekhabi, 1996). TC-based assimilation (DA-TC) increases the correlation between in-situ surface SM measurements from 0.47 to 0.72 on average for CosmOz sites, 0.54 to 0.69 for OzFlux sites, and 0.56 to 0.77 for OzNet sites compared to OL. This is a significant improvement in AWRA-L simulations of surface soil moisture dynamics with an increase in correlation of 0.23 on average across all in-situ sites.
Overall subtle improvements were observed across the AWRA-L estimates of root-zone soil water storage, evapotranspiration 260 and streamflow after the assimilation (DA-TC) (Fig. 7b, c, d). The level of improvement is not surprising since those variables were not directly updated through DA-TC and are only influenced through the integration of the model to the next time step.
Degradation were found in root-zone soil moisture estimation for a few OzFlux and OzNet monitoring sites. This is likely due to the break of water balance in the assimilation, since the estimates followed by the second step of AIR (DA-TCAIR) slightly increases the correlation with in-situ observations compared to model open-loop and the estimates after assimilation without 265 mass redistribution (DA-TC). Moreover, the model estimates of root-zone soil moisture from model OL are in good agreement with in-situ observations as is with average correlation above 0.8 (Fig. 7b), which leaves little room for improvements. Although the corrections of other water balance estimates from the analysis increments redistribution are relatively small compared to direct state updating, they are improvements nevertheless. Slight improvements were similarly found in streamflow estimates after the AIR (Fig. 7d). Figure 8 shows an example of the OL estimates of streamflow, the analysed 270 streamflow after the application of AIR, and the streamflow observations, +$+ $B" . Also displayed is the streamflow analysis increments, i.e. +$+ , − +$+ for each time step. The negative streamflow analysis increment indicates that water is removed from the surface water store after AIR, which is consistent with the overall general overestimate in streamflow from OL, in this example (Fig. 8). The reduced streamflow is a direct result of the changes in surface soil moisture condition after the assimilation of SSM. This highlights the importance of accurate antecedent soil moisture conditions in the simulation of runoff 275 response as well as maintaining the water balance.
The large spatial disparity between ground measurement and modelling scales are a substantial limitation for wide-area evaluation of root-zone soil moisture estimates. An indirect verification of AWRA-L simulations of root-zone soil moisture was based on comparisons against satellite-derived NDVI. This provided an independent, albeit indirect, way of evaluating the impact of data assimilation over larger areas. We calculated the correlation between time series of monthly average AWRA-280 L root-zone soil moisture from OL, DA-TC and DA-TCAIR simulations against NDVI for cropland across Australia over the period 2015 to 2018. Crop land cover type was selected as we assume the rooting depths are predominantly within with the combined soil depths (0-1m) of the surface-and shallow-layer soil water storages in AWRA. Figure 9 shows the relative change in correlation between root-zone simulations from DA-TC and those from DA-TCAIR data against NDVI data for cropland areas of Australia. The figure shows that for the vast majority of model grid cells, the data assimilation without mass 285 redistribution shows no improvement or degradation in correlations with NDVI comparing to model open-loop (OL) (Fig. 9a).
DA-TCAIR shows significant increase in correlation with NDVI compared to both model open-loop (Fig. 9b) and DA-TC ( Fig. 9c), with an average increase in correlation with NDVI from 0.55 to 0.66 for cropland compared to OL.This demonstrates that enforcing mass balances as part of the SSM data assimilation each time step is essential to improving the simulation of root-zone soil water balance. The improved consistency with NDVI also illuminates the potential of improving agricultural 290 planning with more accurate information of root-zone soil water availability.

Impact on water balance forecasting
To quantify how long improvements in model state last in AWRA-L simulations, we used OL and DA-TCAIR estimates between 1 March 2018 and 28 February 2019. The model states for each day over this one-year period served as initial conditions for 100-day AWRA-L simulations from which we calculated the number of days it took for the simulation from the 295 analysed DA-TCAIR states to converge to within +/-5% of those from OL. Results showed that data assimilation can impact model states and fluxes for weeks and sometimes up to 2-3 months (Fig. 10). The impacts of data assimilation can persist in simulated ! for as long as a week over coastal regions, and longer in central Western Australia and Northern Australia with up to a month persistence in winter and spring (Fig. 10a). There is less impact on ! simulations during wet season (Summer-Autumn) in Northern Australia since the ! can saturate rapidly due to the heavy rainfall. Overall, the longest persistence is 300 found in winter with a continental average of 13 days; the shortest is 6 days on average in autumn and summer. The memory of initial conditions in simulations of " can persist even longer due to the slower response to rainfall variability and higher field capacity (Fig. 10b). Summer persistence for " is the least with a continental average of 30 days; in winter this increased to 45 days.
On average, the impact of antecedent soil moisture conditions on evapotranspiration simulations can persist for 1 week over 305 coastal areas, but up to months in central Western Australia (Fig. 10c). The continental average varies from 13 to 20 days for each season. The areas with the longest persistence are those areas with artefacts of zero rainfall in the forcing. This demonstrates that improvements in AWRA-L estimates after SSM assimilation over regions with sparse rain-gauge coverage can persist in the system for more than 2 months. The impact on runoff varies from 1 week to 3 months over the continent (Fig. 10d). The majority of areas impacted for more than 2 months are in locations of low rainfall and runoff. However, in 310 areas of heavy runoff, e.g. north-eastern Australia, there is between 1-2 week of persistence.

Discussion
In this study, we assimilated SMAP and SMOS data into an operational AWRA-L water balance modelling system through a simple sequential state updating approach, with weightings derived using triple collocation approach (DA-TC), followed by a post-adjustment for mass redistribution (DA-TCAIR). Previous data assimilation studies using the AWRA-L model opted for 315 ensemble-based methods (Renzullo et al., 2014;Shokri et al., 2019;Tian et al., 2019a;Tian et al., 2017;Tian et al., 2019b).
Ensemble based methods rely on a priori knowledge of uncertainty in forcing data and model error variances to derive spatially and temporally varying gain matrices at each time step. However ensembles often require post hoc correction such as state inflation (Anderson et al., 2009) to achieve optimal performance, and many members (> 10) comprised of multiple model runs to infer statistically meaningful error variances, which can be computationally costly. In contrast, the proposed DA-TC/-320 TCAIR framework is simple, effective and computationally efficient and requires minimal modification in the current operational system. The gain factor in the proposed assimilation framework is temporally constant but spatially varying. It is derived from the temporal covariances between modelled and satellite-derived soil moisture for each grid cell across the domain through the widely used triple collocation (TC) method (Chen et al., 2018;Crow and Van den Berg, 2010;Crow and Yilmaz, 2014;Su et al., 2014a;Yilmaz and Crow, 2014). The significant improvements in AWRA-L model surface soil 325 moisture estimation demonstrates the efficiency of the proposed assimilation approach (Fig. 7a). Temporally varying gain factor is considered for future improvement to the approach once a longer time series of SMAP data is available. Pan and Wood (2006) used mass redistribution in a two-step constrained Kalman filter that required error covariances derived from evapotranspiration and runoff observations. However, these observations are often not available for continental scale of studies. Li et al. (2012) redistribute the mass imbalance within soil layers during the assimilation but without the updates of 330 fluxes. Our proposed method based on tangent linear modelling redistributes the mass change across all the states and fluxes related to surface soil moisture states without the need for extra observations. The analysis increment redistribution (AIR) method conserves the mass balance thereby improving water balance estimates (Fig. 7), in particular it can improve the rootzone soil moisture estimates over croplands (Fig. 9). Although the improvements are limited, the streamflow estimates from the AIR are predominantly a better match to observations (Fig. 8). Model physics limits the strength of coupling between an 335 analysed state and resulting fluxes (Kumar et al., 2009;Walker et al., 2001). Thus a small level of improvement in performance in AWRA-L streamflow in response to soil moisture state updating is not unexpected due to a weak coupling between the states and fluxes.
Many studies have demonstrated the assimilation of satellite soil moisture can improve discharge simulations and correct for errors in pre-storm soil moisture conditions (Crow and Ryu, 2009;Pauwels et al., 2001;Scipal et al., 2008). Wanders et al. 340 (2014) found that the assimilation of remotely sensed soil moisture in combination with discharge observation can improve the quality of the operational flood alerts, both in terms of timing and in the exact height of the flood peak. Getirana et al. (2020a) and Getirana et al. (2020b) found that using initial conditions derived from the assimilation of GRACE (Gravity Recovery and Climate Experiment) total water storage observations can improve the seasonal streamflow and groundwater forecast due to the long memory of groundwater and soil moisture. However, few studies quantify how long the impacts of 345 data assimilation can persist in the model system's memory for different states. We found that the impact of different initial conditions of root-zone soil water storage has a long memory in the system, exceeding 2 months (Fig.10b). The impacts on the simulations of surface soil moisture, evapotranspiration and streamflow can persist 1-2 weeks. This highlights the potential gains from data assimilation for agricultural planning and flood forecasting, as a result of improved short-term water balance forecasts. 350

Conclusion
In this study, we proposed a simple and robust framework for assimilating SMAP and SMOS soil moisture products into the operational Australian Water Resources Assessment modelling system. The method involves the sequential (daily) updating of the model's surface soil water storage with satellite soil moisture observations using weights determined through triple collocation (DA-TC). Furthermore, we proposed an additional component to the data assimilation whereby the analysis 355 increment of the upper layer soil water storage is propagated into relevant model states and fluxes as a way of maintaining mass balance (DA-TCAIR). Evaluation against in-situ measurements showed that simulations of surface soil moisture dynamics is improved significantly after TC data assimilation with an average increase of 0.23 correlation units compared with open-loop simulations. An evaluation of the root-zone soil moisture, evapotranspiration and streamflow estimates showed that the TC-AIR appeared to provide marginal, yet positive, improvement over the TC data assimilation method alone. However, 360 in an indirect verification of modelled root-zone soil moisture against satellite-derived NDVI, DA-TCAIR was seen to provide significant improvement over the TC method alone. This demonstrates that by enforcing mass balances as part of the SSM data assimilation each time step, AWRA-L can better represent soil water dynamics such that it has greater consistency with observed vegetation response.

365
The assimilation of satellite soil moisture estimates together with the mass redistribution reduces the uncertainties in model estimates resulting mainly from uncertain forcing and model physics, and provides temporally and spatially varying constraints on model water balance estimates. For example, the assimilation resolves the gaps in rainfall forcing over Western Australia and central Australia. We demonstrate that the impacts of data assimilation can persist in the model system for more than a week for surface soil water storage and more than a month for root-zone soil water storage. This highlights the importance of 370 accurate initial hydrological states for improving forecast skill over longer lead times. Hence, an operational water balance modelling system, with satellite data assimilation, has strong potential to add value for assessing and predicting water availability for a range of decision makers across industries and sectors.

Author contribution
ST developed and led the implementation of the method in AWRA-CMS. ST led the writing of the manuscript and graphics creation. LR co-wrote project plan and co-developed the method. LR guided the application and evaluation. LR contributed to 385 manuscript writing. RP facilitated sharing of data feeds; coordinated the transition of method to operational implementation; editing and review of manuscript. JL guided the selection of streamflow observation and reviewed the manuscript. WS guided the modification to AWRA-CMS and reviewed the manuscript. CD co-wrote project plan and reviewed the manuscript.

Dots above the zero line indicates improved correlation comparing to the reference with greater number of grid cells in red.
https://doi.org/10.5194/hess-2020-485 Preprint. Discussion started: 12 October 2020 c Author(s) 2020. CC BY 4.0 License.