Development and evaluation of 0.05° terrestrial water storage estimates using CABLE land surface model and GRACE data assimilation

Accurate estimation of terrestrial water storage (TWS) at a meaningful spatiotemporal resolution is important for reliable assessments of regional water resources and climate variability. Individual components of TWS include soil moisture, snow, groundwater, and canopy storage and can be estimated from the Community Atmosphere Biosphere Land Exchange 10 (CABLE) land surface model. The spatial resolution of CABLE is currently limited to 0.5° by the resolution of soil and vegetation datasets that underlie model parameterizations, posing a challenge to using CABLE for hydrological applications at a local scale. This study aims to improve the spatial detail (from 0.5° to 0.05°) and timespan (1981 – 2012) of CABLE TWS estimates using rederived model parameters and high-resolution meteorological forcing. In addition, TWS observations derived from the Gravity Recovery and Climate Experiment (GRACE) satellite mission are assimilated into CABLE to 15 improve TWS accuracy. The success of the approach is demonstrated in Australia, where multiple ground observation networks are available for validation. The evaluation process is conducted using four different case studies that employ different model spatial resolutions and include or omit GRACE data assimilation (DA). We find that the CABLE 0.05° developed here improves TWS estimates in terms of accuracy, spatial resolution, and long-term water resource assessment reliability. The inclusion of GRACE DA increases the accuracy of groundwater storage (GWS) estimates and has little impact 20 on surface soil moisture or evapotranspiration. The use of improved model parameters and improved state estimations (via GRACE DA) together is recommended to achieve the best GWS accuracy. The workflow elaborated in this paper relies only on publicly accessible global datasets, allowing reproduction of the 0.05° TWS estimates in any study region.

relatively coarse due to the limitation of sensors and models that primarily focus on global or continental scales (e.g., Rodell 30 et al., 2004, Alkama et al., 2010. At a regional or local scale, the spatial resolution of the TWS estimate is vital, as most applications (e.g., risk management for drought or flood) require accurate information at the county or sub-county level (Quiring, 2009). This motivates the development of TWS estimates at more meaningful spatiotemporal scales, corresponding particularly to an increased interest in exploiting TWS in interdisciplinary studies (e.g., IPCC 2007;NASEM, 2018).
TWS information can be obtained or estimated from ground observation networks, satellite measurements, or model 35 simulations. Each has different strengths and limitations. Ground observations (e.g., soil moisture probe, groundwater well) are considered the most reliable, providing measurements closest to the truth (e.g., Dorigo et al., 2013). However, ground observations have high maintenance costs and incomplete coverage. Also, point measurements only reflect information at one location, not necessarily the entire region. On the other hand, satellite platforms offer an automated measurement with improved coverage ranging from regional to global (e.g., Tapley et al., 2004;Entekhabi et al., 2010a). The challenges of using 40 satellite measurements are the coarse spatial resolution and the sensors' technical limitations (e.g., penetration depth, cloud/vegetation obstruction, background noise). Therefore, its usage is restricted to a sufficiently large region and requires a sophisticated algorithm to retrieve TWS variables (Crow et al., 2012;Castellazzi et al., 2016;Tangdamrongsub et al., 2019).
In addition, satellite measurements can be limited by a relatively short period of record.
TWS components can also be simulated from a land surface model (LSM) such as the Community Atmosphere Biosphere 45 Land Exchange (CABLE, Kowalczyk et al., 2006;Decker 2015). CABLE is a core LSM of the Australian Community Climate and Earth System Simulator (ACCESS, Bi et al., 2013;Kowalczyk et al., 2013). The LSM incorporates various land surface physics into complex numerical sequences to allow the simulation of TWS to be performed at any desired location and spatiotemporal scale (Pitman 2003). The LSM can provide a complete suite of TWS components compared with the ground or satellite measurements that can only measure a single or integrated TWS component (e.g., Tapley et al., 2004;Entekhabi et 50 al., 2010a). However, due to the limitations of input data, the spatial resolution of many LSMs is coarse (e.g., > 25 km), which consequently restricts its application to a sufficiently large region (e.g., Rodell et al., 2004;Alkama et al., 2010;Ke et al., 2012). For CABLE, the spatial scale is currently limited to the 0.5° (~50 km) resolution of model parameter and forcing datasets (Decker 2015;Ukkola et al., 2016;Haverd et al., 2018). Models and their inputs must be reconfigured to increase the spatial detail of TWS estimates for smaller-scale studies (e.g., 5 -10 km). 55 Efforts to improve the spatial resolution (and timespan) of global TWS estimates have been made in many modeling communities (e.g., Wood et al., 2011;Bierkens et al., 2014;. For instance, Ke et al. (2012) improved the resolution of the Community Land Model (CLM) from ~50 km to ~5 km using modified land surface parameters. The World-access: 14 October 2020). A similar effort is also seen in hydrologic model development, such as the PCRaster Global Water Balance (PCR-GLOBWB; , which improves the spatial resolution from ~50 km to ~10 km and extends the timespan to more than a 50-year period. The enhancement of model spatial resolution and timespan receives even more 65 attention at the local level, where spatial detail down to a few km is needed (e.g., Tesfa et al., 2014;Rasmussen et al., 2014;Singh et al., 2015;Beamer et al., 2016;Dong et al., 2020). The effort to increase the regional or local study's spatial resolution has not been considered thus far for CABLE.
On top of the improved spatiotemporal resolution, the improved accuracy of TWS estimates is also a concern in LSM developments. As in most environmental modeling systems, model outputs are associated with a high degree of uncertainty 70 propagated from, e.g., inaccurate meteorological forcings, imperfect model physics, and ineffective parameter calibration. Data assimilation (DA, Reichle et al., 2002;Reichle 2008) techniques can be employed to improve LSM performance. The approach sequentially updates the model's states using an optimal value computed by combining model simulations with observations. A variety of satellite observations reflecting different TWS components can be assimilated into the system (e.g., Kumar et al., 2014;Dong et al., 2018). TWS observations from the Gravity Recovery And Climate Experiment (GRACE) satellite mission 75 (Tapley et al., 2004) offer integrated water column information that can be used to constrain multiple water storage components simultaneously (e.g., Zaitchik et al., 2008;Forman et al., 2012;Tangdamrongsub et al., 2015). GRACE DA has shown positive impacts on most TWS components, including groundwater (e.g., Girotto et al., 2017;Nie et al., 2019), soil moisture (Jung et al., 2019), and snow (Kumar et al., 2016).
This study aims to improve the accuracy, spatial resolution, and timespan of CABLE TWS estimates. Our approach utilizes 80 only publicly available global datasets, so resulting TWS estimates can be reproduced over any target region (see Data availability section for the data access). The spatial detail of CABLE is improved from 0.5° to 0.05° (~5 km) using highresolution forcing data (precipitation in particular) and land surface parameters derived from high-resolution maps of soil and vegetation cover. The development is demonstrated in Australia, where ground observation networks (e.g., surface soil moisture, groundwater, and evapotranspiration) are available to validate the result. The demonstrated simulation period is 1981 85 -2012, coincident with the availability of meteorological forcing data. The GRACE mascon solution (Luthcke et al., 2013) is assimilated into the model to improve the accuracy of TWS components between 2003 and 2012. Assimilating the coarse GRACE observations into a much higher-resolution model is performed using the 3-dimension ensemble Kalman smoother (EnKS 3D; Tangdamrongsub et al., 2017).
The objectives of this paper are 1) to present the development and evaluation of retrospective 0.05° TWS estimates, and 2) to 90 assess the GRACE DA impact on both 0.5° and 0.05° CABLE versions, as well as the benefit of assimilating the coarse resolution satellite data into a fine-scale model. This paper is presented as follows. Sect. 2 provides the details of the study area, model configurations, and data processing. Sect. 3 presents the GRACE DA schematic and the statistical metric used in the evaluation. Sect. 4 presents the assessment and validation of the result. Finally, Sect. 5 summarizes the findings of this study and provides a possible direction for future development.

Study area
This study uses Australia as a case study. Due to its size and geographic location, Australia is influenced by multiple climate drivers (Murphy and Timbal, 2008;Xie et al., 2016) and experiences episodes of severe droughts and floods (e.g., van Dijk et al., 2013). The recent long-term drought, known as the Millennium Drought (Bond et al., 2008), severely affects industrial and 100 agricultural sectors and has led to a significant economic loss nationwide (see, e.g., van Dijk et al., 2013). The need for an accurate prediction of possible water scarcity from climate variations motivated the development of land surface and hydrology models in Australia, e.g., the Australian Water Availability Project (AWAP; Raupach et al., 2008), the Australian Water Resources Assessment -Landscape Model (AWRA-L; van Dijk 2010), and CABLE (Kowalczyk et al., 2006). Recent work has assimilated GRACE satellite data into such water models (e.g., Tian et al., 2017;Schumacher et al., 2018;Tangdamrongsub 105 et al., 2020). Studies relevant to GRACE DA in Australia are summarized in Table 1. Ground observation networks have also been installed across the continent and continuously monitor the water storage and flux variations. Such data records are valuable for validating the accuracy of the model estimates and remote sensing observations. Details regarding the in situ data used in this study are provided in Sect. 2.3.2.

Model configuration
In this study, TWS estimates are derived from CABLE. A history of the model's development can be found in, e.g., Wang et al. (2011), Kowalczyk et al. (2013), Decker (2015), Haverd et al., (2018 Decker, 2015) is considered most suitable for our uses due to its inclusion of comprehensive 115 terrestrial water storage components, especially a groundwater module. The model has been used to simulate global TWS at 0.5° spatial resolution using 0.5°-resolution model parameters and forcing data (e.g., Decker, 2015;Ukkola et al., 2016). The variables used to assess TWS consist of soil moisture storage (SMS), snow water equivalent (SWE), canopy storage (CNP), and groundwater storage (GWS).
The model parameters of CABLE-SSGW are derived from several different sources (Kowalczyk et al., 2006;Wang et al., 120 2011;Decker, 2015). Land use/vegetation type categories are obtained from the International Global Biosphere Project (IGBP) classification from the Moderate Resolution Imaging Spectroradiometer (MODIS; Friedl et al., 2002). Relative volumes of silt, sand, clay, and organic matter in the soil are obtained from the Harmonized World Soil Database (Fischer et al., 2008).
The monthly climatology of the leaf area index (LAI) is computed using a reprocessed MODIS LAI product (Yuan et al., 2011). All derived model parameters are resampled to 0.5° to match the 0.5° model grid space. Comprehensive details of model parameters can be found in, e.g., Kowalczyk et al. (2006), Wang et al. (2011), andDecker (2015).
This study also rederives the model parameters and employs enhanced forcing data to increase the model spatial detail from 0.5° to 0.05°. The vegetation type is derived from the global land cover climatology using MODIS data (Broxton et al., 2014). The CABLE model is forced with precipitation, air temperature, wind speed, humidity, surface pressure, and shortwave and longwave downward radiation. For precipitation, we use the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS; Funk et al., 2015), provided on a 0.05° grid. The other forcing variables are provided on a 0.5° grid by the highresolution global dataset of meteorological forcings for land surface modeling version 2, Princeton University (Princeton, Sheffield et al., 2006). When performing 0.5° model simulations, the CHIRPS data is spatially averaged to 0.5° grid space 140 while other forcing variables maintain their intrinsic 0.5° resolution. When performing 0.05° model simulations, the coarserresolution forcing variables are spatially resampled to 0.05° model grid space using nearest-neighbor interpolation.
Model simulations are performed between 1981 and 2012 (total 32 years), given the Princeton forcing data's availability.
Similar to McNally et al. (2017), temporal disaggregation is applied to CHIRP precipitation data to resample from 1 day to 3 hours, consistent with the Princeton data's time step. The scale factor derived from the 3-hour Princeton precipitation data is 145 used to (temporally) rescale the CHIRPS data. The characteristics of forcing data and the derived model parameters used in this paper are given in Table 2.
(Please insert Table 2 here) In all simulations, initial states are obtained using a 320-year spinup, i.e., performing ten repeated runs between 1981 and 2012. 150

GRACE data
GRACE is a twin satellite-to-satellite tracking mission designed to measure the mean and time-varying components of the Earth's gravity field (Tapley et al., 2004). Every month, GRACE provides a time-varying gravity solution containing information about mass redistribution near the Earth's surface. The monthly gravity change is dominated by a hydrology signal, making the GRACE product beneficial for various hydrological and geophysical applications (e.g., Klees et al., 2008;Mouyen 155 et al., 2018, Rodell et al., 2018Tapley et al., 2019). Different GRACE solutions have been released (see, e.g., https://doi.org/10.5194/hess-2020-665 Preprint. Discussion started: 14 January 2021 c Author(s) 2021. CC BY 4.0 License.
https://podaac.jpl.nasa.gov/GRACE or http://icgem.gfz-potsdam.de), including the mascon solution (e.g., Luthcke et al., 2008;Rowlands et al., 2010). The mascon approach utilizes mass concentration blocks (as a basis function) to determine the Earth's mass variation and is found to provide a more accurate TWS estimate as compared with the spherical harmonic approach (Rowlands et al., 2010). In this study, the mascon product from the Goddard Space Flight Center (GSFC) is used (Luthcke et 160 al., 2013; https://earth.gsfc.nasa.gov/geo/data/grace-mascons). The GSFC-Mascon product contains monthly TWS variations (∆TWS), expressed in equivalent water height (in, e.g., m). The glacial isostatic adjustment correction is applied using the ICE6G model (Peltier et al., 2015). The mascon varies in size and represents the average ∆TWS of the associated grid cell.
The spatial distribution of mascon in Australia is shown in Fig. 2. The GSFC-Mascon product also provides monthly ∆TWS uncertainties, and they are used to represent the observation error of the individual mascon. To convert monthly ∆TWS into 165 absolute TWS (necessary for the GRACE DA process), the temporal mean value of the CABLE-simulated TWS from 2003 to 2012 is added to the GSFC-Mascon product. This process reconciles the observed long-term mean with the model estimates.
(Please insert Figure 2 here)

Satellite-derived products 170
The satellite-derived soil moisture and evapotranspiration (ET) data obtained from the European Space Agency -Climate Change Initiative program (ESA-CCI; Dorigo et al., 2017;Gruber et al., 2019)

and the Global Land Evaporation Amsterdam
Model (GLEAM; Martens et al., 2017) are used to validate the soil moisture and evaporation estimates, respectively. The ESA-CCI COMBINED product combines multiple active and passive satellite-sensor soil moisture products and provides a nearglobal daily volumetric soil moisture product at 0.25° resolution with ~0.04 m 3 /m 3 accuracy (unbiased root-mean-square error). 175 The combined product version 4.7 (v04.7) is used in this study. The product includes approximately eight different satellite observations, including, e.g., SSM/I., AMSR-E, ASCAT, Windsat, and SMOS (see, e.g., Fig. 3 of Dorigo et al. (2017) for complete details) during our evaluation period (1981 -2012). GLEAM is an algorithm that derives the daily global terrestrial evaporation using observations from multiple satellite microwave sensors and reanalysis datasets (Martens et al., 2017). Two product variants are available: the satellite-only and the reanalysis, and the newest release of the latter (version 3.3a) is used 180 in this study due to its consistent timespan with our evaluation period.

In situ data
In situ soil moisture, groundwater, and ET measurements are obtained from different ground observation networks. Groundwater Explorer (http://www.bom.gov.au/water/groundwater/explorer/map.shtml). More than 870,000 monitoring bores are distributed unevenly across the continent. At each bore, the groundwater level measurement is converted to the 190 groundwater level variation by removing the long-term mean associated with the entire data record. The bores are excluded from the analysis if the data record is shorter than three years or having significant missing data. The groundwater level measurements are not converted to groundwater storage due to the absence of accurate knowledge of specific yield.
The in situ ET (i.e., latent heat flux) is obtained from the FLUXNET2015 dataset (Pastorello et al., 2017). FLUXNET is a global network measuring carbon and energy fluxes. More than twenty flux tower sites are distributed across Australia 195 associated with different periods (e.g., 2001 -2014). Only the sites with three years of data or longer are used in our analysis (see site locations in Fig. 1a).

Ensemble Kalman Smoother (EnKS)
The Ensemble Kalman Smoother (EnKS;Evensen, 2003;Zaitchik et al., 2008) is used to assimilate the GRACE-derived 200 ΔTWS into the CABLE model. The 3-dimension EnKS (EnKS 3D) scheme described in Tangdamrongsub et al. (2017) is used in this study. The GRACE DA comprises forecast, analysis, and distributing update steps. The forecast step propagates the model states forward in time for approximately one month. The analysis step computes the monthly model state update using GRACE observations (and uncertainties). The final step reinitializes the ensemble (e.g., initial states, forcing data) and reperforms the forecast step with the DA update (increment) distributed evenly throughout the month. Figure 3 illustrates the 205 concept of the GRACE DA process.
(Please insert Figure 3 here) The meteorological forcings and model parameters are perturbed using N = 100 ensemble members prior to GRACE DA processing. Multiplicative white noise is used to perturb the precipitation and shortwave radiation, while additive white noise is used for the air temperature and model parameters. The characteristics of the uncertainties are given in Table 3. After the 210 perturbation process, CABLE model states are then propagated for approximately one month (the forecast step). The state vector consists of nine model states (n = 9): six soil moisture layers, canopy storage, snow water equivalent, and groundwater storage.
(Please insert Table 3 here) In the analysis step, when a GRACE observation is available, the monthly averaged states ( ) are related to the GRACE 215 observations by https://doi.org/10.5194/hess-2020-665 Preprint. Discussion started: 14 January 2021 c Author(s) 2021. CC BY 4.0 License.
where is an m×1 perturbed observation vector containing the perturbed GRACE mascon for the month of interest, is a measurement operator which relates the ensemble state to the vector , m is the number of GRACE mascon cells used in the calculation, and j indicates ensemble index. The uncertainties in the observations are described by the random error , 220 which is assumed to have zero mean and covariance matrix × . The subscription denotes the dimension of the matrix. Note that is a variance matrix here as only the variance components are provided in the mascon product.
In EnKS 3D, multiple model and observation grid cells (e.g., inside 300 km radius corresponding to GRACE spatial resolution) are simultaneously used to compute the state update.
where is the number of model grid cells inside a mascon i (see, e.g., rectangle A in Fig. 2 for the distribution of model grids inside a mascon cell).
The ensemble of the states is stored in a matrix × = ( 1 , 2 , 3 , … , ), where K = ∑ =1 , and the ensemble 230 perturbation matrix is defined as ′ = − ̅ , where the matrix ̅ contains the mean values computed from all ensemble members. Similarly, the perturbed GRACE observation vector is stored in the matrix × = ( 1 , 2 , 3 , … , ). The analysis equation is then expressed as with 235 where × represents the updated state vector, ∆ × is the monthly averaged update from EnKS 3D and × is the Kalman gain matrix. The superscript denotes a transpose (matrix) operator. The model and observation error covariance matrix ( ) × , ( ) × are computed as where contains the measurement error of all ensemble members. After the monthly averaged update ∆A is obtained, the daily increment (∆ ) of the update is computed by dividing ∆A by the total number of days in that month. Note that only ∆ of the center mascon is saved (see also Tangdamrongsub et al., 2017). The processes described in Eq.
(1) -Eq. (6) are repeated through all mascon cells to obtain all individual ∆ in the study domain. Then, the model is reinitialized using the 245 previous month's initial states, and the simulation is performed again while adding ∆ to the model initial states daily (the distributing update step). The DA process is performed until the last month of the study period (December 2012).

Resample approach
The state estimates are validated against the referenced data described in Sect. 2.3. As the spatial resolution of the model 250 estimate and referenced data are different, a spatial resample is performed before the comparison. The model estimate is resampled to the observation grid space using the nearest neighbor-gridded interpolation when model's resolution is coarser.
Conversely, the estimate is upscaled (spatial averaging) when the model's resolution is higher. The evaluation is conducted at the observation grid cell.

Correlation and root mean square difference 255
The agreement between the estimated variable and the in situ data is assessed using the Pearson correlation coefficient (ρ) and the root mean square difference (RMSD). At a particular grid cell, ρ is calculated as: where the y vector contains the model estimates, the x vector contains the validation data (observations), [ ] is the expectation operator, and (̅, ̅) and ( , ) are the mean and standard derivations of y and x, respectively. The RMSD is 260 computed as: where L denotes the length of the time series.

Long-term trend and seasonal variations
The long-term trend, annual amplitude, and phase of the time series are computed using the least-squares adjustment associated 265 with five parameters, offset (a), long-term trend (b), annual variation (c, d), and semi-annual variation (e, f). A time series ( ) at a particular grid cell can be expressed as: = + + sin + cos + sin 2 + cos 2 , https://doi.org/10.5194/hess-2020-665 Preprint. Discussion started: 14 January 2021 c Author(s) 2021. CC BY 4.0 License.
where the t vector contains time, and T is an annual period. The annual amplitude ( ) and phase ( ) are computed as: 270 = arctan 2 ( , ).

Spatial resolution
The spatial resolution can be determined from the empirical (and isotropic) covariance function ( ) computed as (Tscherning and Rapp, 1974): 275 where ( ℎ , ) are vectors containing data points (h,l) associated with the spherical distance ℎ , and ℎ is the number of data pairs considered in the calculation. The spatial resolution (or correlation length of the covariance function) is defined in this study as the distance at which ( = 0) decreases by half. The diagram in Fig. 4 illustrates how this correlation length is determined. 280 (Please insert Figure 4 here)

Case studies
This paper uses four case studies to quantify the effect of different model grid size and GRACE DA. The case studies are described as follows:

Improvement in spatial resolution
The resolution improvements can be seen by comparing CABLE 0.5° TWS estimates against those from CABLE 0.05°. Both are open-loop (OL) simulations, meaning the model is run without data assimilation. The simulation period is from January 295 1981 to December 2012. After obtaining TWS estimates, the TWS annual amplitude and phase are computed using Eq. (11 -12) and are shown in Fig. 5. Both CABLE versions have similar spatial features, but more localized and much finer details are seen in the CABLE 0.05° simulation than in the CABLE 0.5° simulation. A clear difference in annual amplitude is shown over the Yarra Ranges and the Alpine National Parks (see Fig. 5a vs. 5b insets). CABLE 0.05° provides greater details of the ΔTWS spatial distribution, and the annual amplitude is approximately 30% higher than it is in the coarse-scale version. Differences 300 in spatial details are also seen in the phase estimates (see Fig. 5c vs. 5d).
(Please insert Figure 5 here) The spatial resolutions can be quantitatively determined using the empirical covariance function, Eq. (13), described in Sect. Figure 6 shows the averaged spatial resolution (correlation length) of CABLE 0.5° and CABLE 0.05° for each month between 1981 and 2012. 305

The covariance function is computed for each month's TWS estimates using all grid points in Australia.
The CABLE 0.5° simulations have a spatial resolution of ~ 50 km, consistent with the grid size of the input 0.5° CABLE parameters and forcing data. Larger correlation lengths are found during the rainy seasons: Jan -Apr (in the North) and Aug -Nov (in the South). These results are in line with the TWS's phase estimates in Fig. 5c and 5d. The soil store or aquifer is likely filled during the wet seasons leading to a more uniform (and smoother) spatial feature to a certain extent. The use of CABLE 0.05° significantly improves the spatial resolution by about a factor of two to three. It is noted that the spatial resolution 310 of CABLE 0.05° presented in Fig. 6 reflects the continental averaged value while the finer (higher) spatial resolution is observed in the individual river basin (not shown).
(Please insert Figure 6 here)

Assessment of long-term TWS variations
This study's timespan allows for an assessment of overall trends and decadal variations of TWS estimates. The long-term 315 trends of water balance states and fluxes from CABLE 0.05° between 1981 and 2012 are shown in Fig. 7. A strong relationship between components is observed, particularly in the storage components ( Fig. 7a -7d). The TWS and GWS ( Fig. 7a and 7d) have very similar spatial patterns, where a wetting trend is observed in the northern region, the Indian Ocean Basin, and the western part of the Murray-Darling Basin (see Fig. 1a for the basin's location), and a drying trend is seen in the central part of the continent, the South West Coast Basin, and several parts of the Murray-Darling Basin. The similarity between the TWS 320 and GWS (both spatially and in magnitude) indicates that GWS is a primary driver of the TWS trend in Australia. By contrast, https://doi.org/10.5194/hess-2020-665 Preprint. Discussion started: 14 January 2021 c Author(s) 2021. CC BY 4.0 License. soil moisture stores show different spatial patterns: increases in soil moisture are also found in the central and western parts of the continent (Fig. 7c). The SMS generally accommodates a large portion of the seasonal variation in water storage in Australia (see Sect. 4.2), but its role in the long-term trend is marginal, smaller than that of the GWS by about a factor of two. ET trend estimates show a similar spatial pattern to SSM trends (Fig. 7e vs. 7b), which may be explained by ET pulling moisture from 325 SSM stores. The long-term trend of the total runoff is in line with the TWS/GWS, increasing in the North and decreasing in the southeastern region and Tasmania (e.g., Fig. 7f vs. 7a). In the northern region, the soil moisture or aquifer is likely saturated due to a wet climate, with more significant annual rainfall (than the South) by about a factor of five (not shown). Such a condition leads to a greater magnitude of root zone moisture, groundwater recharge, and surface runoff variations. The opposite scenario is observed in the southeastern region, where the depleted TWS/GWS (induced by droughts) likely reduces runoff 330 generation, resulting in a negative runoff trend.
(Please insert Figure 7 here) Regional water balance components can also be analyzed at interannual and decadal timescales. Figure 8 shows the trends of water balance components in three different decades, 1981 -1990, 1991 -2002, and 2003 -2012 (Fig. 8c). A similar reversal is also seen in the eastern regions: wetting in -1990, drying in 1991, and wetting again in 2003. 340 (Please insert Figure 8 here) Without such a long record, assessment of water resources may have limited reliability. For instance, based only on the ~10year period of GRACE observations, it is difficult to determine whether the negative TWS trend in western Australia (e.g., Fig. 8c; see also Fig. 11d) is caused by anthropogenic or natural processes (e.g., Richey et al., 2015). Evaporation or runoff after the extremely wet conditions prior to 2002 may also produce a similar decreasing trend (e.g., van Dijk et al., 2011;Munier 345 et al., 2012). By assessing the historical TWS time series of the North-Western Plateau Basin obtained from CABLE 0.05° ( Fig. 9), the observed negative trend is more likely governed by a decadal cycle of drought and recovery. This approach demonstrates that there is clear value in utilizing a longer timespan of TWS estimates and can offer a more reliable assessment of regional water resources and climate variations.

Comparison with the satellite products
TWS, SSM, and ET estimates are compared with satellite data from GRACE, ESA-CCI, and GLEAM, respectively (see Sect. 2.2 and 2.3 for each product's description). Note that the remote sensing products may contain biases (caused by, e.g., background model, processing algorithm) and do not necessarily represent the truth. (Ground truth validation is performed in Sect. 4.3.) The inter-comparison performed in this section is only to assess the consistency between two independent estimates: 355 model and satellite. The CABLE estimate is rescaled to the satellite product's grid before comparison, as described in Sect.
3.2.1. Figure 10 shows the correlation and RMSD estimates between CABLE 0.5°/CABLE 0.05° results and the evaluated satellite data. On average, both CABLE results are in good agreement with the satellite products, with correlation values greater than 0.55 (see Fig. 10a, 10b, 10e, 10f, 10i, 10j). CABLE 0.05° shows more robust agreement with satellite-derived variables than does CABLE 0.5 o ; correlation values with TWS, SSM, and ET are 14%, 14%, and 7% higher (respectively) in the former 360 simulations than in the latter. Similar improvements are also observed in the RMSD evaluations, where CABLE 0.05° provides smaller RMSD values than CABLE 0.5° by 4% (TWS), 14% (SSM), and 17% (ET). CABLE 0.05° reduces the RMSD of the SSM estimate to as low as ~0.03 m 3 /m 3 , e.g., over the South West Coast and Lake Eyre basins, and the average RMSD value in Australia is ~0.06 m 3 /m 3 . It is worth noting that the average unbiased RMSD value (Entekhabi et al., 2010b) is 0.037 m 3 /m 3 (not shown), in line with the accuracy of the CCI product (see, e.g., Table 1 of Dorigo et al. (2017)). 365 (Please insert Figure 10 here) The improved agreement between CABLE 0.05° estimates and the satellite products is also seen in the temporal variation (Fig.   11). Compared with the CABLE 0.5° results, CABLE 0.05° increases the dynamic range of the TWS estimates in most basins leading to closer alignment with GRACE observations (Fig. 11a -11c). Similar increases in the agreement are also observed in the CABLE 0.05° SSM and ET estimates (Fig. 11d -11i). The observed improvement may be attributed in part to the 370 rederived model parameters. For example, in the North West Plateau Basin, CABLE 0.05° uses a ~17 % higher area-averaged sand fraction than does CABLE 0.5°, which allows faster infiltration/drainage in the storage compartments leading to greater dynamic ranges of TWS and SSM variations (see Fig. 11a and 11d). Consequently, the ET estimate is decreased as a response to increased water storage following the water balance equation (Fig. 11g). Similar mechanisms are also observed in Lake Eyre and South West Coast Basins, where the improved parameterization leads to improved agreement with the satellite data. 375 The use of coarse resolution forcing data (e.g., precipitation) could also explain the small TWS amplitude observed in CABLE 0.5°. Coarse scale forcing data averages local precipitation signals over a larger area than does the finer resolution forcing data, resulting in a smaller amplitude. and GRACE data). The basin-averaged TWS estimates from CABLE with and without GRACE DA are shown in Fig. 12, alongside the GRACE observations themselves. In most basins, apparent disagreements between the open-loop estimates and the GRACE observations suggest the current CABLE models' limited accuracy. After assimilating GRACE into the models, 385 the estimates (GRACE DA 0.5° and GRACE DA 0.05°) move toward the GRACE observations. The positive impact of GRACE DA on the basin-averaged TWS estimates is similar regardless of the model spatial resolutions. This likely reflects the nature of GRACE observations that provide integrated water storage information at continental or basin-scale (Tapley et al., 2004). However, it should be noted that the GRACE DA application does not degrade the spatial resolution of the model.
The offline analysis shows that the average correlation length of GRACE DA 0.5° and GRACE DA 0.05° remain the same as 390 that of CABLE 0.5° and CABLE 0.05°, respectively (not shown). The EnKS 3D scheme exploits the spatially correlated information from the high-resolution model to disaggregate the coarse-scale observations into a finer grid space, resulting in the preservation of the model's intrinsic resolution.
(Please insert Figure 12 here) The impact of GRACE DA is also observed in water redistribution within the water storage components. Figure 13 shows the 395 contributions of four different water storage components (SMS, GWS, SWE, and CNP) to TWS in different basins before (CABLE 0.05°, Fig. 13a) and after the application of GRACE DA (GRACE DA 0.05°, Fig. 13b). The contribution is calculated as a percent of the annual amplitude of the water storage component and the amplitude of the TWS. In CABLE 0.05 o (Fig.   13a), the SMS is a major contributor to more than 90% of the TWS. The GWS contribution is only ~10 %. After applying GRACE DA (Fig. 13b), the contribution of the GWS is significantly increased. It dominates the entire water column in several 400 basins (e.g., Indian Ocean, Lake Eyre, North West Plateau, South West Plateau). A similar finding is also seen in, e.g., Girotto components (e.g., GWS). The contributions of the SWE and CNP components are negligible across Australia, and the impact of GRACE DA on them is trivial. Similar changes to TWS contributions are also observed between CABLE 0.5° and GRACE DA 0.5° (not shown). It is noted that Fig. 12 and Fig. 13 only present the impact of GRACE DA on storage components and 405 do not assess the accuracy of either version. The accuracy of the OL and DA models is quantified in Sect. 4.3.
(Please insert Figure 13 here)

Validation with in situ data
In situ data from three different ground networks (see Sect. 2.3.2) are used to validate the GWS, ET, and SSM estimates.
Validation is conducted by quantifying the change in correlation values between simulations and observations when one model is used in place of another. The validation period is between January 2003 and December 2012, consistent with the GRACE DA period. Figure 14 shows the validation of GWS estimates between GRACE DA and OL simulations (DA minus OL). A positive value indicates improvement, while a negative value represents degradation. A significance test is performed at the 0.05 level based on the Fisher Z-transform test for correlation coefficients (Zaitchick et al., 2008). The average change in correlation value across all in situ data site are shown in Fig. 14d. Values that failed the significant test are excluded from the 415 averaging.
(Please insert Figure 14 here) We first analyze the impact of GRACE DA on the GWS estimates of CABLE 0.5° and CABLE 0.05° ( Fig. 14a and 14b). A greater correlation improvement is observed when GRACE DA is applied to the 0.5° model, which increases the average correlation value by 0.23 (Fig. 14d1). When GRACE DA is applied to the 0.05 o model (Fig. 14b), the average correlation value 420 improves by 0.12 (Fig. 14d6), ~50% less than in the 0.5° model.
When comparing the two OL runs (CABLE 0.5° vs. CABLE 0.05°), the GWS estimate from CABLE 0.05° has a higher correlation value by 0.12 (see Fig. 14d2). This analysis indicates that the lower correlation improvement seen between CABLE 0.05° and GRACE DA 0.05° is unlikely caused by the reduced impact of GRACE DA on the high-resolution model. Rather, it is a result of CABLE 0.05° having better accuracy to start with. GRACE DA tends to provide the same information to both 425 0.5° and 0.05° models. Still, GRACE DA 0.05° exhibits the best correlation with in situ data, ~0.02 higher than GRACE DA 0.5 o (Fig. 14c and Fig. 14d5). While GWS estimates from both CABLE 0.5° and CABLE 0.05° improve with GRACE DA, we find that CABLE 0.05° (OL) shows a higher correlation value than the GRACE DA 0.5° (DA) by 0.1 (see Fig. 14d3). This denotes that the improved model parameters are as important as the improved model state estimates (e.g., via DA).
SSM and ET estimates are validated against ground measurements from SASMAS and Ozenet, respectively. Correlation 430 coefficients between the case study simulations and observation networks are summarized in Fig. 15. For SSM (Fig. 15a), CABLE 0.05° exhibits slightly improved correlation with observations (by ~0.6 %) than does CABLE 0.5°. The addition of GRACE DA also shows a small but positive impact on SSM estimates, improving correlation with observations by ~1 %. The small impact is attributed to limited GRACE sensitivity. GRACE is sensitive to the low-frequency variation (originated from deeper stores) and cannot effectively capture SSM, which is dominated by a high-frequency signal (e.g., precipitation). As a 435 result, GRACE DA is found to have a minor (or negative) impact on the top soil component in most GRACE DA studies (e.g., Li et al., 2012;Tian et al., 2017;Tangdamrongsub et al., 2020).
(Please insert Figure 15 here) For ET (Fig. 15b), CABLE 0.05° exhibits improved correlation with observations (by ~5 %) than does CABLE 0.5°. The inclusion of GRACE DA also slightly improves correlation values over the associated OL model versions. Greater 440 improvement (by ~2%) is seen between CABLE 0.5° and GRACE DA 0.5° than between CABLE 0.05° and GRACE DA 0.05°. As with SSM, the small improvement of ET is likely attributable in part to small GRACE DA updates (caused by the https://doi.org/10.5194/hess-2020-665 Preprint. Discussion started: 14 January 2021 c Author(s) 2021. CC BY 4.0 License.
limited GRACE sensitivity to high-frequency surface fluxes). The SSM is a primary moisture source for ET, so a trivial change in SSM leads to a similarly small change in ET.

Conclusion 445
This study enhances the spatial resolution and timespan (> 30 years) of regional TWS estimates using the CABLE LSM, highresolution land cover maps and forcing data, and GRACE DA application. By improving the model parameter and forcing data (without GRACE DA), the developed CABLE 0.05° model shows clear improvements in the accuracy of water balance component estimates (e.g., soil moisture, groundwater, evapotranspiration) compared with in situ and independent satellite data. The 0.05° model also improves the spatial resolution by a factor of two to three over the 0.5° version. The extended 450 timespan provides insightful information for long-term assessment of regional water resources and climate variability. The The enhanced CABLE model resolution developed in this study relies on improved parameter and forcing data. The land surface physics remain unchanged. The workflow can be adopted for other CABLE repositories or different LSM with only slight modifications, e.g., number of soil or vegetation types. The CABLE 0.05° TWS estimates can be reproduced at locations outside the area studied here since high resolution forcing data and model parameters are available globally (or near globally). Future development can consider extending the temporal record or further increasing the spatial resolution of 460 TWS estimates. The timespan extension is feasible using reanalysis forcing data from the Modern-Era Retrospective Analysis for Research and Applications, Version 2 (MERRA-2; Gelaro et al., 2017), which would allow TWS simulations to be extended to the near present. Similarly, the TWS estimates could approach sub-kilometer scale grid spacing by using a number of high-resolution land parameter sets, e.g., a 250 m soil map (Hengl et al., 2017) and a 500 m LAI data (Myneni et al., 2015). 465

Data availability
The model and data used in this study are publicly available and can be accessed as follows (last accessed: 6 May 2020):