Development and evaluation of 0.05 terrestrial water storage estimates using Community Atmosphere Biosphere Land Exchange (CABLE) land surface model and assimilation of GRACE data

Accurate estimation of terrestrial water storage (TWS) at a high spatiotemporal resolution is important for reliable assessments of regional water resources and climate variability. Individual components of TWS include soil moisture, snow, groundwater, and canopy storage and can be estimated from the Community Atmosphere Biosphere Land Exchange (CABLE) land surface model. The spatial resolution of CABLE is currently limited to 0.5 by the resolution of soil and vegetation data sets that underlie model parameterizations, posing a challenge to using CABLE for hydrological applications at a local scale. This study aims to improve the spatial detail (from 0.5 to 0.05) and time span (1981–2012) of CABLE TWS estimates using rederived model parameters and high-resolution meteorological forcing. In addition, TWS observations derived from the Gravity Recovery and Climate Experiment (GRACE) satellite mission are assimilated into CABLE to improve TWS accuracy. The success of the approach is demonstrated in Australia, where multiple ground observation networks are available for validation. The evaluation process is conducted using four different case studies that employ different model spatial resolutions and include or omit GRACE data assimilation (DA). We find that the CABLE 0.05 developed here improves TWS estimates in terms of accuracy, spatial resolution, and long-term water resource assessment reliability. The inclusion of GRACE DA increases the accuracy of groundwater storage (GWS) estimates and has little impact on surface soil moisture or evapotranspiration. Using improved model parameters and improved state estimations (via GRACE DA) together is recommended to achieve the best GWS accuracy. The workflow elaborated on in this paper relies only on publicly accessible global data sets, allowing the reproduction of the 0.05 TWS estimates in any study region.


Introduction
Accurate knowledge of terrestrial water storage (TWS) is crucial for assessing water resource and climate variability (Delworth and Manabe, 1988;Koster and Suarez, 2001). TWS consists of soil moisture, groundwater, snow, and canopy storage. Each component plays a significant role in the global water cycle and interacts closely with the landatmospheric water-energy exchange (Koster et al., 2006;Fischer et al., 2007;Seneviratne et al., 2010). The TWS components can be measured or estimated by various platforms (e.g., satellite measurement and model simulation). However, spatial resolutions are coarse due to the limitation of sensors and models that focus on global or continental scales (e.g., Rodell et al., 2004;Alkama et al., 2010). At a regional or local scale, the spatial resolution of the TWS estimate is vital, as most applications (e.g., risk management for drought or flood) require accurate information at the county or subcounty level (Quiring, 2009). This motivates the development of TWS estimates at higher spatiotemporal scales, cor-4186 N. Tangdamrongsub et al.: Development and evaluation of 0.05 • terrestrial water storage estimates responding particularly to an increased interest in exploiting TWS in interdisciplinary studies (e.g., IPCC, 2007;NASEM, 2018).
TWS information can be obtained or estimated from ground observation networks, satellite measurements, or model simulations. Each has different strengths and limitations. Ground observations (e.g., soil moisture probe and groundwater well) are considered the most reliable, providing measurements closest to the truth (e.g., Dorigo et al., 2013). However, ground observations have high maintenance costs and incomplete coverage. Also, point measurements only reflect information at one location and not necessarily the entire region. On the other hand, satellite platforms offer an automated measurement with improved coverage ranging from regional to global (e.g., Tapley et al., 2004;Entekhabi et al., 2010a). The challenges of using satellite measurements are the coarse spatial resolution and the sensors' technical limitations (e.g., penetration depth, cloud/vegetation obstruction, and background noise). Therefore, its usage is restricted to a large region and requires a sophisticated algorithm to retrieve TWS variables (Crow et al., 2012;Castellazzi et al., 2016;Tangdamrongsub et al., 2019). In addition, satellite measurements can be limited by a relatively short period of record (e.g., Flechtner et al., 2014;Karthikeyan et al., 2017).
TWS components can also be simulated from a land surface model (LSM). The LSM incorporates various land surface physics into complex numerical sequences to allow the simulation of TWS to be performed at any desired location and spatiotemporal scale (Pitman, 2003). The LSM can provide a complete suite of TWS components compared with the ground or satellite measurements that can only measure a single or integrated TWS component (e.g., Tapley et al., 2004;Entekhabi et al., 2010a). However, due to the limitations of input data, the spatial resolution of many LSMs is coarse (e.g., > 0.25 • (∼ 25 km)), which consequently restricts their application to a large region (e.g., Rodell et al., 2004;Alkama et al., 2010;Ke et al., 2012).
Efforts to improve the spatial resolution (and time span) of global TWS estimates have been made in many modeling communities (e.g., Bierkens et al., 2015;Bierkens, 2015). For instance, Ke et al. (2012) improved the resolution of the Community Land Model (CLM) from 0.5 • (∼ 50 km) to 0.05 • (∼ 5 km) using modified land surface parameters. The World-Wide Water model (W3; van Dijk et al., 2013a, b) recently allowed the global TWS variables to be estimated at 0.05 • (∼ 5 km). The European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 5 (ERA5) offers global land surface variables at 0.1 • (∼ 9 km) resolution from 1981 to present (see the data availability section). A similar effort is also seen in hydrologic model development, such as the PCRaster Global Water Balance (PCR-GLOBWB; Sutanudjaja et al., 2018), which improves the spatial resolution from 30 arcmin (∼ 50 m or ∼ 0.5 • ) to 5 arcmin (∼ 9 km or 0.083 • ) and extends the time span to more than a 50-year period. The enhancement of model spa-tial resolution and time span receives even more attention at the local level, where spatial detail down to a few kilometers is needed (e.g., Rasmussen et al., 2014;Singh et al., 2015;Beamer et al., 2016;Dong et al., 2020).
On top of the improved spatiotemporal resolutions, the improved accuracy of TWS estimates is also a concern in LSM developments. As in most environmental modeling systems, model outputs are associated with a high degree of uncertainty propagated from, e.g., inaccurate meteorological forcings, imperfect model physics, and ineffective parameter calibration. Data assimilation (DA; Reichle et al., 2002;Reichle, 2008) techniques can be employed to improve LSM performance. The approach sequentially updates the model's states using an optimal value computed by combining model simulations with observations. A variety of satellite observations reflecting different TWS components can be assimilated into the system (e.g., Kumar et al., 2014;Dong and Crow, 2018). TWS observations from the Gravity Recovery And Climate Experiment (GRACE) satellite mission (Tapley et al., 2004) offer integrated water column information that can be used to constrain multiple water storage components simultaneously (e.g., Zaitchik et al., 2008;Forman et al., 2012;Tangdamrongsub et al., 2015). GRACE DA has shown positive impacts on most TWS components, including groundwater (e.g., Girotto et al., 2017;Nie et al., 2019), soil moisture (Jung et al., 2019), and snow (Kumar et al., 2016).
The Community Atmosphere Biosphere Land Exchange model (CABLE; Kowalczyk et al., 2006) is an open-source global LSM developed and updated by the community. CA-BLE is a core LSM of the Australian Community Climate and Earth System Simulator (ACCESS; Bi et al., 2013;Kowalczyk et al., 2013) that can be used to simulate water storage and fluxes globally. The model has been regularly updated to incorporate state-of-the-art model physics (e.g., Decker, 2015;Ukkola et al., 2016;Haverd et al., 2018). Despite its success, CABLE's spatial scale is currently limited to 0.5 • (∼ 50 km) due to the 0.5 • resolution of its parameter and forcing data sets. This contrasts with other global model developments, where high-resolution versions have already been developed (e.g., van Dijk et al., 2013a, b;Sutanudjaja et al., 2018). CABLE and its inputs must be reconfigured to increase the spatial detail of TWS estimates for smaller-scale studies (e.g., 0.01-0.05 • ). Our effort to increase the study's spatial resolution should narrow this development gap and has not previously been implemented.
This study aims to improve the accuracy, spatial resolution, and time span of CABLE TWS estimates. Our approach utilizes only publicly available global data sets, so resulting TWS estimates can be reproduced over any target region (see the data availability section for the data access). The spatial detail of CABLE is improved from 0.5 to 0.05 • (∼ 5 km) using high-resolution forcing data (precipitation in particular) and land surface parameters derived from high-resolution maps of soil and vegetation cover. The development is demonstrated in Australia, where ground ob-servation networks (e.g., surface soil moisture, groundwater, and evapotranspiration) are available to validate the result. The demonstrated simulation period is 1981-2012, coincident with the availability of meteorological forcing data. Recent studies have shown success in assimilating GRACE data into a coarse-scale CABLE version to improve TWS and groundwater storage (GWS) estimates in the Goulburn River catchment and in the North China Plain Yin et al., 2020). In this study, GRACE observations (Luthcke et al., 2013) are also assimilated into CABLE 0.05 • (and CABLE 0.5 • ) to improve the accuracy of TWS components between 2003 and 2012. Assimilating the coarse GRACE observations into a much higherresolution model is performed using the 3-dimensional ensemble Kalman smoother (EnKS 3D; Tangdamrongsub et al., 2017). This approach will reveal whether assimilating GRACE data can benefit a newly developed fine-scale CA-BLE configuration. Our study will perform a thorough investigation on this issue to address GRACE DA's benefit on CABLE 0.05 • .
The objectives of this paper are (1) to present the development and evaluation of retrospective 0.05 • TWS estimates and (2) to assess the GRACE DA impact on 0.05 • CABLE version, as well as the benefit of assimilating the coarse resolution satellite data into a fine-scale model. This paper is presented as follows. Section 2 provides the details of the study area, model configurations, and data processing. Section 3 presents the GRACE DA schematic and the statistical metric used in the evaluation. Section 4 presents the assessment and validation of the result. Finally, Sect. 5 summarizes the findings of this study and provides a possible direction for future development.
2 Study area and data

Study area
This study uses Australia as a case study. Due to its size and geographic location, Australia is influenced by multiple climate drivers (Murphy and Timbal, 2008;Xie et al., 2016) and experiences episodes of severe droughts and floods (e.g., van Dijk et al., 2013a). The recent long-term drought, known as the Millennium Drought (Bond et al., 2008), severely affected industrial and agricultural sectors and has led to a significant economic loss nationwide (see, e.g., van Dijk et al., 2013a). The need for an accurate prediction of possible water scarcity from climate variations motivated the development of land surface and hydrology models in Australia, e.g., the Australian Water Availability Project (AWAP; Raupach et al., 2008), the Australian Water Resources Assessment -Landscape Model (AWRA-L; van Dijk, 2010), and CABLE . Recent work has assimilated GRACE satellite data into such water models (e.g., Tian et al., 2017;Schumacher et al., 2018;Tangdamrongsub et al., 2020). Studies relevant to GRACE DA in Australia are summarized in Table 1. Ground observation networks have also been installed across the continent and continuously monitor the water storage and flux variations. Such data records are valuable for validating the accuracy of the model estimates and remote sensing observations. Details regarding the in situ data used in this study are provided in Sect. 2.3.2.

Model configuration
In this study, TWS estimates are derived from CABLE. A history of the model's development can be found in, e.g., Wang et al. (2011), Kowalczyk et al. (2013), Decker (2015), and Haverd et al. (2018). Multiple CABLE versions have been developed since 2003 for different objectives (see the data availability section); CABLE SubgridSoil GroundWater (CABLE-SSGW; Decker, 2015) is considered most suitable for our use due to its inclusion of comprehensive terrestrial water storage components and, especially, a groundwater module. CABLE is developed using Fortran and can be executed in a Unix environment. The input/output file format follows the NetCDF Climate and Forecast (CF) convention. The model has been used to simulate global TWS at 0.5 • spatial resolution using 0.5 • resolution model parameters and forcing data (e.g., Decker, 2015;Ukkola et al., 2016). The variables used to assess TWS consist of soil moisture storage (SMS), snow water equivalent (SWE), canopy storage (CNP), and GWS.
The model parameters of CABLE-SSGW are derived from several different sources Wang et al., 2011;Decker, 2015). Land use/vegetation type categories are obtained from the International Geosphere-Biosphere Programme (IGBP) classification from the Moderate Resolution Imaging Spectroradiometer (MODIS; Friedl et al., 2002). Relative volumes of silt, sand, clay, and organic matter in the soil are obtained from the Harmonized World Soil Database (Fischer et al., 2008). The Zobler soil category (Zobler, 1999) is computed empirically from the silt, sand, and clay fractions (Oleson et al., 2010). The monthly climatology of the leaf area index (LAI) is computed using a reprocessed MODIS LAI product (Yuan et al., 2011). All derived model parameters are resampled to 0.5 • to match the 0.5 • model grid space. Comprehensive details of model parameters can be found in, e.g., Kowalczyk et al. (2006), Wang et al. (2011), andDecker (2015).
This study rederives the model parameters and employs enhanced forcing data to increase the model spatial detail from 0.5 to 0.05 • . The vegetation type is derived from the global land cover climatology using MODIS data (Broxton et al., 2014). The soil map is also derived from the Harmonized World Soil Database but at 0.05 • grid spacing. The 0.05 • monthly climatology LAI is derived from the Global Land Surface Satellite product (GLASS; Xiao et al., 2014). All rederived parameters are shown in Fig. 1. The soil layer thicknesses, from top to bottom, are set to 1.2, 3.8, 25, 39.9, 107.9, and 287.2 cm, and the unconfined aquifer's thickness is set to 20 m. The surface soil moisture (SSM; cubic meter per cubic meter, hereafter m 3 m −3 ) is defined as the top two layers, and the SMS (meters) is computed from all soil layers.
The CABLE model is forced with precipitation, air temperature, wind speed, humidity, surface pressure, and shortwave and longwave downward radiation. For precipitation, we use the Climate Hazards Group InfraRed Precipitation with Station data (CHIRPS; Funk et al., 2015), provided on a 0.05 • grid. The other forcing variables are provided on a 0.5 • grid by the high-resolution global data set of meteorological forcings for land surface modeling version 2, Princeton University (Princeton; Sheffield et al., 2006). When performing 0.5 • model simulations, the CHIRPS data are spatially averaged to 0.5 • grid space while other forcing variables maintain their intrinsic 0.5 • resolution. When performing 0.05 • model simulations, the coarser-resolution forcing variables are spatially resampled to 0.05 • model grid space using the nearest-neighbor interpolation.
Model simulations are performed between 1981 and 2012 (a total of 32 years), given the Princeton forcing data's availability. Similar to McNally et al. (2017), temporal disaggregation is applied to CHIRP precipitation data to resample from 1 d to 3 h, consistent with the Princeton data's time step. The scale factor derived from the 3 h Princeton precipitation data is used to (temporally) rescale the CHIRPS data. The characteristics of forcing data and the derived model parameters used in this paper are given in Table 2.
In all simulations, initial states are obtained using a 320year spinup, i.e., performing 10 repeated runs between 1981 and 2012.

GRACE data
GRACE is a twin satellite-to-satellite tracking mission designed to measure the mean and time-varying components of the Earth's gravity field (Tapley et al., 2004). Every month, GRACE provides a time-varying gravity solution containing information about mass redistribution near the Earth's surface. The monthly gravity change is dominated by a hydrol-    ogy signal, making the GRACE product beneficial for various hydrological and geophysical applications (e.g., Klees et al., 2008;Mouyen et al., 2018;Rodell et al., 2018;Tapley et al., 2019). Different GRACE solutions have been released, including the mascon solution (e.g., Luthcke et al., 2008;Rowlands et al., 2010). The mascon approach utilizes mass concentration blocks (as a basis function) to determine the Earth's mass variation and is found to provide a more accurate TWS estimate compared to the spherical harmonic approach (Rowlands et al., 2010). In this study, the mascon (mass concentration) product from the Goddard Space Flight Center (GSFC) is used (Luthcke et al., 2013). The GSFC Mascon product contains monthly TWS variations ( TWS), expressed in equivalent water height (in, e.g., meters). The glacial isostatic adjustment correction is applied using the ICE6G model (Peltier et al., 2015). The mascon varies in size and represents the average TWS of the associated grid cell. The spatial distribution of mascon in Australia is shown in Fig. 2. The GSFC Mascon product also provides monthly TWS uncertainties, and they are used to represent the observation error of the individual mascon. In this study, GRACE data are assimilated into CABLE between January 2003 and December 2012 (due to the availability of GRACE data). To convert monthly TWS into absolute TWS (necessary for the GRACE DA process), the temporal mean value of the CABLE-simulated TWS from 2003 to 2012 is added to the GSFC Mascon product. This process reconciles the observed long-term mean with the model estimates.

Satellite-derived products
The satellite-derived soil moisture and evapotranspiration (ET) data obtained from the European Space Agency Climate Change Initiative program (ESA CCI; Dorigo et al., 2017;Gruber et al., 2019) and the Global Land Evaporation Amsterdam Model (GLEAM; Martens et al., 2017) are used to validate the soil moisture and evaporation estimates, respectively. The ESA CCI COMBINED product combines multiple active and passive satellite sensor soil moisture products and provides a near-global daily volumetric soil moisture product at 0.25 • resolution with ∼ 0.04 m 3 m −3 accuracy (unbiased root mean square error). The combined product version 4.7 (v04.7) is used in this study. The product includes approximately eight different satellite observations, including, e.g., SSM/I, AMSR-E, ASCAT, Windsat, and SMOS (see, e.g., Fig. 3 of Dorigo et al. (2017) for complete details) during our evaluation period . GLEAM is an algorithm that derives the daily global terrestrial evaporation using observations from multiple satellite microwave sensors and reanalysis data sets (Martens et al., 2017). In total, two product variants are available, i.e., the satellite-only and the reanalysis, and the newest release of the latter (version 3.3a) is used in this study due to its consistent time span with our evaluation period.

In situ data
In situ soil moisture, groundwater, and ET measurements are obtained from different ground observation networks. The daily in situ soil moisture data are obtained from the Scaling and Assimilation of Soil Moisture and Streamflow (SAS-MAS; Rüdiger et al., 2007) network in the southeastern part of the Murray-Darling Basin (see Fig. 1). The SASMAS network hosts more than 20 measurement sites and provides volumetric soil moisture (θ ; m 3 m −3 ) data associated with 0-5 cm depth. Only sites with a data record longer than 3 years are used in our analysis. The monthly in situ groundwater level data are collected from the Australian Bureau of Meteorology through the Australian Groundwater Explorer. More than 870 000 monitoring bores are distributed across the continent. At each bore, the groundwater level measurement is converted to the groundwater level variation by removing the long-term mean associated with the entire data record. The bores are excluded from the analysis if the data record is shorter than 3 years or has significant missing data. The groundwater level measurements are not converted to groundwater storage due to the absence of accurate knowledge of specific yield.
The in situ ET (i.e., latent heat flux) is obtained from the FLUXNET2015 data set (Pastorello et al., 2017). FLUXNET is a global network measuring carbon and energy fluxes. More than 20 flux tower sites are distributed across Australia and are associated with different periods (e.g., 2001-2014). Only the sites with 3 years of data or longer are used in our analysis (see site locations in Fig. 1a).

Ensemble Kalman smoother (EnKS)
The ensemble Kalman smoother (EnKS; Zaitchik et al., 2008) is used to assimilate the GRACE-derived TWS into the CABLE model. The 3-dimensional EnKS (EnKS 3D) scheme described in Tangdamrongsub et al. (2017) is used in this study for two reasons. First, it accounts for spatial correlations in model and observation errors. The latter are highly correlated at neighboring 0.5 • × 0.5 • or 0.05 • × 0.05 • grid cells. Second, EnKS does not require interpolation of the observations (as in the ensemble Kalman filter (EnKF); Tangdamrongsub et al., 2015) and mitigates the spurious jump in water storage estimates caused by applying updates at the end of the month only. The additional computational cost is small, for handling large covariance matrices and running the model twice for each month.
The GRACE DA comprises forecast, analysis, and distributing update steps. The forecast step propagates the model states forward in time for approximately 1 month. The analysis step computes the monthly model state update using GRACE observations (and uncertainties). The final step reinitializes the ensemble (e.g., initial states and forcing data) and reperforms the forecast step with the DA update (increment) distributed evenly throughout the month. Figure 3 illustrates the concept of the GRACE DA process. Pseudocodes of GRACE DA can be found in the Supplement.
The meteorological forcings and model parameters are perturbed using N = 100 ensemble members prior to GRACE DA processing. Multiplicative white noise is used to perturb the precipitation and shortwave radiation, while additive white noise is used for the air temperature and model parameters. The characteristics of the uncertainties are given in Table 3. Downscaling and upscaling forcing data also cause errors. In our DA process, when the data are resampled, their errors are also adjusted. The relationship between coarse and fine-scale errors can be expressed as follows: where σ c and σ f represent coarse and fine-scale errors, (h, l) is the index of a grid cell, M is the number of fine-scale grid cells used in resampling, φ is a spherical distance between grid cells, and φ 0 is the considered correlation length (e.g., a coarse scale's grid size). After the perturbation process, CABLE model states are then propagated for approximately 1 month (the forecast step). The state vector consists of nine model states (n = 9), including six soil moisture lay-ers, canopy storage, snow water equivalent, and groundwater storage.
In the analysis step, when a GRACE observation is available, the monthly averaged states (ψ) are related to the GRACE observations by the following: where d j is an m×1 perturbed observation vector containing the perturbed GRACE mascon for the month of interest, H is a measurement operator which relates the ensemble state ψ j to the vector d j , m is the number of GRACE mascon cells used in the calculation, and j indicates ensemble index. The uncertainties in the observations are described by the random error , which is assumed to have zero mean and covariance matrix R m×m . The subscription denotes the dimension of the matrix. Note that R is a variance matrix here as only the variance components are provided in the mascon product.
In EnKS 3D, multiple model and observation grid cells (e.g., inside 300 km radius corresponding to GRACE spatial resolution) are simultaneously used to compute the state update. Figure 2 (see circle B) demonstrates the model and mascon grid cells used in the analysis step to compute the update of the center mascon cell (see also Fig. 7 of Tangdamrongsub et al. (2017) for more details). The H matrix is defined as follows: where k i is the number of model grid cells inside a mascon i (see, e.g., rectangle A in Fig. 2 The analysis equation is then expressed as follows: where A a K×N represents the updated state vector, A K×N is the monthly averaged update from EnKS 3D, and K K×m is the Kalman gain matrix. The superscript "T" denotes a transpose (matrix) operator. The model and observation error covariance matrix (P e ) K×K , (R e ) m×m are computed as follows: where ϒ contains the measurement error of all ensemble members. After the monthly averaged update A is obtained, the daily increment ( A d ) of the update is computed by dividing A by the total number of days in that month. Note that only A d of the center mascon is saved (see also Tangdamrongsub et al., 2017). The processes described in Eqs.
(2)-(7) are repeated through all mascon cells to obtain all individual A d in the study domain. Then, the model is reinitialized using the previous month's initial states, and the simulation is performed again while adding A d to the model initial states daily (the distributing update step). The DA process is performed until the last month of the study period (December 2012).

Resample approach
The state estimates are validated against the referenced data described in Sect. 2.4. As the spatial resolution of the model estimate and referenced data are different, a spatial resample is performed before the comparison. The model estimate is resampled to the observation grid space using the nearestneighbor gridded interpolation when the model's resolution is coarser. Conversely, the estimate is upscaled (spatial averaging) when the model's resolution is higher. The evaluation is conducted at the observation grid cell.

Correlation and root mean square difference
The agreement between the estimated variable and the in situ data is assessed using the Pearson correlation coefficient (ρ) and the root mean square difference (RMSD). At a particular grid cell, ρ is calculated as follows: Table 3. Perturbation settings associated with the meteorological forcing data and model parameters. The comprehensive description of model parameters can be found in, e.g., Decker (2015) and Ukkola et al. (2016). The spatial correlation error is also applied to forcing data (fourth column where the y vector contains the model estimates, the x vector contains the validation data (observations), E[ ] is the expectation operator, and (y, x), and (σ y , σ x ) are the mean and standard derivations of y and x, respectively. The RMSD is computed as follows: where L denotes the length of the time series.

Long-term trend and seasonal variations
The long-term trend, annual amplitude, and phase of the time series are computed using the least-squares adjustment associated with five parameters, offset (a), long-term trend (b), annual variation (c, d), and semi-annual variation (e, f ). A time series (y) at a particular grid cell can be expressed as follows: y = a + bt + c sin ωt + d cos ωt + e sin 2ωt + f cos 2ωt, (10) where the t vector contains time, and T is an annual period. The annual amplitude (A) and phase (ϕ) are computed as follows:

Spatial resolution
Spatial resolution is defined as the minimum distance at which two signals of equal magnitude can be separated. The spatial resolution can be determined from the empirical (and isotropic) covariance function (C) computed as follows (Tscherning and Rapp, 1974): where (p h , p l ) are vectors containing data points (h, l) associated with the spherical distance φ hl , and n hl is the number of data pairs considered in the calculation. The spatial resolution (or correlation length of the covariance function) is defined in this study as being the distance ψ at which C(φ = 0) Figure 4. The concept used to determine spatial resolution (correlation length). The spatial resolution is defined to be the spherical distance φ at which the 0 km covariance (or normalized covariance), C(φ = 0) decreases by half.
decreases by half. The diagram in Fig. 4 illustrates how this correlation length is determined.

Case studies
This paper uses four case studies to quantify the effect of different model grid sizes and GRACE DA. The case studies are described as follows: It is noteworthy that the 0.5 or 0.05 • represents the CABLE grid size, which may differ from the spatial resolution. The term "spatial resolution", as used in this paper, refers to the determined resolution computed from Sect.  (12) and (13) and are shown in Fig. 5. Both CA-BLE versions have similar spatial features, but more localized and much finer details are seen in the CABLE 0.05 • simulation than in the CABLE 0.5 • simulation. A clear difference in annual amplitude is shown over the Yarra Ranges and the Alpine National Park (compare Fig. 5a vs. Fig. 5b insets). CABLE 0.05 • provides greater details of the TWS spatial distribution, and the annual amplitude is approximately 30 % higher than it is in the coarse-scale version. Differences in spatial details are also seen in the phase estimates (see Fig. 5c vs. Fig. 5d). The spatial resolutions can be quantitatively determined using the empirical covariance function, Eq. (14), described in Sect. 3.2.4. The covariance function is computed for each month's TWS estimates using all grid points in Australia. Figure 6 shows the averaged spatial resolution (correlation length) of CABLE 0.5 • and CABLE 0.05 • for each month between 1981 and 2012. The CABLE 0.5 • simulations have a spatial resolution of ∼ 50 km, consistent with the grid size of the input 0.5 • CABLE parameters and forcing data. Larger correlation lengths are found during the rainy seasons (January-April in the north and August-November in the south) and during the dry season (e.g., June). Soil moisture and aquifer storage increase during the wet seasons, leading to more uniform (and smoother) spatial moisture features. Similar uniformity can also be observed during the dry season. At the beginning of the wet season, scattered rainfall in part of the continent likely causes a gradient between dry/wet areas, resulting in smaller correlation lengths. It is noteworthy that our analysis only explains the overall temporal pattern of continental correlation lengths. The temporal pattern may also be affected by the local TWS wet/dry features or by the spatial distribution of model parameters.
The use of CABLE 0.05 • significantly improves the spatial resolution by about a factor of 2 to 3. Note, again, that the spatial resolution of CABLE 0.05 • presented in Fig. 6 reflects the continental averaged value while the finer (higher) spatial resolution is observed in the individual river basin (not shown).

Assessment of long-term TWS variations
This study's time span allows for an assessment of overall trends and decadal variations in TWS estimates. The longterm trends of water balance states and fluxes from CABLE 0.05 • between 1981 and 2012 are shown in Fig. 7. A strong relationship between components is observed, particularly in the storage components (Fig. 7a-d). The TWS and GWS ( Fig. 7a and d) have very similar spatial patterns, where a wetting trend is observed in the northern region, the Indian Ocean basin, and the western part of the Murray-Darling Basin (see Fig. 1a for the basin's location), and a drying trend is seen in the central part of the continent, the South  West Coast basin, and several parts of the Murray-Darling Basin. The similarity between the TWS and GWS (both spatially and in magnitude) indicates that GWS is a primary driver of the TWS trend in Australia. By contrast, soil moisture stores show different spatial patterns; increases in soil moisture are also found in the central and western parts of the continent (Fig. 7c). The SMS generally accommodates a large portion of the seasonal variation in water storage in Australia (see Sect. 4.2), but its role in the long-term trend is marginal -smaller than that of the GWS by about a factor of 2. ET trend estimates show a similar spatial pattern to SSM trends (Fig. 7e vs. Fig. 7b), which may be explained by ET pulling moisture from SSM stores. The long-term trend of the total runoff is in line with the TWS/GWS, increasing in the north and decreasing in the southeastern region and Tasmania (e.g., Fig. 7f vs. Fig. 7a). In the northern region, the soil moisture or aquifer is likely saturated due to a wet climate, with more significant annual rainfall (than the south) by about a factor of 5 (not shown). Such a condition leads to a greater magnitude of root zone moisture, groundwater recharge, and surface runoff variations. The opposite scenario is observed in the southeastern region, where the depleted TWS/GWS (induced by droughts) likely reduces runoff generation, resulting in a negative runoff trend.
Regional water balance components can also be analyzed at interannual and decadal timescales. Figure 8 shows the trends of water balance components in 3 different decades, i.e., 1981-1990, 1991-2002, and 2003-2012. The long-term trends are not monotonic. In other words, there are no "always dry" or "always wet" regions observed between 1981 and 2012. Reversals between increasing and decreasing trends are apparent in all components. For example, northern and western Australia experience a drying trend between 1981 and 1990 (Fig. 8a) and recover between 1991 and 2002 (Fig. 8b) after continuously receiving increased rainfall (not shown). The region experiences another drought episode (van Dijk et al., 2013a) in the first half of the 2000s,  (Fig. 8c). A similar reversal is also seen in the eastern regions, with wetting in 1981-1990, drying in 1991-2002, and wetting again in 2003-2012 (Fig. 8a-c).
Without such a long record, assessment of water resources may have limited reliability. For instance, based only on the ∼ 10-year period of GRACE observations, it is difficult to determine whether the negative TWS trend in western Australia (e.g., Fig. 8c; see also Fig. 11d) is caused by anthropogenic or natural processes (e.g., Richey et al., 2015). Evaporation or runoff after the extremely wet conditions prior to 2002 may also produce a similar decreasing trend (e.g., van Dijk et al., 2011;Munier et al., 2012). By assessing the historical TWS time series of the North Western Plateau basin obtained from CABLE 0.05 • (Fig. 9), the observed negative trend is more likely governed by a decadal cycle of drought and recovery. This approach demonstrates that there is clear value in utilizing a longer time span of TWS esti-mates, and that this can offer a more reliable assessment of regional water resources and climate variations.

Comparison with the satellite products
TWS, SSM, and ET estimates are compared with satellite data from GRACE, ESA CCI, and GLEAM, respectively (see Sect. 2.2 and 2.3 for each product's description). Note that the remote sensing products may contain biases (caused by, e.g., background model and processing algorithm) and do not necessarily represent the truth. (Ground truth validation is performed in Sect. 4.3.) The intercomparison performed in this section is only to assess the consistency between two independent estimates, i.e., model and satellite. The CABLE estimate is rescaled to the satellite product's grid before comparison, as described in Sect. 3.2.1.  satellite data. On average, both CABLE results are in good agreement with the satellite products, with correlation values greater than 0.55 (see Fig. 10a, b, e, f, i, j). CABLE 0.05 • shows a more robust agreement with satellite-derived variables than CABLE 0.5 • does; correlation values with TWS, SSM, and ET are 14 %, 14 %, and 7 % higher (respectively) in the former simulations than in the latter. Similar improvements are also observed in the RMSD evaluations, where CABLE 0.05 • provides smaller RMSD values than CABLE 0.5 • by 4 % (TWS), 14 % (SSM), and 17 % (ET). CABLE 0.05 • reduces the RMSD of the SSM estimate to as low as ∼ 0.03 m 3 m −3 , e.g., over the South West Coast and Lake Eyre basins, and the average RMSD value in Australia is ∼ 0.06 m 3 m −3 . It is worth noting that the average unbiased RMSD value (Entekhabi et al., 2010b) is 0.037 m 3 m −3 (not shown), in line with the accuracy of the CCI product (see, e.g., Table 1 of Dorigo et al., 2017). The improved agreement between CABLE 0.05 • estimates and the satellite products is also seen in the temporal variation (Fig. 11). Compared with the CABLE 0.5 • results, CABLE 0.05 • increases the dynamic range of the TWS estimates in most basins, leading to closer alignment with GRACE observations (Fig. 11a-c). Similar increases in the agreement are also observed in the CABLE 0.05 • SSM and ET estimates (Fig. 11d-i). The observed improvement may be attributed, in part, to the rederived model parameters. For example, in the North Western Plateau basin, CABLE 0.05 • uses a ∼ 17 % higher area-averaged sand fraction than CA-BLE 0.5 • does, which allows faster infiltration/drainage in the storage compartments, leading to greater dynamic ranges of TWS and SSM variations (see Fig. 11a and d). Consequently, the ET estimate is decreased as a response to increased water storage following the water balance equation (Fig. 11g). Similar mechanisms are also observed in the Lake Eyre and South West Coast basins, where the improved parameterization leads to improved agreement with the satel-lite data. The use of coarse-resolution forcing data (e.g., precipitation) could also explain the small TWS amplitude observed in CABLE 0.5 • . Coarse-scale forcing data averages local precipitation signals over a larger area than the finerresolution forcing data does, resulting in a smaller amplitude.

The impact of GRACE DA
GRACE observations are assimilated into the CABLE 0.5 • and CABLE 0.05 • models (called GRACE DA 0.5 • and GRACE DA 0.05 • , respectively) between January 2003 and December 2012 (due to the availability of meteorological forcing and GRACE data). The basin-averaged TWS estimates from CABLE with and without GRACE DA are shown in Fig. 12, alongside the GRACE observations themselves. In most basins, apparent disagreements between the openloop estimates and the GRACE observations suggest the current CABLE models' limited accuracy. After assimilating GRACE into the models, the estimates (GRACE DA 0.5 • and GRACE DA 0.05 • ) move toward the GRACE observations. The positive impact of GRACE DA on the basinaveraged TWS estimates is similar regardless of the model spatial resolutions. This likely reflects the nature of GRACE observations that provide integrated water storage information at the continental or basin scale (Tapley et al., 2004). However, it should be noted that the GRACE DA application does not degrade the spatial resolution of the model. The offline analysis shows that the average correlation length of GRACE DA 0.5 • and GRACE DA 0.05 • remain the same as that of CABLE 0.5 • and CABLE 0.05 • , respectively (not shown). The EnKS 3D scheme exploits the spatially correlated information from the high-resolution model to disaggregate the coarse-scale observations into a finer grid space, resulting in the preservation of the model's intrinsic resolution. The impact of GRACE DA is also observed in water redistribution within the water storage components. Figure 13 shows the contributions of four different water storage components (SMS, GWS, SWE, and CNP) to TWS in different basins before (CABLE 0.05 • ; Fig. 13a) and after the application of GRACE DA (GRACE DA 0.05 • ; Fig. 13b). The contribution is calculated as a percent of the annual amplitude of TWS fluctuations. In CABLE 0.05 • (Fig. 13a), the SMS is a major contributor to more than 90 % of the TWS variation (i.e., annual amplitude). The GWS contribution is only ∼ 10 %. After applying GRACE DA (Fig. 13b), the contribution of the GWS is significantly increased. It dominates the entire water column in several basins (e.g., Indian Ocean, Lake Eyre, North West Plateau, and South West Plateau). This behavior reflects the nature of GRACE in that the groundwater provides a majority of the seasonal changes to terrestrial water mass. GRACE DA has been shown to significantly affect GWS in previous studies (e.g., Girotto et al., 2016;Tangdamrongsub et al., 2018;Li et al., 2019). The contributions of the SWE and CNP components are negligible across Australia, and the impact of GRACE DA on Figure 10. The comparison between CABLE 0.5 • and CABLE 0.05 • estimates and different remote sensing products in terms of correlation coefficient (a, b, f, i, j) and root mean square difference (RMSD; c, d, g, h, k, l). The terrestrial water storage (TWS; a-d), surface soil moisture (SSM; e-h), and evapotranspiration (ET; i-l) are compared with GRACE, ESA CCI, and GLEAM products, respectively. The averaged statistical values (over Australia) associated with each comparison are also given.  them is trivial. Similar changes to TWS contributions are also observed between CABLE 0.5 • and GRACE DA 0.5 • (not shown). It is noted that Figs. 12 and 13 only present the impact of GRACE DA on storage components and do not assess the accuracy of either version. The accuracy of the OL and DA models is quantified in Sect. 4.3.

Validation with in situ data
In situ data from three different ground networks (see Sect. 2.3.2) are used to validate the GWS, ET, and SSM estimates. Validation is conducted by quantifying the change in correlation values between simulations and observations when one model is used in place of another. The validation period is between January 2003 and December 2012, consistent with the GRACE DA period. Figure 14 shows the validation of GWS estimates between GRACE DA and OL simulations (DA minus OL). A positive value indicates improvement, while a negative value represents degradation. A significance test is performed at the 0.05 level based on the Fisher Z transformation test for correlation coefficients (Zaitchick et al., 2008). The average change in correlation value across all in situ data sites is shown in Fig. 14d. Values that failed the significant test are excluded from the averaging.
We first analyze the impact of GRACE DA on the GWS estimates of CABLE 0.5 • and CABLE 0.05 • (Fig. 14a  and b). A greater correlation improvement is observed when GRACE DA is applied to the 0.5 • model, which increases the average correlation value by 0.23 (Fig. 14d1). When GRACE DA is applied to the 0.05 • model (Fig. 14b), the average correlation value improves by 0.12 (Fig. 14d6), which is ∼ 50 % less than in the 0.5 • model.
When comparing the two OL runs (CABLE 0.5 • vs. CA-BLE 0.05 • ), the GWS estimate from CABLE 0.05 • has a higher correlation value by 0.12 (see Fig. 14d2). This analysis indicates that the lower correlation improvement seen between CABLE 0.05 • and GRACE DA 0.05 • is unlikely caused by the reduced impact of GRACE DA on the highresolution model. Rather, it is a result of CABLE 0.05 • having better accuracy to start with. GRACE DA tends to provide the same information to both 0.5 and 0.05 • models. Still, GRACE DA 0.05 • exhibits the best correlation with in situ data, ∼ 0.02 higher than GRACE DA 0.5 • (Fig. 14c and d5). While GWS estimates from both CABLE 0.5 • and CABLE 0.05 • improve with GRACE DA, we find that GRACE DA 0.5 • (DA) shows a higher correlation value than CABLE 0.05 • (OL) by 0.1 (see Fig. 14d3). This indicates that improving model state estimates via DA is more effective than improving model parameters via increased resolution. Despite different study areas, LSMs, and validation data, our finding is in line with, e.g., Girotto et al. (2017) and Nie et al. (2019), who also found a significant impact of GRACE DA on GWS components.
SSM and ET estimates are validated against ground measurements from SASMAS and Ozenet, respectively. Correlation coefficients between the case study simulations and observation networks are summarized in Fig. 15. For SSM (Fig. 15a), CABLE 0.05 • exhibits slightly improved correlation with observations (by ∼ 0.6 %) than CABLE 0.5 • does. The addition of GRACE DA also shows a small but positive impact on SSM estimates, improving correlation with observations by ∼ 1 %. The small impact is attributed to limited GRACE sensitivity. GRACE is sensitive to the low-frequency variation (originated from deeper stores) and cannot effectively capture SSM, which is dominated by a high-frequency signal (e.g., precipitation). As a result, GRACE DA is found to have a minor (or negative) impact on the top soil component in most GRACE DA studies (e.g., Li et al., 2012;Tian et al., 2017;Tangdamrongsub et al., 2020). The small impact on SSM estimates also agrees with Jung et al. (2019), who observed GRACE DA's small (or negative) impact over dry regions in West Africa.
For ET (Fig. 15b), CABLE 0.05 • exhibits a more improved correlation with observations (by ∼ 5 %) than CA-BLE 0.5 • does. The inclusion of GRACE DA also slightly improves correlation values over the associated OL model versions. Greater improvement (by ∼ 2 %) is seen between CABLE 0.5 • and GRACE DA 0.5 • than between CABLE 0.05 • and GRACE DA 0.05 • . As with SSM, the small improvement of ET is likely attributable in part to small GRACE DA updates (caused by the limited GRACE sensitivity to high-frequency surface fluxes). The SSM is a primary moisture source for ET, so a trivial change in SSM leads to a similarly small change in ET.

Conclusion
This study enhances the spatial resolution and time span (> 30 years) of regional TWS estimates using the CABLE LSM, high-resolution land cover maps and forcing data, and GRACE DA application. By improving the model parameter and forcing data (without GRACE DA), the developed CABLE 0.05 • model shows clear improvements in the accuracy of water balance component estimates (e.g., soil moisture, groundwater, and evapotranspiration) when compared with in situ and independent satellite data. The 0.05 • model also improves the spatial resolution by a factor of 2 to 3 over the 0.5 • version. The extended time span provides insightful information for long-term assessment of regional water resources and climate variability. The enhanced model parameterization is found to play a significant role in the improved TWS estimates. Incorporating GRACE DA into the model leads to further improvements in TWS component estimates. The positive impact of GRACE DA is found in the deep storage component (e.g., GWS), while the impact on the surface components and flux estimates (i.e., SSM and ET) is trivial. Of the four case studies investigated here, the most accurate simulation uses CABLE 0.05 • with GRACE DA. The enhanced CABLE model resolution developed in this study relies on improved parameter and forcing data. The land surface physics remains unchanged. The workflow can be adopted for other CABLE repositories or different LSMs with only slight modifications, e.g., number of soil or vegetation types. This means TWS estimates can be reproduced The relative performance of each simulation is indicated with blue/red shading. Blue indicates that the first simulation is better than the second, and red indicates that the second is better than the first. Changes that are not significant at the 0.05 level are displayed in gray. (d) The average change in correlation value from each scenario is the vertically displayed case study minus the horizontally displayed case study (e.g., d1 represents GRACE DA 0.5 • minus CABLE 0.5 • and indicates that the former is significantly better than the latter). with more spatial detail by CABLE 0.05 • at locations outside the area studied here since high-resolution forcing data and model parameters are available globally (or near globally). However, the performance of such simulations might differ from this study due to the uncertainty in model parameters and forcing data that vary with geolocations (e.g., Herold et al., 2017;Tifafi et al., 2018). This remark also applies to the performance of GRACE DA. Although the improvement of assimilating GRACE into CABLE is also seen in other regions, e.g., northeastern China (Yin et al., 2020), it is still difficult to quantify the benefit of GRACE DA over global river basins based on these early developments of CA-BLE and GRACE DA. Validation is highly encouraged to ascertain the accuracy of TWS estimates when performing the simulation in other regions.
Our development is only demonstrated between 1981-2012 due to the availability of the Princeton forcing data. Future development can consider extending the temporal record of TWS estimates. The time span extension is feasible using reanalysis forcing data from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2; Gelaro et al., 2017). Despite a slightly coarser spatial resolution than the Princeton data, MERRA2 data sets would allow TWS simulations to be extended to the near present.
Code availability. Pseudocodes of GRACE DA can be found in the Supplement.
Author contributions. NT conceptualized the paper, led the formal analysis, developed the software, validated, visualized, and wrote the original paper. MJ supervised the project and acquired the funding. PS assisted with the editing of the paper.
Competing interests. The authors declare that they have no conflict of interest.
Disclaimer. Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.