GRACE storage-runoff hystereses reveal the dynamics of regional watersheds

. We characterize how regional watersheds function as simple, dynamic systems through a series of hysteresis loops using measurements from NASA’s Gravity Recovery and Climate Experiment (GRACE) satellites. These loops illustrate the temporal relationship between runoff and terrestrial water storage in three regional-scale watersheds (> 150 000 km 2 ) of the Columbia River Basin, USA and Canada. The shape and size of the hysteresis loops are controlled by the climate, topography, and geology of the water-shed. The direction of the hystereses for the GRACE signals moves in opposite directions from the isolated groundwater hystereses. The subsurface water (soil moisture and ground-water) hystereses more closely resemble the storage-runoff relationship of a soil matrix. While the physical processes underlying these hystereses are inherently complex, the vertical integration of terrestrial water in the GRACE signal en-capsulates the processes that govern the non-linear function of regional-scale watersheds. We use this process-based understanding to test how GRACE data can be applied prognostically to predict seasonal runoff (mean Nash-Sutcliffe Efﬁ-ciency of 0.91) and monthly runoff during the low ﬂow/high demand month of August (mean Nash-Sutcliffe Efﬁciency of 0.77) in all three watersheds. The global nature of GRACE data allows this same methodology to be applied in other regional-scale studies, and could be particularly useful in regions with minimal data and in trans-boundary watersheds


Introduction
At the most fundamental level, watershed processes can be described as the collection, storage, and release of water (Black, 1996;McDonnell et al., 2007).The runoff from these processes is governed by threshold mediated relationships across scales that result in storage-runoff hystereses (Spence, 2010).These threshold relationships between storage and runoff (S − R) are not uniform across a watershed, functioning as a series of discontinuous processes in soils and hillslopes that provide an integrated S − R relationship at the watershed scale (Spence, 2010).Kirchner (2009) described the S − R relationship to be non-linear and stated that watersheds typically function as dynamic systems governed by their unique climate and geology.These conceptual models of hydrologic behaviors help provide a process-based understanding of watersheds as dynamic environmental systems (Aspinall, 2010), and identify connections that advance hydrologic science and hydrologic prediction (Wagener et al., 2007).
Published by Copernicus Publications on behalf of the European Geosciences Union.
At the local scale, in situ instrumentation can quantify the non-linear relationship between streamflow and water stored in a watershed as snow, soil moisture, groundwater and reservoirs (Appleby, 1970;Brutsaert, 2008;Kirchner, 2009;Sayama et al., 2011).These four primary storage components, along with climate, topography, and geology govern the fluxes of water through a catchment, and play an important role in the hysteretic nature of storage and runoff dynamics (McGlynn and McDonnell, 2003;Mc-Namara et al., 2011).Knowledge of these processes is fundamental to developing an understanding of a watershed's hydrologic behavior.However, observations over larger regions can be technically challenging and costly, and in situ measurements from small basins do not necessarily represent the complexity inherent to watersheds at more broad scales (Spence, 2010).This scaling problem limits our understanding to predict regional hydrologic processes, which is often the practical scale of watershed management (Blöschl, 2001;Western et al., 2002;Skøien et al., 2003;Peel and Blöschl, 2011;Thompson et al., 2011).
In the absence of broad-scale observations, past hydrological studies have typically relied on in situ measurements as a proxy for regional scale hydrological processes.For example, in higher latitude or mountainous regions measurements of snow water storage have provided a simple metric that has been used in water resource planning for decades (Cayan, 1996;United States Army Corps of Engineers, 2001), and are often correlated to streamflow gauged downstream (Dozier, 2011).While informative, this approach can often provide hydrological forecasts that are misleading, because pointbased measurements do not fully represent the broad-scale variability of rugged mountain terrain (Dozier, 2011;Nolin, 2012;Webster et al., 2014;Ayala et al., 2014).Similarly, measurements of soil moisture in the upper 2000 mm of the soil rely on point-based data that are often distributed at the regional scale, but do not effectively represent the true variability of soil moisture across broad geographic areas (Western et al., 2002;Brocca et al., 2010).A complete understanding of groundwater stores and fluxes (deeper than 2000 mm) at regional scales also remains elusive, despite its increasing importance in water resources management (Wagener et al., 2007;Gleeson et al., 2012;Famiglietti and Rodell, 2013;Barthel, 2014).In addition to contributing to runoff, groundwater serves as an important water resource for consumptive use (Gleeson et al., 2012).
While local-scale methods have been applied with moderate success in the past, current trends in climate and in consumptive water demand suggest that long-term changes in hydrological fluxes will have a major impact at broad scales (Milly et al., 2008).As a result, the supply and demand of water is also expected to shift, especially at the regional scale (Wagener et al., 2010;Gleick, 2014a).
Hydrologic models can help address the questions of scale and bridge the gap between local observations and regionalscale processes by estimating the primary components of wa-ter storage (snow, soil moisture, reservoir, and groundwater) across a larger spatial grid.Regional-scale modeling approaches are integrated into water resource management operations for navigation, human consumptive use, irrigation, and hydropower (Payne et al., 2004;Rodell et al., 2004).Models can also be applied diagnostically to test scientific hypotheses and provide a better understanding of the physical processes that govern real world systems, such as the connections between snowmelt, streamflow, and groundwater (Beven, 2007(Beven, , 2010;;Moradkhani and Sorooshian, 2008;Kirchner, 2009;Clark et al., 2011;Capell et al., 2012).Despite their utility, developing and validating a model can be both time consuming and reliant on multiple data inputs, which even in the most well-instrumented basins provides sparse geographic coverage (Bales et al., 2006;Zang et al., 2012).The lack of an integrated measurement of water storage and streamflow has limited regional-scale hydrologic insights to model-based studies (Koster et al., 2010;Mahanama et al., 2011).
Since 2002, broad-scale measurements of changes in the amount of water stored across and through the earth have been available from NASA's Gravity Recovery and Climate Experiment (GRACE) satellites (Tapley et al., 2004).GRACE measures monthly changes in the Earth's gravitational field that are proportional to regional changes in total water storage (Wahr et al., 2006).GRACE satellites provide a monthly record of terrestrial water storage anomalies (TWSA), which represent the changes in the vertical sum of water at the Earth's surface stored in snow, surface, soil and groundwater.Water losses to runoff and evapotranspiration are implicit in the GRACE storage signal, which greatly simplifies calculations of changes in terrestrial water storage.
GRACE data, coupled with modeled and measured variations of water stored in snow, surface reservoirs and soils, have successfully been decomposed to quantify regional groundwater changes (Rodell et al., 2009;Famiglietti et al., 2011;Voss et al., 2013;Castle et al., 2014) and have contributed to improving water balance calculations (Zaitchik et al., 2008;Li et al., 2012).More recent efforts have quantified the relationship between regional water storage and specific streamflow events (Reager and Famiglietti, 2009;Reager et al., 2014).Riegger and Tourian (2014) coupled GRACE data using data-driven and model-based approaches to better understand the relationship between storage and runoff across climatic zones globally.Their study found that coupled liquid storage is linear to runoff, and that in climatic regions with snow and ice the relationship between storage and runoff is more hysteretic.These novel analyses, which are more diagnostic in nature, have provided new insights into regional watershed hydrology using GRACE measurements as a core data input.These studies have not explored how topography and geology can also help describe the S − R relationship of regional watersheds.Nor did these studies examine the ability of GRACE measurements to predict seasonal runoff.
In this paper, we use terrestrial water storage data from GRACE to better understand the hydrology of regional watersheds and the relationship between storage and runoff.The temporal relationships between coincident TWSA and discharge observations at three scales in the Columbia River Basin (CRB) of western North America are investigated using climate, topography, and geology as a framing principle to describe the shape of the storage-streamflow hysteresis.We associate regional and temporal differences in the hystereses with varying watershed dynamics.Finally, we compare the prognostic abilities of GRACE observations with individual modeled estimates of snow and soil moisture to predict seasonal streamflow at regional scales.

Study area
Our study area is the Columbia River Basin (CRB; 41-53 • N and 110-122 • W; Fig. 1).This basin has dry summers and wet winters.Up to 70 % of annual precipitation falls between November and March, 50-60 % of which occurs as snow (Serreze et al., 1999;Nolin et al., 2012).The spring months (April to June) are also wet, but warmer.Precipitation during the spring combines with snowmelt to swell rivers and potentially exacerbate flooding.Snowmelt also serves as a critical component of the hydrologic cycle, recharging aquifers and filling streams later in the year.These contributions bridge the temporal disconnect between wet winters and dry summers when demand is at its peak as farmers, fish, hydropower and municipal users vie for over-allocated water resources (United States Army Corps of Engineers, 2001;Oregon Water Supply and Conservation Initiative, 2008).However, concerns with winter surplus and summer scarcity are not uniform across the CRB, since climate and geology vary greatly (Nolin, 2012).Two of the study watersheds, the Upper Columbia (155 000 km 2 ) and the Snake River basin (182 000 km 2 ), represent distinctly different climatic, topographic, and geologic provinces of the CRB (described and illustrated in Fig. 1).The Upper Columbia is wet and is characterized by steep topography of fractured rock and poor groundwater storage.In contrast, the arid Snake River basin is bowl-shaped with mountains on three sides.The interior of the Snake River basin is a broad plain with welldeveloped soils underlain by a highly transmissive aquifer (Whitehead, 1992, Fig. 1).The Columbia River at The Dalles (614 000 km 2 ) encompasses the Upper Columbia and the Snake River sub-basins, and its climate and geology are an integration of the two (Fig. 1).A distinct climatic feature of the Columbia River at The Dalles is the western slope of the Cascade Mountains, where over 3000 mm of mean annual precipitation at higher elevations sustains a considerable seasonal snowpack.The scale of this study was constrained to watersheds larger than 150 000 km 2 , the optimal minimum geographic limit of GRACE data (Yeh et al., 2006;Landerer and Swenson, 2012).

Methods and data
We used 108 months of GRACE and streamflow data over nine water years (WY; October-September;2004-2012).These data comprises positive, neutral, and negative phases of the El Niño-Southern Oscillation and negative and positive phases of the Pacific Decadal Oscillation (Feng et al., 2014;Iizumi et al., 2014).As a result, the data provide years of above-and below-average precipitation, snowpack, and streamflow for the region.The three watersheds were delineated upstream from United States Geological Survey (USGS) stream gages at 1 • resolution, which is the resolution of GRACE data.In the CRB, these grid cells represent a dimension of approximately 80 by 120 km.The Upper Columbia consists of the area upstream of the Columbia River at the International Boundary gage (USGS 12399500), just downstream of the confluence of the Columbia and Pend-Oreille Rivers.The Pend-Oreille is a major watershed in the upper portions of the CRB.The Snake River gage at Weiser (USGS 13269000) provides gauged streamflow data above Hell's Canyon Reservoir, the largest impoundment in the Snake River basin.The USGS gage at The Dalles (USGS 14105700) provides the most downstream streamflow data for the CRB.Monthly mean runoff (R; mm) was calculated for each of the three gages using the USGS streamflow data.
Measurements of TWSA were obtained from the GRACE RL-05 (Swenson and Wahr, 2006;Landerer and Swenson, 2012) data set from NASA's Tellus website (http://grace.jpl.nasa.gov).The errors present in the gridded GRACE data exist primarily as a result of truncation (i.e., a low number of harmonics) in the spherical harmonic solution, and smoothing and systematic noise removal (called "de-striping") that is applied after GRACE level-2 processing to remove spatially correlated noise (called "stripes"; Swenson and Wahr, 2006).This smoothing tends to smear adjacent signals together (within the radius of the filtering function), resulting in smaller signals being lost, and larger signals having a coarser footprint and a loss of spatial information.
To restore the GRACE signal lost during processing, the data were scaled using 1 • Land-Grid Scale Factors produced by putting a 1 • land surface model through identical processing (truncation and filtering) as the GRACE solutions, then measuring the decrease in the signal amplitude at each 1 • grid.These procedures are described on the Tellus website and detailed in Landerer and Swenson (2012).Monthly 1 • GRACE estimates of TWSA, and the associated 1 • leakage and measurement errors, were spatially averaged over each of the three study watersheds following the procedures described in the Tellus website.
GRACE represents monthly storage anomalies relative to an arbitrary record-length mean value, analogous to the amount of water above or below the long-term mean storage of a bucket, and should balance with the equation: where all components are at monthly time steps; GW represents groundwater, SM represents soil moisture (from 0-2000 mm depth), SWE represents snow water equivalent (the equivalent depth of water held in snowpack), and RES represents reservoir storage.The used here represents the anomaly from the study-period mean, rather than a monthly change.To isolate monthly groundwater storage anomalies ( GW = GWSA) in the above equation, SM, SWE and RES estimates were subtracted from the monthly TWSA data using methods described in Famiglietti et al. (2011).Similarly, the combined signal of water storage anomalies of subsurface moisture (TWSA sub ), SM and GW, was isolated by subtracting SWE and RES from TWSA values.
Monthly SM values over the study basins were obtained from the mean of the North American and Global Land Data Assimilation Systems (NLDAS at 1/8 • resolution (Cosgrove et al., 2003) and GLDAS at 1/4 • resolution (Rodell et al., 2004), respectively), and were spatially averaged over the three study watersheds.Monthly 1 km resolution SWE values were obtained from the mean of NLDAS and Snow Data Assimilation System (SNODAS; National Operational Hydrologic Remote Sensing Center, 2004) and were spatially averaged over the three watersheds.SNODAS data were used in place of the GLDAS data product, which considerably underestimated SWE in mountainous areas when compared to point-based measurements.Changes in monthly reservoir storage were calculated for the five largest reservoirs in the CRB (see Table A1).Other smaller reservoirs in the CRB were excluded when it was determined that fluctuations in their levels were below the detection limits of GRACE.
Like all measurements, estimates of TWSA from GRACE contain error.For all of the study basins, the range of error is well below the TWSA signal strength, approximately an order of magnitude below the annual amplitude (200-300 mm) of the TWSA signal in the CRB.The basin-averaged TWSA errors (time invariant) for the three study basins are 37 mm (Upper Columbia), 22 mm (Snake), and 25 mm (The Dalles).
The model data from LDAS and SNODAS simulations are driven by in situ measurements, and represents the best available data for broad scales.We address any structural error from an individual model by using an ensemble of outputs.Calculation of the error in individual terms followed standard methodologies (Famiglietti et al., 2011), where error in SM is the mean monthly standard deviation, and standard errors for SWE and RES are 15 % of mean absolute changes.GWSA and TWSA sub anomaly errors are calculated as the sum of basin-averaged errors (added as variance) in the individual terms in each respective calculation (Eq.1), including the error in TWSA (Swenson et al., 2006).The basinaveraged error variance for GWSA (time invariant) in the three study basins are 45 mm (Upper Columbia), 26 (Snake), and 33 mm (The Dalles).For TWSA sub these values are 37 (Upper Columbia), 22 (Snake), and 25 mm (The Dalles).The individual error components (SM, SWE, RES respectively) for each basin are Upper Columbia (24, 6, 0.01 mm), Snake (14, 3, 0.01 mm), and The Dalles (21, 4, 0.01mm).Note that these error estimates are distributed across an entire regional watershed and do not represent the error at individual monitoring sites.A time series of these values and basin-averaged errors is provided in Fig. 2.
Based on an approach similar to Reager et al. (2014) and Riegger and Tourian (2014), we plotted the temporal relationship between TWSA and R to examine hysteresis relationships in all three of the study watersheds for each individual water year and for the monthly mean across all water years.Expanding from the integrated terrestrial component of water storage, we also plotted the relationships of TWSA sub and GWSA with R. We examined the branches of these hysteresis plots to better understand how the size, shape, and direction of the hystereses varied across years in each of the three regional watersheds.
In order to verify groundwater hysteresis, we compared the GRACE-derived GWSA to groundwater depths from well measurements at 33 sites throughout the study region (Fig. 1 and Table A2).These data were normalized by their standard deviation, and the mean of the 33 wells was calculated.The standard deviation of the GRACE-derived GWSA for The Dalles was normalized to provide a direct comparison of GWSA and in situ measurements.
We further hypothesized that because peak SWE accumulation occurs between February and April, that TWSA for these months could be used to predict R for an individual month and the cumulative seasonal runoff (R season ) that occurs after peak SWE accumulation.To test this prognostic hypothesis we used a two-parameter power function (The MathWorks, 2013): where R predicted is runoff for the predicted time interval; TWSA month represents terrestrial water storage for an individual month, and a, b, and c are fitted parameters from the power function.
We tested this relationship for TWSA in February, March and April to predict R season (April-September) and for the individual months of July (R July ), August (R Aug ), and September (R Sep ); these represent the lower-flow months when demand is near its peak.Additionally, we tested and compared the modeled-values of SWE from NLDAS and SNODAS and SM from NLDAS and GLDAS, and the model-derived values of TWSA sub to predict R season and for the individual months using the same power-function analysis.
Because our data set was constrained to nine water years, we used a double-pass approach to fit and test the empirical relationship between S −R.This approach allowed us double our data inputs for calculating standard hydrologic evaluation metrics such as Root Mean Square Error (RMSE), standard Nash-Sutcliffe Efficiency (NSE) and Coefficient of Determination (R 2 ; Legates and McCabe, 1999;Nash and Sutcliffe, 1970).The 9 years were divided into two sets (Set 1, even years 2004-2012; Set 2, odd years 2005-2011).The first pass calculated the power function of S − R to Set 1, and the parameters were then tested against Set 2. The roles of the data sets were then reversed and the data sets were again tested against each other.The empirical model results from Sets 1 and 2 were then combined into a single set of solutions for the model fit and tested against measured values to calculate RMSE, NSE, and R 2 .In order to maximize the limited data inputs, once we tested the two independent sets for model performance, we combined the input data values into a single set for all 9 years to calculate a single power function curve.The observed data were tested against the simulated data from the complete, but limited data record.The final model curve was fit to these data, and the evaluation metrics include all of the years for each respective data set.

Storage-runoff hysteresis
The filling and emptying of the study basins at the regionalscale over the course of an individual WY results in a hysteretic relationship between storage and runoff (Fig. 3a).The hysteresis loops begin at the onset of the wet season in October, with TWSA increasing (Figs.3a, 4a-c) as precipitation is stored as snow and soil moisture.An increase in storage that is not offset by an increase in discharge indicates a predominance of snow inputs and the freezing of soil water.The lower branch of the hysteresis plot (storage increase unmatched by runoff) can be used to estimate cumulative snow water equivalent and soil moisture in the basin.This is the water that later contributes to streamflow and groundwater recharge in the spring.
The hysteresis shifts direction from February-April (inflection 1, Fig. 3a), and represents each watershed's transition from storage to release.This response is evident (Figs.3a, 4a-c, and 5), as each hysteresis loop contains a vertical branch of the curve during which storage is relatively constant, but streamflow increases rapidly.This timing also represents the groundwater recharge branch of the loop.As snow melts and the ground thaws, runoff is generated, recharge into soils occurs, and basins tend to be at peak storage during this branch.Storage losses and additional precipitation inputs during this period are re-organized internally.A second shift (inflection 2, Fig. 3a) occurs from April-June when peak TWSA begins to decrease, representing spring snowmelt and a switch from precipitation that falls primarily as snow to rain; these combine to contribute to peak R.
Once peak R values are reached, the loop shifts direction a third time (inflection 3, Fig. 3a), receding on both axes as contributions from snowmelt diminish while presumably groundwater sustains streams and provides a source for irrigated agriculture.During this period, the relationship between TWSA and discharge is more linear, corresponding to baseflow-driven runoff processes in which each monthly change in storage causes a proportional change in the generation of streamflow.
The hysteresis plots of TWSA−R for an individual water year demonstrate that the timing and quantity of precipitation governs the size of a hysteresis loop for an individual WY (Figs. 3a,5).For instance, wet years (e.g., 2008) have bigger loops, while dry years (e.g., 2005) are more compressed along both axes.However, the general shape of the loops is distinct for each basin.Plotting multiple WYs provides a family of curves for each basin that helps describe how climate, topography, and geology govern the timing and magnitude of the relationship between TWSA and R (Figs. 3a, 5).

Subsurface water (TWSA sub ) -runoff hysteresis
The TWSA sub hysteresis curve contracts horizontally when the snow signal is removed from TWSA values for both the Upper Columbia and The Dalles (Figs. 3b, 4d-f), which collapses the loops and takes a form similar to a plot-scale hysteresis of soil.In the initial stages of the WY, the direction of the curve changes directions 2-3 times along the TWSA axis.Similar hysteresis fluctuations have also been documented at the more local scale as the soil profile moves towards saturation (Penna et al., 2011).Peak TWSA sub occurs in June, which corresponds to the spring melt of mountain snowpack and the end of the wet season (Figs.4d-f).In contrast to the near linear relationship between TWSA sub and R in the Upper Columbia and The Dalles, the Snake River retains a more complex relationship.In this watershed the hysteresis curve still retains a loop, but the timing of maximum TWSA sub is also earlier, reaching its peak during March and April (Fig. 4e).It is also noteworthy that in the Snake River the TWSA sub − R hysteresis loop temporally progresses in the opposite direction, but stays in phase with precipitation inputs.

Groundwater-runoff hysteresis
The hysteresis loops describing the temporal relationship between GWSA and R are equally informative, with one distinct difference -they temporally progress in opposite directions of the hysteresis loops of TWSA and R (Fig. 3).For all three watersheds, GWSA decreases from October-February/March (Fig. 4h-j), which is out of phase with the onset of the wet season.GWSA does not shift towards positive gains until early spring and the initial stages of melt before reaching its maximum in June.From June-September, GWSA decreases minimally across all years during the runoff recession limb, indicating groundwater contributions to streamflow.This decreasing GWSA signal does not stand out in Fig. 2, as during WY 2011 the GSWA increased from 12.8 in June to 71.2 mm in September due a large snowpack that melted several months later than normal.This considerable anomaly muted the overall GWSA recession from June-September that is found in all other water years (supplementary data and the interactive visualizations described subsequently).
The  year, and the mean of the normalized standard deviations of well levels was close to zero for all months.The temporal variability for the well data provides no discernable correlation with the derived GWSA signal (Fig. A1).In contrast to the rapid response of the Upper Columbia, the Snake River receives ∼ 60 % less annual precipitation, but has an annual TWSA range that is only 22 % less (median annual range = 192 mm; R =7 mm; Figs. 4, 5, and 6).However, the TWSA hysteresis loops for the Snake River are collapsed vertically (Fig. 4b).In the more arid Snake River, removing the snow signal does not collapse the TWSA sub −R hysteresis loops (TWSA sub = 89 mm).Similarly, the GWSA loops suggest that subsurface moisture plays a more prominent role in the Snake River.
The climate, topography, and geology of the Columbia River at The Dalles are an integration of the Upper Columbia and Snake River, seen in the shape of the hysteresis loops (Figs. 4,5,6; median annual range TWSA = 195 mm; R = 27 mm).The period from February-June more closely resembles the Snake River basin, with gradual increases in TWSA and sharp increases in R. The slope of the recession from June-September has the same general shape for The Dalles as the Upper Columbia (Figs. 4a, c), presumably from snowmelt-generated runoff.

Streamflow forecasting
We next present how TWSA was applied prognostically to predict streamflow.Using the double-pass calibration and validation approach, TWSA Mar provided the best overall predictive capabilities for R season with a mean NSE (NSE) and mean R 2 (R 2 ) of 0.75 and 0.91, respectfully (Fig. 7a, Table 1), for all three basins.The Dalles had the highest NSE and R 2 , and lowest RMSE values (0.98, 0.98, 6 mm; Table 1).The results in the Upper Columbia were also robust (0.82, 0.86, 33 mm; Table 1), while the Snake River performed with less skill (0.46, 0.59, and 14 mm, Table 1).Applying TWSA April also provided similar results, but with a lower degree of skill in predicting R (NSE = 0.57, R 2 = 0.69).TWSA Apr provided improved predicted capabilities in the Upper Columbia (0.87, 0.88, and 28 mm, Table 1), but inferior results in the other two watersheds.TWSA Feb had a low degree of skill in predicting R in all three watersheds (Table A3).
TWSA Mar and TWSA April also served as a good predictor of monthly runoff in July and August for the Upper Columbia and to a lesser degree in The Dalles (Tables 1 and A3).In the Snake River, TWSA did not serve as a good predictor for R in an individual month.Snowpack and soil moisture play a considerable role in the hydrology of the CRB and are commonly used to help predict water demand and availability later in the year (Koster et al., 2010).We compared the capabilities of the modeled snow (SWE) and soil moisture (SM) products to predict R to the  1).In the Upper Columbia and The Dalles, TWSA Mar predicts seasonal and monthly runoff (July and August) with considerably more skill than SWE or SM (Fig. 7, Table 1).In the Snake River, SM Mar has a higher degree of skill than TWSA Mar in predicting R season and R Aug .SWE Mar provided inferior results in all three watersheds, but with some predictive skill in the Upper Columbia and The Dalles (NSE of 0.24 and 0.46 respectively, Table 1).In all three watersheds, TWSA sub provided extremely poor predictions (Tables 1 and A3).
When the results of the empirical model using two independent sets of data proved robust for some of the storage metrics, the observed data were tested against the simulated data from the complete, but limited data record.The performance of the empirical model improved using the complete data set (Tables 2 and A4), with the same general results.TWSA Mar provided the best model fit for seasonal runoff in the Upper Columbia (NSE = 0.93, RMSE = 19.8mm) and The Dalles (NSE = 0.98, RMSE = 5.7 mm).In the Snake River, predictive capabilities improved more dramatically (NSE = 0.83, RMSE = 7.4 mm), but soil moisture still served as a better predictor of seasonal streamflow (NSE = 0.93, RMSE = 5.2 mm).Similarly, TWSA Mar provided the best model fit for runoff in August, one of the drier months when demand is at its peak (Tables 2 and A4).

Storage-runoff hysteresis
Decades of data collection and monitoring at individual gage sites indicate that watersheds collect, store and release water.Using one integrated measurement from the GRACE satellites, our results show these same processes at the regional scale in the hysteresis loops of storage (TWSA) and runoff (R).While hysteretic processes have previously been identified in local-scale measurements (McDonnell, 2003;McGlynn and McDonnell, 2003), only recently has streamflowstorage hysteresis been identified at the regional scale (Riegger and Tourian, 2014).
Our work builds on Riegger and Tourian's (2014) results, and employs GRACE data to describe how regional watersheds function as integrated, non-linear systems governed by climate, topography, and geology.Climate controls the size of the hysteresis loops by providing a first-order control on hydrologic inputs and the storage of solid water, which in turn governs the ranges of TWSA and R.However, runoff response to precipitation and snowmelt does not act independently from topography and geology (Jefferson et al., 2008;Tague et al., 2008), which controls how liquid water is stored and routed through a watershed, even at the regional scale.The climatic, topographic, and geological characteristics of each watershed provide an explanation of the S − R relationship that govern the shape and size of its respective hystere- sis curve.GRACE offers a single, integrated measurement of changes in water storage through and across a watershed that can be applied to predict regional streamflow using an empirical model.Where these predictive capabilities succeed and fail help better describe the climatic, topographic, and geological characteristics in each watershed.For example, in the Upper Columbia, steep topography and wet climate fills subsurface storage quickly before reaching a threshold in April or May.This transition represents the watershed's transition from winter storage to spring runoff.After this watershed-scale threshold is reached, the steep topography moves snowmelt and rain quickly through the terrestrial system and into the river channel until cresting in June (Figs. 4,5,and 6), followed by declines in TWSA and R from June-September.These large fluxes of water create a more open hysteresis loop, expanding non-linearly on both the horizontal and vertical axes.
The Upper Columbia also has the broadest range of annual TWSA sub and GWSA during the study period (Figs. 5  and 6), despite having limited aquifer capacity.Conceptually, this demonstrates that the upper limit of storage is greater than in the Snake River or The Dalles, but that it also loses the most water.Its minimums at the end of the WY are also the lowest (median TWSA Sep = −98 mm; Figs. 5 and 6).This range across TWSA, TWSA sub , and GWSA supports the conceptual model that the watershed fills during the wet season, and is then drained more quickly due to steep topography and limited water storage.The predictive capability of TWSA also strongly suggests that the components and temporal relationships of storage across this watershed are interconnected, and that incorporating April snowpack improves the model results.
In contrast, the arid Snake River basin provides a very different family of hysteresis curves (Figs. 4, 5) that identify groundwater and soil moisture as primary components of watershed function.The curves are compressed vertically (R) as compared to the Upper Columbia, and are more constrained horizontally (Fig. 6).The onset of spring melt runoff in February does not deplete TWSA for the Snake River.Instead, TWSA continues to increase until May, when peak runoff occurs.As TWSA decreases to the end of the water year in September, the median TWSA Sep measurement (−78 mm) is 20 mm greater than in the Upper Columbia.This indicates that the lower drainage threshold of the Snake River watershed is relatively greater than the Upper Columbia, potentially explained by a less severe topography and higher aquifer capacity.
The TWSA sub hysteresis curves in the Snake River retain a similar shape to the TWSA signal.While they reverse direction they do stay temporally connected to the onset of the wet season in October, indicating that subsurface moisture is a central control on the filling of the watershed through May.The capabilities of SM to empirically predict R better than TWSA further highlight the importance of subsurface water in this watershed.The intra-annual range of GWSA in the Snake River is also more limited than in the more hydrologically responsive Upper Columbia.This more limited range of data supports the conceptual model of a watershed that retains comparatively more winter precipitation in soils and aquifers throughout the spring season, and that sustains flow later in the year and until the onset of melt the following winter.
The greater Columbia River Basin upstream from The Dalles integrates the climatic, topographic, and geologic characteristics of the Snake River and Upper Columbia as well as other areas within the CRB.The western slope of the Cascades (Fig. 1), which is outside of the Upper Columbia, accumulates up to several meters of SWE each winter.Due east of the Cascades, a broad basalt plain that provides aquifer storage helps dampen the snowmelt pulse in the spring.The hysteresis loops for The Dalles reflect these combined characteristics.
Storage at The Dalles increases along the horizontal axis (TWSA) until peak storage is reached in March or April (Figs. 3,4,and 5).This TWSA threshold of approximately 100 mm responds with an increase in R that continues through June.In July, the hysteresis begins to recede along both axes closing out the loop.GWSA has the most limited range, potentially explained by the extensive basalt aquifer moderating the relationship between storage and runoff.In The Dalles, TWSA Sep has a median value of −88 mm (Fig. 6), between the lower drainage thresholds of the Upper Columbia and Snake River watersheds; indicating an integration of the contributing climate, topography, and geology.

Distinguishing the difference between TWSA sub and GWSA
Conceptually TWSA sub represents changes in the amount of water stored as soil moisture and groundwater, where as GWSA represents water changes greater than 2000 mm below the soil surface.The goals of evaluating these metrics were to see if monthly changes in soil moisture were linked to changes in groundwater storage, and the role of snowpack in the S − R relationship.The TWSA sub hysteresis curves in the Upper Columbia and The Dalles collapse into a more linear relationship that is more commonly associated with the S − R relationship of a soil matrix (Fig. 3 and 4).These stand in contrast to the Snake River where the TWSA sub −R hystereses retain a loop, indicating a more complex relationship between storage and runoff.The hydrological processes that create these differences warrant investigation, but lie outside the scope of this study.
Although the annual fluctuations of SM are similar in all three basins (Fig. 2), its impact in the Upper Columbia is more pronounced.This watershed has poor groundwater storage, and relies on soils to provide seasonal storage of rain and snowmelt.In the Upper Columbia once the SM signal is removed, the intra-annual range of GWSA shifts to considerably lower values (Fig. 6).Shifts of similar magnitudes in GWSA are not found in the Snake River and The Dalles.These watersheds have excellent groundwater storage and are less reliant on soils to provide seasonal storage of rain and snowmelt.Fluctuations in reservoir levels are minimal with regards to the water fluxes in this region, and have minimal impact on calculations of GWSA.
The GWSA−R hystereses are represented by loops that show an out-of-phase relationship between precipitation and groundwater recharge from the start of the wet season in October until February or March.The TWSA sub and GWSA hysteresis plots demonstrate that in the Upper Columbia and The Dalles changes in monthly soil moisture are not always temporally aligned with GWSA.This can be explained by the physical reality that soil moisture and groundwater are not always interconnected, and that there is not a fixed depth (i.e., 2000 mm) that separates the two components of water storage.
GRACE-derived calculations of GWSA also provide insights into the hydrological processes governing groundwater recharge and depletion, as evidenced in the GWSA hysteresis loops.The GWSA−R curves show an out-of-phase relationship between precipitation and groundwater recharge from the start of the wet season in October until February or March.
This response in all of the GWSA−R hystereses suggests that even at the watershed scale groundwater recharge requires soils and geologic materials to fill to a certain moisture threshold and for the onset of snowmelt (Figs. 3a,5, and web-based interactive visualizations https://public.tableau.com/profile/sprolese#!/vizhome/ GRACE_hystereses/WSADash). This also suggests that snowmelt inputs to groundwater are considerable.In the CRB this is critical as current climate trends are projected to reduce snowpack accumulation and exacerbate melt in the region (Wu et al., 2012;Rupp et al., 2013;Sproles et al., 2013).
Additionally, our analysis identifies summer as the time of peak groundwater storage in all three regional watersheds.This finding is of value for groundwater management and policy decisions, as peak groundwater levels in June correspond to the timing of groundwater pump tests that are used to develop groundwater withdrawal regulations (Jarvis, 2011(Jarvis, , 2014)).Our data suggest that groundwater pump tests should not be limited to an individual month, and should also include periods of reduced storage particularly during the winter months.The inclusion of multiple pump tests throughout the year could be particularly relevant as the population and water demand is projected to increase in the region.
The point-specific well data are not conclusive and show considerable variability with no consistent pattern regarding the timing of recharge and peak groundwater levels.This is presumably a function of how site characteristics (i.e., usage, depth, location, elevation) are extremely variable across a region.Rather than excluding these results or selecting individual wells that match GRACE data, we discuss the results from all 33 wells to help demonstrate the high variability that exists from well to well, and that measurements of groundwater changes at a fixed location do not represent watershedscale characteristics (Jarvis, 2011(Jarvis, , 2014)).The disconnect between sites also highlights the concept brought forward by Spence (2010), that storage is not uniform across a watershed, and functions as a series of discontinuous processes at the watershed scale.

Applying the S − R relationship as a predictive tool
We applied these climatic, topographic, and geologic insights to develop and test the hypothesis that spring TWSA could predict R later in the year, based on two observations: first, the shapes of the hysteresis curves for each basin are similar (Figs.4a-c, 5), but vary by magnitude of annual TWSA.Second, peak TWSA occurs before the peak runoff.We show that the integrated GRACE signal is a good baseline measurement to empirically predict seasonal streamflow across a range of water years with regards to precipitation and streamflow.In essence, our data suggest that the water stored across and through the Columbia River Basin in March describes the water available for the remainder of the water year.
In the CRB and in the northwestern United States, peak snowpack occurs in March or April, and is commonly used as a metric for predicting spring runoff.Despite the importance of snowpack to the hydrologic cycle of the region, measurements of TWSA Mar from GRACE provide a better prediction of R season , R July , and R Aug than model-derived estimates of snowpack.GRACE TWSA Mar also provided a better prediction for runoff than soil moisture, except for the Snake River watershed.There SM Mar provided a better indicator of runoff for the rest of the year.TWSA Feb provided inferior predictive capacity, as the annual maximum TWSA values have not been reached.These results are promising with regards to using GRACE as a predictive tool for water resources in both wet and dry years.Our limited data record represents a wide-range of conditions with regards to climate and streamflow, which is captured in our empirical models and is shown in the box plots to the right of Figs.7a-b.These same results also indicate that R is insensitive to TWSA Mar values below 100 mm in the Upper Columbia and at The Dalles and below 85 mm in the Snake River watershed.This limitation is not the empirical model, which works well, but that the basins have a limited response (i.e., R season in the Snake River only changes about 60 mm, from around 20 to 85 mm).Given these responses the model works, and provides a lower threshold that describes with some certainty the amount of runoff that will be available for operations for the remainder of the year.
We recognize that all three of these regional watersheds are managed through a series of dams and reservoirs that create an altered runoff signal.Water resources managers use point-specific and model-based estimates of water storage in the region to optimize their operations for the water year.Additionally, in the fertile plains of the Snake River and lower CRB, broad-scale agriculture relies on both ground-and surface water for irrigation.Water withdrawals would be implicit in the TWSA signal and reduce R.However, a more detailed analysis of withdrawals lies outside the scope of this study.
Regardless of the length of record or anthropogenic influence, climate, topography, and geology still provide the firstorder controls on water storage that are found in the hysteresis loops.GRACE encapsulates these hydrologic processes through measurements of TWSA.The hysteresis loops expand and contract accordingly during wet and dry years, as the intra-annual relationship between TWSA and Q represents the fluxes of water into and out of the watershed.Despite intra-annual differences, a family of hysteresis curves can describe each of the sub-regional watersheds.The predicative capability using TWSA, the vertical sum of water, as compared to snowpack and soil moisture further highlights the integrated nature of water storage in regional hydrology.These predictive capabilities highlight the potential of GRACE to improve upon seasonal forecast predictions and regional hydrological models.

GRACE as an analysis tool for regional watersheds
Where previous approaches to modeling watershed behavior have focused on separate storage compartments, new approaches should include the magnitude and direction of hysteresis (Spence, 2010).This integrated approach would provide new ways forward to classify watersheds not only by runoff, but also on the first-order controls that govern the non-linear hydrological processes.
Even though GRACE is somewhat of a blunt instrument with regards to temporal (monthly) and spatial (1 • ) resolution, this emerging technology provides a new dimension to regional watershed analysis by providing an integrated measurement of water stored across and through the Earth.These measurements continue to prove their value in retrospective analysis of regional hydrology (Rodell et al., 2009;Castle et al., 2014).However, the hysteresis loops presented by Riegger and Tourian (2014) and further developed in this paper demonstrate the ability of GRACE data to help develop a process-based understanding of how regional watersheds function as simple, dynamic systems.As the temporal record of GRACE continues to increase, its value as both a diagnostic and predictive tool will continue to grow.In the mean time, these data have value in augmenting existing management strategies.
Perhaps one of the most important facets of GRACE data is that it does not distinguish political boundaries.It is not linked to a specific in situ monitoring agency with limited data access and has the capacity to bridge sparse and inconsistent on-the-ground hydrologic monitoring networks that exist in many regions of the world.Previous GRACEbased analysis has shown its value in highlighting negative trends in terrestrial water storage in trans-boundary watersheds (Voss et al., 2013;Castle et al., 2014), and resulting regional conflict exacerbated by water shortages (Gleick, 2014b).GRACE provides an objective measurement of a region's water resources that can provide valuable insights into potential shortages or surpluses of water resources, and simple empirical predictions of seasonal and monthly runoff that are easily deployable in places with limited data.

Conclusions
We have shown how GRACE-based measurements of TWSA distill the complexity of regional hydrology into a simple, dynamic system.TWSA and derived estimates of GWSA reveal hysteretic behavior for regional watersheds, which is more commonly associated with hydrologic measurements at local scales.While the magnitude of the hysteresis curves vary across years, they retain the same general shape that is unique to each watershed.We demonstrated the utility of these hysteresis curves by showing how the complete TWSA record during March and April can be used to empirically predict R for the remainder for the water year (TWSA Mar , mean NSE = 0.91) and during the drier summer months (TWSA Mar , mean NSE for July = 0.76, August = 0.72; Tables 1 and 2).
Because GRACE TWSA can augment prediction, managers could start to interpret each year's hysteresis curve for the upcoming spring and summer, providing greater clarity and validation for model-based forecasts presently used by water resource managers.Our results demonstrate a way forward, expanding GRACE from a diagnostic tool, into a conceptual model and predictive resource.
Although this study focused on the CRB, which has a rich data record, GRACE data are available at a global scale and could be readily applied in areas with a paucity of data to understand how watersheds function and to improve streamflow forecasting capabilities.GRACE does not discern political boundaries and provides an integrated approach to understanding international watersheds (Voss et al., 2013).This resource could serve as a valuable tool for managers in forecasting surplus and scarcity, and in developing strategies that include changes in supply and demand due to human consumptive needs and current climate trends (Wagener et al., 2010;Gleick, 2014a).

Figure 1 .
Figure 1.Context map and descriptions of the three study watersheds and the locations of the groundwater wells used in the study.

Figure 2 .
Figure 2. Monthly storage anomalies for Runoff, TWSA, and the subcomponents of terrestrial water for the three watersheds.Standard errors and error variance for hydrological component are noted in each sub-figure and represented by the blue shading.All units on the vertical axes are on the same scale and in mm.Note the different vertical scales for Runoff and RES.Horizontal axes are in months and on the same scale.
Figure 3. (a-c) Annotated hysteresis curves of terrestrial water storage anomalies (a), the subsurface water storage anomalies (TWSA sub ; b), and groundwater storage anomalies (c) based upon the 9-year mean for the Columbia River at The Dalles.These curves describe the fluxes of water moving through the watershed.

Figure 4 .
Figure 4. (a-f) Individual hysteresis curves for the three study watersheds for TWSA (a-c), TWSA sub (d-f), and GWSA (h-j).TWSA sub in the Upper Columbia and The Dalles collapses to represent a shape more commonly associated with the hysteresis of a soil matrix.The Snake River retains a similar looping shape.The grey areas in the TWSA sub and GWSA plots provide a visual reference of the TWSA error variance for each watershed.The low topography and high storage capacity of the Snake aquifer provides a consistent groundwater signal, as compared to the limited aquifer of the Upper Columbia, which fills and drains quickly.Note the different scales on the y axes.

Figure 5 .
Figure 5. Plots of the hysteresis curves for TWSA in each of the three study watersheds across all 9 water years.For visual clarity, each plot contains 3 water years and the 9-year mean.Note the different scales on the y-axes for each basin.

Figure 6 .
Figure 6.The intra-annual range of TWSA, TWSA sub GWSA, and R for the 9 water years of the study period.
Figure 7. (a-b) Measurements of terrestrial water storage anomalies in March (TWSA Mar ) effectively predict the cumulative runoff for April-September (R season ; (a) and help describe how these three regional watersheds function as simple non-linear systems.TWSA Mar also predicts mean runoff for August (R Aug ; (b) one of the driest months of the year when demand for water is at its peak.The hashed lines represent the 95 % confidence intervals.The box plots to the right of each plot represent the range of R for the respective watershed from WY's 1969-2012.Note the semi-log y axis on (b).For complete results and parameters from the empirical model please refer to Tables 1, 2, 3, A3, and A4.

Table 1 .
Comparison of performance metrics using the dual-pass approach to apply GRACE TWSA data, model-derived snow (SWE), and soil moisture (SM) products in predicting seasonal (R season ) and August (R Aug ) runoff by watershed.Average values for the three basins are also provided.RMSE values are in mm.Complete results can be found in Appendix TableA3.
skill of measured GRACE TWSA data (Table

Table 2 .
Comparison of performance metrics from applying all 9 water years of GRACE TWSA data, model-derived snow (SWE), and soil moisture (SM) products in predicting seasonal (R season ) and August (R Aug ) runoff by watershed.Average values for the three basins are also provided.RMSE values are in mm.R 2 values are the same as NSE for this linear regression.Complete results can be found in Appendix TableA4.

Table 3 .
Parameters from the power function curves in each of the three watersheds using TWSA to predict streamflow.Figure7provides these results visually.

Table A3 .
Comparison of performance metrics using the dual-pass approach to apply GRACE TWSA, model-derived snow (SWE), soil moisture (SM), and subsurface (TWSA sub ) data in predicting seasonal (R season ) and August (R Aug ) runoff by watershed.RMSE values are in mm.

Table A4 .
Comparison of performance metrics from applying all 9 water years of GRACE TWSA, model-derived snow (SWE), soil moisture (SM), and subsurface (TWSA sub ) data in predicting seasonal (R season ) and August (R Aug ) runoff by watershed.RMSE values are in mm.R 2 values are the same as NSE for this linear regression.