On the similarity of hillslope hydrologic function: a clustering approach based on groundwater changes

. Hillslope similarity is an active topic in hydrology because of its importance in improving our understanding of hydrologic processes and enabling comparisons and paired studies. In this study, we propose a holistic bottom-up hillslope clustering based on a region’s integrative hydrodynamic response quantiﬁed by the seasonal changes in groundwater levels (cid:49)P . The main advantage of the (cid:49)P clustering is its ability to capture recharge and discharge processes. We test the performance of the (cid:49)P clustering by comparing it to seven other common hillslope clustering approaches. These include clustering approaches based on the aridity index, topographic wetness index, elevation, land cover, and machine-learning that jointly integrate multiple data. We assess the ability of these clustering approaches to identify and categorize hillslopes with similar static characteristics, hydroclimate, land surface processes, and subsurface dynamics in a mountainous watershed – the East River – located in the headwaters of the Upper Colorado River Basin. The (cid:49)P clustering performs very well in identifying hillslopes with six out of the nine characteristics studied. The variability among clusters as quantiﬁed by the coefﬁcient of variation (0.2) is less in the (cid:49)P and the machine learning approaches than in the others ( > 0.3 for TWI, elevation, and land cover). We further demonstrate the robustness of the (cid:49)P clustering by testing its ability to predict hillslope responses to wet and dry hydrologic conditions, of which it performs well when based on average conditions.


Introduction
The ability to delineate areas into spatially defined regions for their use in characterizing hydrologic flow and transport behavior is important for several reasons, including the assessment, monitoring, and modeling of water quantity and quality. Hillslopes are the scale at which hydrologic flow and transport processes can be tractably and frequently measured. It is also the scale at which flow and travel time are quantified and the instrumentation, conceptualization, and modeling of hydrologic processes occur (Fan et al., 2019. While advancements have been made in the general understanding of hillslope dynamics over the last several decades, there is yet to be a globally agreed-upon classification and/or clustering for this important scale of interest in hydrology (McDonnell and Woods, 2004). Hydrologic signatures within hillslopes are the result of several simultaneous and nonlinear above-and below-ground processes. The uniqueness of a given location's characteristics (for example, the topography, geology, and vegetation) limits our ability to draw general hypotheses and to develop a similarity framework (Beven, 2000). Nevertheless, a classification is needed to provide guidance on catchments and hillslopes comparisons (McDonnell and Woods, 2004), paired studies (Andréassian et al., 2012;Bosch and Hewlett, 1982;Brown et al., 2005), and improve our understanding of the changes in hydrologic processes across the world. Further, hillslope similarity is potentially an important step toward developing reduced-order models and machine learning algorithms, where identifying regions based on their similar-ities can substantially reduce computational costs (Chaney et al., 2018). The scaling of hillslope to catchment classifications can also be useful in the prediction of hydrologic behavior in ungauged basins (Sivapalan et al., 2003), an exceedingly important challenge.
Hillslope similarity clustering approaches include the topographic wetness index (TWI) (Beven and Kirby, 1979), which was proposed to quantify the topographic control on hydrology as topography plays a key role in the movement of water. Many other variants of this index have been later proposed to improve the definition of topographic similarity (Grabs et al., 2009;Hjerdt et al., 2004;Loritz et al., 2019). Other clustering approaches are based on hydroclimate , soil type and texture (Bormann, 2010), and land cover type (e.g., forest, urban; Wagener et al., 2007). These indices assume that hillslopes with similar topography and land cover will have similar hydrologic responses. However, given that hydrologic processes are governed by many characteristics of the hillslope, clustering approaches relying on multiple landscape characteristics have also been proposed (Aryal et al., 2002;Sawicz et al., 2011). These topdown clustering approaches assume that areas with similar physical characteristics will lead to similar hydrologic processes (Oudin et al., 2010). Other clustering approaches use a bottom-up approach, where similarity is based on the hydrologic process. This clustering allows the estimation of the "hidden" hillslope characteristics, such as soil texture and geology that may drive similar hydrologic responses . Among the process-based clustering approaches existing in the literature, we can cite the Péclet number characterizing the diffusive and advective transfer of water at a hillslope scale (Berne et al., 2005;Troch, 2007, 2010) and the catchment seasonal water balance (Berghuijs et al., 2014). Other authors have derived hillslope similarities from subsurface flow dynamics (Harman and Sivapalan, 2009).
One challenge in developing a similarity framework is the inherent heterogeneity of a given hillslope. For example, snow water equivalent (SWE), infiltration (I ), and actual evapotranspiration (ET) distributions can range over an order of magnitude within a single hillslope . Defining a single integrative measure that can capture this spatio-temporal variability is difficult. However, groundwater fluctuations are often tightly linked to seasonal changes in climate and have been shown to play an important role in surficial processes such as ET (Maina et al., 2022;Maina and Siirila-Woodburn, 2020;Maxwell and Condon, 2016). Thus, groundwater measures may serve as a good proxy for the aggregated hydrologic response. Groundwater dynamics could help overcome the issue of uniqueness of place because even if there are strong differences in the characteristics of the hillslope, the integrated response may be similar, as some of the processes might not be important. Finally, the implications of groundwater changes are also important. For example, many regions are characterized by groundwater-dependent ecosys-tems or are hypothesized to have water table fluctuations affecting bedrock weathering rates and therefore, the concentration and fluxes of metal and nutrient exports (e.g., Winnick et al., 2017).
In this study, we define a holistic bottom-up hillslope clustering using the integrative hydrologic response quantified by the seasonal changes in groundwater levels. A caveat to this clustering is that groundwater dynamics are difficult to quantify and their measurements are frequently scarce. Hence, there are very few studies that use this variable to develop a hillslope similarity classification (Aryal et al., 2002;Lyon and Troch, 2007). However, today, thanks to advances in integrated hydrologic modeling (Brunner and Simmons, 2012;Maxwell and Miller, 2005), accurate quantification of the groundwater dynamics at high resolution in both time and space and their interaction with the key land surface processes and features is now feasible. These models (e.g., HydroGeoSphere, Brunner and Simmons, 2012;ParFlow, Maxwell and Miller, 2005;Advanced Terrestrial Simulator, Coon et al., 2016) that can be constrained with ground observations and measurements at ultra-high resolutions through aerial or remote sensing (i.e., drones, planes, or satellites) account for the two-way interactions between groundwater and land surface processes. Spatially resolved hydrologic flow models also enable us to jointly quantify other hydrologic variables useful to identify hillslopes with similar hydrologic responses, namely, trends in ET, SWE, and I . Nevertheless, we acknowledge that groundwater dynamics in some regions such as arid areas could be disconnected from land surface processes and less dependent on many key physical features of the hillslope, which may impede the ability of the proposed clustering in these regions.
We test the proposed hillslope clustering on the site of the Department of Energy's (DOE) Watershed Function Scientific Focus Area (SFA) located in the headwaters of the Upper Colorado River Basin. The East River watershed is not only representative of many headwater catchments in the western United States in terms of its spatial heterogeneity of above and below-ground characteristics, but also serves as an important proxy of water quantity and quality trends, which ultimately impact a large population of water supply in the western United States for municipal, agriculture, and industrial use (Hubbard et al., 2018). We test the robustness of the proposed hillslope clustering by comparing it to seven other common hillslope clustering approaches based on the aridity index (AI), TWI, elevation, land cover, and machine-learning approaches that jointly integrate multiple input data layers, such as elevation, land cover, and geology, and model outputs including ET and SWE. We assess the ability of these clustering approaches to identify and categorize hillslopes with similar physical characteristics (land cover and elevation), hydroclimate (precipitation and temperature), land surface processes (ET and SWE), and subsurface dynamics (soil saturation, water table depth (WTD), and seasonal changes F. Z. Maina et al.: On the similarity of hillslope hydrologic function 3807 in groundwater). We aim to provide answers to the following questions: -What are the best clustering approaches for identifying hillslopes with similar hydrologic processes?
-Is a similarity index based on the seasonal groundwater variations sufficient to capture all the complex processes taking place at a hillslope scale? We use the integrated hydrologic model, ParFlow, which has the advantages of simulating the water and energy balance from the bedrock to the lower atmosphere and therefore, connects groundwater dynamics with land surface processes. ParFlow solves the subsurface flow using the threedimensional mixed form of the Richards equation (Richards, 1931), given by the following equation: where S S is the specific storage [L −1 ], S W (ψ P ) is the degree of saturation [-] associated with the subsurface pressure head ψ P [L], t is the time [T], φ is the porosity [-], k r is the relative permeability [-], z is the depth [L], q s is the source/sink term [T −1 ], and K(x) is the saturated hydraulic conductivity [L T −1 ], which is assumed to be a diagonal tensor with entries given as k x (x), k y (x), and k z (x). In this work, we assumed that the domain is isotropic and that the tensor is equal to 1 for all the three directions at each cell of the discretized model. In the unsaturated zone, both S W and k r depend on the ψ. The relationships between S W and k r and ψ are described by the van Genuchten model (van Genuchten, 1980). Overland flow (Eq. 2) is solved by the kinematic wave equation in two dimensions: where ψ 0 is the ponding depth, ||ψ 0 , 0|| indicates the greater term between ψ 0 and 0, υ is the depth averaged velocity vector of surface runoff [L T −1 ], and q r is a source/sink term representing rainfall and evaporative fluxes [L T −1 ]. Surface water velocity at the surface in x and y directions, (υ x ) and (υ y ), respectively, is computed using the following set of equations: where S f,x and S f,y friction slopes along x and y, respectively, and m is the Manning's coefficient. ParFlow employs a cell-centered finite difference scheme along with an implicit backward Euler scheme and the Newton-Krylov linearization method to solve these nonlinear equations. The computational grid follows the terrain to mimic the slope of the domain (Maxwell, 2013). ParFlow is coupled to the Community Land Model (CLM, Dai et al., 2003), which allows for the simulation of important land surface processes such as ET and SWE, and the quantification of water leaving or entering the surface and subsurface (q s and q r , respectively, in the Richards and kinematic wave equations). CLMs model the thermal processes by closing the energy balance at the land surface given by where R n is the net radiation at the land surface [W m −2 ], a balance between the shortwave and longwave radiation, LE is the latent heat flux, [W m −2 ] which captures the energy required to change the phase of water to or from vapor, H is the sensible heat flux [W m −2 ], and G is the ground heat flux [W m −2 ]. All terms are a function of θ , the water content, which is computed by ParFlow.
Computing the different components of the energy balance requires meteorological forcing, vegetative parameters, and soil moisture. The latter is computed by ParFlow using Eqs. (1) and (2). Meteorological forcing includes precipitation, temperature, east to west and north to south wind speed, longwave and shortwave solar radiation, air pressure, and relative humidity. Vegetative parameters include maximum and minimum leaf area index, stem area index, aerodynamic roughness height, optical properties, stomatal physiology, roughness length, and displacement height. More details about the coupling between ParFlow and CLM and the equations governing the snow dynamics and ET can be found in the following papers: Jefferson et al., (2015), Maxwell and Miller (2005), and Ryken et al. (2020). ParFlow-CLM has been used in many studies to understand the interactions between groundwater dynamics and lower atmosphere (Maina et al., 2022;Maina and Siirila-Woodburn, 2020) at different scales from watershed (Foster and Maxwell, 2019;Maina et al., 2020) to continental scale (Maxwell and Condon, 2016).

East River watershed model setup
The East River watershed (Fig. 1), located in the Upper Colorado Basin, is one of the two major tributaries that form the Gunnison River, which, in turn, accounts for just under half of the Colorado River's discharge at the Colorado-Utah border. The total area of this watershed is approximately 255 km 2 and the elevation varies from approximately 2700 to 3900 m. The watershed is characterized by strong heterogeneities in vegetation, geomorphology, and bedrock composition (Hubbard et al., 2018). The vegetation includes grasses, conifers, mixed conifers, aspens, and meadows, and lies on a complex geologic terrain, which is comprised of a diverse collection of Paleozoic and Mesozoic sedimentary and unconsolidated rocks. The watershed is also characterized by a strong hydroclimate gradient. The average precipitation is 1200 mm yr −1 , while the average temperature is around 0 • C. Because of its very low cold winter with temperature below 0 • C, most of the winter precipitation is in the form of snow.
ParFlow-CLM used here is based on a previous version of the East River watershed model, as described by Foster and Maxwell (2019). Five layers constitute the model in the vertical direction, with varying thickness from 0.1 m at the land surface to 21 m at the bottom of the domain. The land use and land cover are derived from the high-resolution airborne remote sensing NEON campaign Falco et al., 2020;Goulden et al., 2020). From the hyperspectral spectrometer and lidar readings, four major types of land cover are grouped as follows: forests (i.e., conifers and aspens), mixed forests, grasses, and bare soil. Parameterization of these different land cover types is derived from the International Geosphere-Biosphere Programme (IGBP) database (IGBP, 2018).
The subsurface of the study area is heterogeneous in both vertical and horizontal directions. The subsurface of the top 1 m corresponds to three soil layers as defined by the Soil Survey Geographic Database (SSURGO) database, and then corrected based on the land cover and geologic maps to in-clude the outcropping of the bedrock. Two main types of soil are distinguished within the area: sandy loam and clay loam. The geology of the subsurface between 1 and 8 m below the ground was defined with United States Geological Survey (USGS) maps, which were further improved by local knowledge by Pribulick et al., (2016). This subsurface region is highly heterogeneous with different formations such as crystalline, sedimentary rocks, unconsolidated rocks, alluvial deposits, and debris flow. The bottom layer of the domain (extending from 8 m below the ground surface to the bottom of the model) is assumed homogeneous and represents the fractured bedrock.
We simulated the water year (WY) of 2015, which was a relatively average WY in the region based on average precipitation and temperature patterns. The meteorological forcing of the model has a resolution of an hour and is derived from two gridded data sets: Parameter-elevation Regressions on Independent Slopes Model (PRISM) and North American Land Data Assimilation System (NLDAS). The PRISM data set (Daly et al., 2008) is used for precipitation and temperature because of its accuracy and high spatial resolution (800 m). However, the daily resolution of PRISM impedes its ability to be used to reproduce diurnal cycles, an important factor when studying land surface processes requiring hourly forcing. Phase 2 of the North America Land Data Assimilation System NLDAS-2 forcing (Cosgrove et al., 2003), on the contrary, provides hourly changes in precipitation and temperature, yet is only available at coarser 1/8 • resolutions.
As such, we employ a mass-conservative temporal interpolation, which disaggregates the total daily PRISM precipitation into an hourly time series, based on the signal of the NLDAS-2 precipitation and temperature trends. For the other forcing variables (i.e., shortwave and longwave radiation, wind speed, atmospheric pressure, and specific humidity), we use NLDAS-2 forcing, (Cosgrove et al., 2003). Simulated river stages and SWE were compared to observations in previous studies (Maina et al., 2022;Foster and Maxwell, 2019). Groundwater measurements are scarce in the watershed and the majority of the measurements are performed near a station measuring changes in river stages. Therefore, river stages and groundwater measurements at this point provide similar information.

Hillslope delineation
As shown in Fig. 1b, 127 hillslopes are delineated in the East River watershed based on the elevation following Noël et al., (2014) and using Topotoolbox developed by Schwanghart and Scherler (2014). A threshold of flow accumulation was set to match the stream observations at major tributaries of the East River . Because the hillslope delineation could be sensitive to the threshold of the drainage area, we tested different threshold values to find that the selected threshold value (810 000 m 2 ) represents the scale of hillslope at which the within-hillslope variability of key properties (such as elevation and aspect) is minimized and hillslope-averaged properties can account for the majority of watershed-scale variability .

Hillslope clustering approaches
We use eight hillslope clustering approaches: 1. The P 1 clustering proposed in this study identifies hillslopes with similar groundwater dynamics. Figure 2 shows the temporal variations of the simulated SWE and WTD at a selected hillslope (see its location in Fig. 1). All hydrologic variables have been computed at a hillslope scale by computing the arithmetic average of all cells in each hillslope. In this mountainous watershed where the largest changes in WTD are mostly a result of snowmelt, WTD decreases from the beginning of the WY (i.e., October) to the beginning of snowmelt (i.e., starting from April). As the snow starts to melt and precipitation starts to fall as rain instead of snow, WTD starts to rise. The shallowest WTD is June and July when the snow has completely melted and has had time to percolate through the unsaturated zone into the groundwater. This period also corresponds to the period of high ET, because both the evaporative demand and the water availability are high.
The dynamics show two periods that characterize the dynamics of the hillslope: from the initial conditions to the baseflow conditions when the hillslope is losing water, then from baseflow conditions to the peak of WTD when the hillslope is gaining water. To characterize these groundwater dynamics, we define two variables: -P 1 represents the change in WTD between the beginning of the water year and the deepest WTD during the baseflow conditions. This variable quantifies the amount of water released by the hillslope during the dry period at the beginning of the water year. It thus contains information about the amount of water that the hillslope typically releases/loses, mainly by ET and discharge, given its physical characteristics and climate dynamics.
-P 2 represents the changes in WTD between the peak flow (i.e., the period with the shallowest WTD) and the baseflow conditions. P 2 quantifies the amount of water gained in the hillslope by recharge and thus contains information about the recharge ability of the hillslope given its physical characteristics and climate dynamics.
These two key variables allow us to quantify water release ( P 1 ) and recharge ( P 2 ) within a hillslope; two key dynamics of the watershed hydrologic function (Sivapalan, 2006;Wagener et al., 2007. We note that these dynamics are also illustrated by the changes in measured groundwater levels as depicted in Appendix A.
2. Elevation: in mountainous watersheds, because the differences in hydroclimate are primarily driven by elevation, hillslopes with similar elevations could potentially have similar land surface processes signatures.
3. Land cover: hillslopes can also be clustered by their dominant land cover. Land cover shapes land surface processes, which in turn affect subsurface dynamics and the water balance at the hillslope scale.
4. TWI: The topographic wetness index commonly used to cluster hillslopes is given by ln( α tan(β) ), where α is the upslope draining area and β is the local angle. 5. AI: the AI (ETP/precipitation, where ETP is the potential evapotranspiration) represents the ratio of the average demand for moisture to the average supply of moisture. We derive the spatial distribution of the AI in the East River from the Global Aridity Index data set (CGIAR-CSI, 2019).
6. Machine-learning-based clustering: we define the hillslope similarity using the clustering of ParFlow-CLM input and output data layers. Clustering was performed in three different ways using the following data: (1) model input (elevation, percentage of the main land cover type, Figure 2. Temporal variations of water table depth (WTD) and SWE at an example hillslope. The location of the hillslope is shown in Fig. 1.
TWI, and AI), referred to hereafter as the "clustering input" (C.I.) method, (2) model output (ET, SWE, WTD, and P 1 ), referred to hereafter as the "clustering output" (C.O.) method, and (3) both model input and output data layers, referred to hereafter as the "clustering inputoutput" (C.I.O.) method. We use hierarchical clustering, which is a decision-tree-based method that divides data points based on a series of binary splits (Devadoss et al., 2020;Kassambara, 2017;Wainwright et al., 2022). We define the linkage (or the distance) between any two clusters based on the Euclidian distance and the Ward method that computes the variance within each cluster, measuring the distance between each observation and the cluster's mean, and then taking the sum of the distances' squares.

Hillslope clustering comparisons
To test the ability of the eight selected clustering approaches to identify and categorize hillslopes with similar static characteristics and dynamics, we assess each clustering's ability to describe several characteristics of the hillslope. These are elevation, land cover, hydroclimate (i.e., precipitation), land surface processes (SWE and ET), and subsurface dynamics (WTD values and variations). For each clustering, we define three zones. For each variable, zone, and clustering, we compute the mean (µ) of the hillslope values and the corresponding coefficient of variation (CV). We also calculate the mean of the CV of the different zones for each variable and clustering.
3 Results Figure 3 shows the spatial distribution of hillslope temperature, precipitation, SWE, ET, WTD, and P 1 . As expected, the hillslopes characterized by high SWE have high precipitation and low temperatures in contrast to the hillslopes with low SWE. However, ET shows a different pattern, as it depends on both water availability and ET demands, which depends on the type of land cover. The midelevation zone (i.e., Zone 2) with a high coverage of forests has high ET. Hillslopes with high P 1 have a deep WTD on average because the WTD increases significantly during baseflow conditions and reaches very large values as quantified by P 1 . Hillslopes with high P 1 values generally correspond to hillslopes with high precipitation and low temperature and therefore, high SWE values. To better understand the relationship between P 1 and the hillslope physical characteristics and hydrologic processes, we study the Pearson correlation coefficient between P 1 and the elevation, the percent of the dominant land cover, TWI, AI, ET, SWE, and WTD (Fig. 4).

Hillslope characteristics
Results for P 2 are not shown because P 2 is strongly correlated to P 1 . Bare soil, TWI, AI, SWE, and P 1 are strongly correlated (we define this as Pearson's correlation coefficient higher than 0.7) with elevation (Fig. 4, column 1, lines d, e, f, h, and j). In particular, elevation has a dominant control on AI and SWE, with a Pearson's correlation coefficient higher than 0.9. We observe nonlinearity such that TWI increases in the lower elevation and that AI becomes constant at the lower elevation. A high percentage of forest is only found in mid-elevation (Fig. 4, 2a), whereas a high percent-age of grassland is well correlated to low elevations (Fig. 4,  3a). ET is positively correlated to the percent of forests (Pearson's correlation coefficient is higher than 0.9, Fig. 4, 2g). P 1 has a Pearson's correlation coefficient higher than 0.7 for 6 out of 9 studied variables (elevation, percent of bare soil, TWI (correlation coefficient equal to 0.67), AI, SWE, and WTD, Fig. 4, line j, columns 1, 4, 5, 6, 8 and 9). It therefore indicates that changes in P 1 can reflect the changes of these variables. The two variables with low correlations with P 1 are ET and the percent of forests (Fig. 4, line j, columns 2 and 7). ET is related to groundwater dynamics in a nonlinear way (Condon et al., 2013;Ferguson and Maxwell, 2010;Rahman et al., 2016). As shown in these studies, regions with shallow WTDs have the highest ET fluxes and this flux typically decreases significantly with WTD. When WTD reaches a critical depth, the groundwater and the atmosphere disconnect and changes in WTD do not impact ET.

Hillslope clustering
For each clustering, we identify three zones (Fig. 5). For the P 1 , elevation, TWI, and AI clustering approaches, we define the thresholds of each zone by analyzing the distributions of the hillslope values of the following indices: 1. P 1 : Zone 1 comprises hillslopes whose P 1 is less than 1.5 m, P 1 of hillslopes in Zone 2 are comprised between 1.5 and 2.5 m, and Zone 3 groups all hillslopes with P 1 greater than 2.5 m.
2. Elevation: Zone 1 characterizes low elevation areas (average hillslope elevation is less than 3000 m), Zone 2 is mid (average hillslope elevation comprises between 3000 and 3500 m), and Zone 3 is high elevation (hillslopes with an average elevation greater than 3500 m).
3. Land cover: Zone 1 describes hillslopes that have predominantly grasses as land cover, Zone 2 describes hillslopes with more than 50 % of forest, and Zone 3 describes hillslopes where bare soil is the dominant land cover.
5. AI: Zone 1 comprises hillslopes with AI less than 0.45, Zone 2 describes hillslopes with AI between 0.45 and 0.55, and hillslopes of Zone 3 have an AI greater than 0.55.
6. Machine-learning-based clustering: the approaches automatically regroup the similar hillslopes into three zones. Table 1 depicts the mean (µ) and the corresponding coefficient of variation (CV) of hillslope values for each variable, zone, and clustering.

Similarities in hillslope structure
Elevation plays an important role in the hydroclimate of a region, especially in mountainous watersheds where it controls snow accumulation and controls the downstream hydrology. Figure 6 shows the elevation frequency distributions associated with the three zones derived from the eight clustering approaches. By classifying the hillslopes using their similarity in P 1 , we observe that hillslopes with low P have the lowest elevation, while the hillslopes of Zone 3 (high P 1 ) have the highest elevation. Unsurprisingly, the second clustering (i.e., elevation) clearly identifies the hillslopes with similar elevation. The AI clustering also identifies hillslopes with similar elevation, as shown in Figs. 4 and 6. The TWI clustering performs moderately, where Zones 1 and 2 are characterized by similar elevation distributions. Hillslopes with lower TWI  are mostly located in high elevation areas compared to the low elevation hillslopes. In the land cover clustering, most of the grassed hillslopes (Zone 1) are in low elevation, forests (Zone 2) are in mid-elevation, and hillslopes whose landscape is mainly bare soil (Zone 3) are in high elevation areas above the tree line. The three clustering approaches using machine learning allow the identification of hillslopes with similar elevation, as their coefficients of variation are of the same order as the elevation clustering. Table 2 describes the average hillslope ratio of land cover type (forests, grassland, and bare soil) for each zone and type of clustering. The land cover clustering indicates that grassland is the dominant land cover of Zone 1, forests in Zone 2, and bare soil in Zone 3. Only the machine-learning clustering approaches using outputs lead to a similar conclusion, whereas while the other clustering approaches capture the characteristics of Zone 1 and 3, they do not identify a distinct forested Zone 2. For the P 1 clustering, this could be attributed to the disconnection between groundwater dynamics and land surface processes that takes place in certain forested zones. Since clustering based on landscape characteristics and P 1 do not identify such a distinct zone, it suggests that this zone may not be indicative of distinct hydrologic behavior.  Figure 7a and b depicts the distributions of precipitation and temperature obtained with the eight selected clustering approaches. The AI clustering allows the identification of hillslopes with a similar hydroclimate because they have low values of coefficients of variation. Zone 1, located in low elevation, has low precipitation and high temperatures, contrary . Hillslope clustering approaches are located across the x axis. Note that we plotted the distributions of the eight clustering approaches on the same graph, and between each dotted line (with a frequency from 0 to 0.5), the frequency distributions of the three zones derived from the clustering are plotted.

Similarities in hydroclimate
to Zone 3. Zone 2 is characterized by a hydroclimate that is in between those of Zone 1 and 3. Our P 1 clustering leads to conclusions similar to the machine-learning-based clustering and AI. These three clustering approaches have the same average CV and are the only methods that allow the identification of hillslopes with a similar hydroclimate. However, we note that in the three machine-learning-based clustering and in the P clustering, Zones 1 and 2 have a similar hydroclimate, which is not the case in the AI clustering. While the land cover clustering approaches clearly identify the typical hydroclimate of the hillslopes of Zone 3, the two remaining zones have the same hydroclimate. The TWI clustering does not identify hillslopes with a similar hydroclimate because it relies on the hydrologic processes driven by the topography. TWI shows that clustering that includes only hydroclimate would miss important information on distinct hillslope hydrologic processes that strongly affect the response of the hillslope to meteorological forcing.

Similarities in hydrologic processes
In this section, we study the ability of the selected clustering approaches to identify hillslopes with similar hydrologic processes, which are snow dynamics, evapotranspiration, and WTD values and variations.

Land surface processes
A robust clustering in mountainous watersheds should identify hillslopes with similar snow dynamics. Figure 8a illustrates the SWE frequency distribution associated with each zone and clustering. Because SWE dynamics are primarily driven by elevation and precipitation, the AI and machinelearning-based clustering have the lowest average of the CV, followed by the land cover and the P 1 clustering. The land cover spatial distribution contains information about elevation, especially in high elevation areas where some hillslopes are located above the tree line. The P 1 clustering accounts for SWE dynamics because P 1 is highly correlated to SWE, as discussed in Sect. 3.1. The spatial distribution of ET is controlled by many factors, including soil moisture, land cover, and subsurface flow. The land cover clustering performs well in identifying hillslopes with similar ET because the latter strongly depends on the land cover (Fig. 8b). Consistent with the aforementioned results, the other clustering approaches that perform well are the machine-learning-based clustering and the AI. The TWI and elevation clustering approaches do not separate hillslopes by their ET values because they do not account for varying land cover and soil properties that influence ET. The average CV of the P 1 clustering is close to those of the land cover and AI clustering. As stated in many studies (Ferguson and Maxwell, 2010;Maina and Siirila-Woodburn, 2020;Maina et al., 2022), subsurface flow affects ET as this information about subsurface flow contains valuable information about the ET, even if the correlation between P 1 and ET is nonlinear.

Similarities in subsurface hydrodynamics
We investigate the ability of the eight selected clustering approaches to identify hillslopes with similar subsurface hydro- Clustering approaches are based on P 1 , elevation, land cover (LULC), topographic wetness index (TWI), aridity index (AI), and machinelearning approaches (with inputs C.I., outputs C.O., and inputs and outputs C.I.O.). Hillslope clustering approaches are located across the x axis. Note that we plotted the distributions of the eight clustering approaches on the same graph and between each dotted line (frequency from 0 to 0.5), the frequency distributions of the three zones derived from the clustering are plotted. dynamics. We study the average saturation of the first 10 cm of the soil throughout the WY, the yearly average of WTD, and P 2 . Soil saturation is a key feature in both subsurface and atmospheric dynamics; it controls ET and groundwater recharge. The average CV of the P 1 , TWI, AI, land cover, and clustering approaches are very similar (Fig. 9a). As the land cover clustering adequately regroups hillslopes with similar ET, it also allows the regrouping of hillslopes with similar soil saturation. Because the TWI describes the characteristics that drive flow, it serves as a good indicator of soil saturation like the AI. Similar to the results above, the machine-learning-based clustering perform well. The P 1 clustering has a low average CV due to the strong connection between the changes in WTD and soil saturation. It is only the elevation clustering that fails to identify hillslopes with similar soil saturation, where the distributions of the three defined zones show overlap.
WTD is an important variable for determining groundwater storage. Here, we rely on the average WTD throughout the year. As expected, the P 1 clustering identifies hillslopes with similar WTD (Fig. 9b). Zone 1 located in low elevation has the shallowest WTD and the lowest P 1 , contrary to Zone 3. Zone 2 exhibits a behavior that is in between those of Zone 1 and 3. The TWI and land cover clustering approaches also are good for identifying hillslopes with similar WTD. Hillslopes with low TWI (Zone 3) have the deepest WTD, contrary to the hillslopes of Zone 1. The TWI identifies hillslopes with similar WTD because of the high relief of the watershed that drives its hydrology (Fan et al., 2019). The land cover clustering indicates that most of the forest (Zone 2) and bare soil (Zone 3) hillslopes have deep WTD, whereas grass (Zone 1) hillslopes have the shallowest WTD. The elevation clustering does not accurately identify hillslopes with similar WTD and its average CV remains higher than the 4 other clustering approaches. The AI, like the elevation, is not a good variable for identifying hillslopes with similar WTD. All their three zones overlap in terms of WTD. Results from the machine-learning-based clustering are similar to the P 1 clustering with a CV of the same order, yet there is no clear distinction between Zone 1 and 2 in these machine-learningbased clustering approaches. Figure 9c illustrates the distributions of the P 2 for each clustering approach and zone. P 1 clusters hillslopes with similar P 2 , as expected. Another suitable clustering approach for hillslopes with similar P 2 is the land cover. Zone 3, characterizing bare soil hillslopes, has the highest P 2, unlike, zones 1 and 2. The AI clustering shows that the majority of Zone 3 hillslopes have high P 2 , whereas Zone 2 hillslopes have low P 2, , followed by Zone 1 hillslopes. In terms of P 2 similarity, the elevation clustering outperforms the TWI. The machine-learning-based clustering approaches are good for identifying hillslopes with similar P 2 , especially the clustering using inputs variables (CI). The two other machine-learning-based clustering approaches (outputs and inputs and outputs) do not distinguish Zone 1 from Zone 2. Figure 9. Frequency distributions of hillslope (a) saturation, (b) WTD, and (c) P 2 . Clustering approaches are based on P 1 , elevation, land cover (LULC), topographic wetness index (TWI), aridity index (AI), and machine-learning approaches (with inputs C.I., outputs C.O., and inputs and outputs C.I.O.). Hillslope clustering approaches are located across the x axis. Note that we plotted the distributions of the eight clustering approaches on the same graph and between each dotted line (frequency from 0 to 0.5) the frequency distributions of the three zones derived from the clustering are plotted.

Discussions
In this section, we discuss the advantages of the proposed P clustering compared to the other clustering approaches, and its ability to capture dry and wet hydrologic conditions.

Advantages of the P clustering
Depending on the purpose of the identification of similar hillslopes, the appropriate clustering may change. Nonetheless, it is important for any clustering approach to identify hillslopes with similar hydrologic processes. As demonstrated here, the advantage of using P 1 to identify similar hill-slopes is that many hydrologic processes are embedded in P 1 . Our comparisons have shown that by using a P 1 clustering, one is able to identify hillslopes with not only similar subsurface hydrodynamics, but also similar land surface processes. Because these processes are intimately linked to the physical characteristics of the hillslope, its hydroclimate, and its land cover, the P 1 clustering also allows for the identification of hillslopes with the aforementioned similar characteristics.
However, we highlight that other clustering approaches may outperform the P 1 when looking at a single characteristic. For instance, our results show that the elevation and AI (or other clustering approaches based on the hydroclimate e.g., Carrillo et al., 2011) clustering approaches may be excellent at identifying hillslopes with a similar hydroclimate and snow dynamics. The land cover clustering allows for better identification of hillslopes with similar land surface processes, such as ET and soil saturation. Lastly, the TWI clustering (e.g., Beven and Kirby, 1979;Grabs et al., 2009;Hjerdt et al., 2004;Loritz et al., 2019) allows the identification of hillslopes with similar groundwater dynamics and soil saturation values, as it describes the topographic flow. In terms of overall performance, our results show that for the study site considered here, the machine-learning-based clustering approaches are also very good at identifying similar hillslopes. Wainwright et al., (2022) use an unsupervised clustering method and remote sensing data layers, which include elevation, SWE, radiation, resistivity, and normalized difference vegetation index (NDVI) to define seven clusters in the East River watershed. While their clustering method has more zones (7) than ours (3), it leads to similar conclusions as our study, where Zones 1 and 2 are characterized by low elevation, high TWI, and low SWE values, contrary to Zones 5 and 6.
Other hillslope clustering approaches based on hydrologic processes relied on the Péclet number (Berne et al., 2005;Troch, 2007, 2010), which describes the subsurface hydrological response and is derived from an analytical solution of the subsurface flow (e.g., the Boussinesq storage equation, Lyon and Troch, 2010). However, while the threedimensional Richards equation has the advantage of better representing the subsurface flow, it cannot be solved analytically. Hence, these indices cannot be applied to integrated hydrologic models. Our approach has demonstrated that the P helps quantify the subsurface hydrologic responses without using these indices and therefore overcomes the limitation of the use of attributes -such as the Péclet number -on integrated hydrologic models to categorize hillslopes.

Similarities in hydrologic responses to wet and dry conditions
According to McDonnell and Woods (2004) and Wagener et al., (2007), any classification should be able to predict the dynamics of hillslopes. We test the ability of the P 1 clustering to predict the dynamics of hillslopes in wet and dry conditions. A possible limitation of a clustering based on a hydrologic process is that the latter may be linked to the conditions of the selected year. Hydrologic responses are, by essence, nonlinear and may strongly change from year to year. In addition, compared to the intrinsic characteristics of the hillslope (elevation, topographic index, and land cover), which are only variable if long periods of time are considered, the scale at which hydrologic processes change is much shorter. Therefore, a clustering based on a hydrologic process may be time-dependent. We previously quantified P 1 using an average WY. In this section, we compare the response of each zone to dry and wet conditions. We extend our simulation from the WY 2015 to include the WYs 2016, 2017, and 2018, then, we analyze WYs 2017 and 2018. This four-year simulation covers a relatively wet (2017) and dry (2018) WY. The annual average precipitation in 2017 was ∼ 15 % higher than the annual average precipitation in 2015.
After this wet WY, the watershed is characterized by a dry climate in 2018, with average precipitation almost 50 % below the normal conditions. Figure 10 shows the distributions of hillslope annual average values of precipitation and ET, the hillslope P 2 associated with the defined P 1 zones, and for both the wet WY 2017 and the dry WY 2018. We have selected the key variables describing the hydroclimate (precipitation), land surface processes (ET), and subsurface hydrodynamics ( P 2 ). At first glance, for both dry and wet years and selected processes, all zones remain distinct. Zone 1 with hillslopes with low P 1 , located in low elevation, remains with low precipitation and high ET through both wet and dry years. Zone 3 describing hillslopes with high P 1 , has the highest precipitation in the area during both the wet and dry years. The hillslopes of Zone 2, located in mid-elevation, have most of their hydrologic dynamics in between those of Zone 1 and 3, except their ET, which is the highest in the area due to the presence of forests. Our results show that although we defined hillslope clustering based on a hydrologic process during an average WY, our clustering approach can predict the similarity of the dynamics of these hillslopes in wet and dry conditions. The P 1 clustering is, therefore, robust in predicting similarity in hydrologic responses under both wet and dry conditions.

Summary and conclusions
In this study, we use seasonal changes in groundwater levels -termed P 1 -to identify and categorize similar hillslopes.
P 1 is an important variable controlled by many hydrologic processes, including land surface processes and hydroclimatic. We defined three zones based on their similarity in P 1 . For a test case site in the East River watershed, Zone 1 characterizes hillslopes with low P 1 ; these hillslopes are mostly located in low elevation areas, their main land cover is grassland, and their ET is high because their WTDs are shallow. Zone 3, on the opposite of Zone 1, is located in high elevation areas and has high P 1 ; the hydroclimate leads to high snow accumulation and low ET. Hillslopes of Zone 3 are mostly bare soil. Zone 2 is in between these two zones and most of the hillslopes of this zone are covered by forests.
We tested the performance of the proposed P 1 clustering by comparing it with other existing clustering approaches. This was based on elevation, land cover, aridity index, a topographic wetting index, and three clustering approaches based on machine learning, which uses multiple data layers includ- Figure 10. Frequency distributions of hillslope annual average daily rates of precipitation and evapotranspiration (ET), and the hillslope seasonal changes in groundwater levels ( P 2 ), in 2017 (wet WY) and 2018 (dry WY) of the three zones derived from the WY 2015 P 1 .
ing model inputs and outputs. Our results show that the P 1 clustering is robust as it reasonably identifies and categorizes hillslopes with similar elevation, land cover, hydroclimate characteristics, land surface processes (ET and SWE), and subsurface hydrodynamics (WTD, soil moisture, and P 1 ). In general, the other clustering approaches are good in identifying similarity in a single characteristic related to the variable determining the clustering. Our work also demonstrates that a clustering using machine learning, either based on topdown (inputs) or bottom-up (outputs), performs well. Nevertheless, these clustering approaches, like the P 1 , require multiple data sets, each one with its own associated uncertainty. We further demonstrate the robustness of the proposed P 1 clustering by testing its ability to predict hillslope responses to wet and dry hydrologic conditions. The P 1 values are derived from a model and could be a limitation for sites where simulated outputs are unavailable or the spatio-temporal resolution of groundwater observations are limited. In addition, one of the main limitations of the proposed clustering is that due to the disconnection between land surface processes and structures and the subsurface dynamics in some regions, this clustering approach cannot be used in these conditions. Future studies could aim to identify similar hillslopes using P 1 and sophisticated machine learning approaches or optimization procedures. Our results are limited to one catchment, which has snow-dominated hydrology. Future studies could expand the comparison shown here to other watersheds to include additional clustering approaches and for different hydroclimate and durations of time (for example, sub-annual or multi-annual clustering).
Appendix A Figure A1. (a) Measured groundwater levels in WY2016 at a station located in blue in (b).
Author contributions. FZM was responsible for conceptualizing the study, undertaking the formal analysis, writing and preparing the original draft of the paper, and reviewing and editing the paper. HW was responsible for conceptualizing the study, undertaking the formal analysis, writing and preparing the original draft of the pa-per, and reviewing and editing the paper. PJDF was responsible for conceptualizing the study, writing and preparing the original draft of the paper, and reviewing and editing the paper. ERSW was responsible for conceptualizing the study, undertaking the formal analysis, writing and preparing the original draft of the paper, and reviewing and editing the paper.