Introduction
Spatial heterogeneity plays a critical role in the terrestrial
water, energy, and biogeochemical cycles from local to continental and global
scales. This has been recognized for decades in hydrology, ecology,
geomorphology, and soil science where it has been observed repeatedly that at
multiple temporal and spatial scales, land surface processes have deep ties
to an ecosystem's spatial structure and function. As a result, the macroscale
behavior of the water, energy, and biogeochemical cycles cannot be
disentangled from their fine-scale processes and interactions
.
Recognizing the importance of multi-scale heterogeneity in the Earth system,
in the 1980s and 1990s there was a strong emphasis on including its role in
large-scale land surface models
.
These sub-grid schemes, however, rarely were designed to handle the sub-grid
multi-scale coupling of the water, energy, and biogeochemical cycles as
intended in contemporary applications e.g., Earth system
models,. This is especially relevant as large-scale models
begin to include human influence on land surface processes
. Acknowledging these constraints, in
recent years there has been a renewed emphasis on improving the
representation of sub-grid heterogeneity through a more robust representation
of soil, topographic, urban, and microclimate heterogeneity and by enabling
explicit subsurface and surface interactions among sub-grid mosaic “tiles”
or hydrologic response units
.
Although these emerging approaches have the potential to considerably improve
the representation of sub-grid heterogeneity in macroscale models, their
added value depends on both the data and the approaches used to inform the
sub-grid schemes on the underlying heterogeneity of the physical environment
– the primary driver of spatial heterogeneity in land surface processes
(e.g., topography). The continual growth in the availability, detail, and
wealth of Earth system data over the past decades provides an invaluable
asset to make this possible. The use, harmonization, combination, and
reinterpretation of field surveys, in situ networks, and satellite remote
sensing have led to petabytes of vegetation, topography, climate,
meteorology, and soil data over continental to global extents with spatial
resolutions ranging between 10 and 1000 m
.
These data, although uncertain, provide invaluable very high-resolution
snapshots of the sub-grid physical environment and its impact on ecosystem
spatial structure and function.
In most cases, environmental models use these data in two ways: (1) running
the model at the native spatial resolution of the data (i.e., fully
distributed model) and (2) running the model on a coarser grid by upscaling
the data (i.e., lumped model). Both options are inadequate for Earth system
models; the first is computationally unfeasible, while the second mostly
disregards the role of sub-grid heterogeneity. The question then remains: how
can these data be used to the fullest extent while minimizing both
computation and storage requirements? This challenge is analogous to image
compression where the goal is to maximize an image's information content
while minimizing disk storage (e.g., clustering)
. For environmental data, this equates to
effectively and efficiently summarizing the data while minimizing information
loss. This concept is the underlying basis for mosaic schemes in land surface
models and hydrologic similarity in hydrologic models
.
Commonly, within macroscale models, the use of similarity amounts to binning
(i.e., one-dimensional clustering) maps of variables that are used as proxies
of the drivers of spatial heterogeneity (e.g., topographic index is used to
represent the role of topography in subsurface flow) to assemble a set of
representative sub-grid tiles. However, recently there have been efforts to
formally connect the concept of similarity to the clustering of an
n-dimensional space – in this case, the n-dimensional space is composed
of the proxies of spatial heterogeneity .
take this concept a step further by building a model
(HydroBlocks) that enables the explicit interaction among the tiles assembled
via multivariate clustering – the connectivity between the tiles is learned
from the elevation data. In this case, hydrologic connectivity was enforced
in the clustering algorithm primarily through flow accumulation area derived
from the DEM while mostly disregarding the basin's channel, hillslope, and
sub-basin structure; this oversimplification of catchment structure can lead to
overly complex and at times unrealistic inter-tile connections – critical to
accurately simulating baseflow production. Thus the need remains for a
clustering approach that allows for a minimal number of tiles while robustly
accounting for the fine-scale hydrologic structure of the macroscale grid
cell.
This paper introduces a hierarchical multivariate clustering approach (HMC)
that summarizes the high-dimensional environmental data space into
hydrologically interconnected representative clusters (i.e., tiles). HMC has
three main components: (1) cluster the fine-scale map hillslopes in a grid
cell into characteristic hillslopes, (2) discretize the characteristic
hillslopes into height bands, and (3) cluster the intra-band soil and
vegetation. As a proof of concept, these clusters (i.e., tiles) are then used
within the Geophysical Fluid Dynamics Laboratory (GFDL) LM4-HB land model to
explore its potential to provide a robust multi-scale coupling between the
water, energy, and biogeochemical cycles. Using a 1/4∘
(∼ 25 km) grid cell in the western foothills of the Sierra Nevada in
central California as a test bed, this paper explores the number of tiles
necessary to robustly account for the sub-grid multi-scale heterogeneity in
the macroscale states and fluxes. It also explores the role that each of the
drivers of spatial heterogeneity plays at the macroscale. Finally, the
implications of this approach in the application and validation of
large-scale environmental models and Earth system models are discussed.
Test-bed site in the foothills of the southern Sierra Nevada in
California used to develop, implement, and test the HMC algorithm. The region
is characterized by strong heterogeneity in topography, climate, and soil
properties leading to a complex multi-scale ecosystem spatial structure.
Data
To develop, implement, and evaluate HMC, this study uses a 1/4∘ grid
cell that covers the foothills and high sierras of the southern Sierra Nevada
in California (Fig. ). This domain is selected due to the
observable role of the physical environment in the sub-grid heterogeneity.
This heterogeneity is primarily explained by the strong topographic gradient
between the Central Valley and the Sierra Nevada and its impact on
precipitation and temperature; the highest point in the domain is 3118 m,
while the lowest is 163 m. This area has an annual average rainfall of
614 mm with large intra-cell variability with a minimum of
299 mm yr-1 in the lowlands and a maximum of 1152 mm yr-1 in
the uplands. In both cases, most of the rainfall occurs between October and
May. The uplands are primarily covered by evergreen vegetation, while shrubs
and grasses cover the lowlands. The uplands are characterized by a higher
sand content than the lowlands and vice versa for clay content.
Land and meteorological data
Topography
This study uses the 1 arcsec USGS National Elevation Dataset (NED) digital
elevation model (DEM) and a series of derived products including flow
accumulation area, hillslopes, slope, height above the nearest drainage area
(HAND), and aspect. The NED covers the contiguous United States (CONUS) and
is created primarily from the USGS 10 and 30 m digital elevation models and
from higher-resolution data sources such as light detection and ranging,
interferometric synthetic aperture radar, and high-resolution imagery
.
Meteorology and climate
The recently developed Princeton CONUS Forcing (PCF) dataset provides a
1/32∘ (∼ 3 km) meteorological product over CONUS at an hourly
temporal resolution between 2002 and present . This
dataset downscales the National Land Data Assimilation System phase 2
(NLDAS-2) product from 1/8 to 1/32∘ using a series of available
products including Stage IV and Stage II radar/gauge products for rainfall.
PCF includes precipitation, downward shortwave radiation, downward longwave
radiation, air temperature, specific humidity, wind speed, and pressure.
Furthermore, to inform the microclimate heterogeneity in HMC, this paper uses
the recently released WorldClim2 dataset. This gridded dataset derived from
in situ observations provides monthly climatologies of temperature,
precipitation, solar radiation, vapor pressure, and wind speed over the
global land surface at a 30 arcsec spatial resolution
.
Soil properties
The soil properties come from the Probabilistic Remapping of SSURGO (POLARIS)
dataset , a new continental soil dataset that uses random
forests to spatially disaggregate and harmonize the Soil Survey Geographic
(SSURGO) database over CONUS. In POLARIS, for
each 30 m grid cell, every soil series is assigned a probability of being
found at a given grid cell. These probabilities are then combined with the
vertical profile information available for each soil series to construct a
minimum, maximum, weighted mean, and weighted variance for each grid cell –
enough information to construct a beta distribution for each parameter per
vertical layer of each grid cell. For this study, this approach provides
porosity (θs), wilting point (θwp), field
capacity (θfc), and saturated hydraulic conductivity
(Ksat), which are set to be the median of each corresponding beta
distribution. These data are then used to directly compute the bubbling
pressure (ψb) and the inverse of the pore distribution size
index (B) which are used in the Campbell water retention curve (θ=θs(ψb/ψ)1/B) .
Land cover
The Cropland Data Layer (CDL) provides the 30 m land cover types. The CDL is
an annually produced database over CONUS that combines the National Land
Cover Database with an annual analysis of the spatial
distribution of croplands. It is created and managed by the United States
Department of Agriculture's National Agricultural Statistics Service
(USDA-NASS). The predicted land cover types are based on the reflective
signatures from a number of satellites including Landsat TM and ETM+, MODIS
satellite data, and the Advanced Wide Field Sensors (AWiFS), among others
. The different categories are associated with
their corresponding land use types and species within the LM4-HB model.
Methods
Land model description: LM4-HB
For this study, the conceptual approach that is used to parameterize sub-grid
heterogeneity in the HydroBlocks land surface model
HB, is added to the fourth generation of the
Geophysical Fluid Dynamics Laboratory (GFDL) land model
LM4,. The resulting LM4-HB
model is used to explore how big data can be efficiently and effectively
harnessed to improve the characterization of the sub-grid multi-scale
heterogeneity in Earth system models. LM4-HB uses a hierarchical approach to
represent the underlying sub-grid heterogeneity; this makes it an ideal
candidate for the testing and implementation of a hierarchical multivariate
clustering algorithm to assemble the underlying sub-grid heterogeneity from
available environmental datasets. This section provides an overview of LM4-HB
with a primary focus on describing its hierarchical representation of
sub-grid heterogeneity.
Schematic representation of a characteristic hillslope. Each
characteristic hillslope is divided into height bands, which in turn are
partitioned into intra-band tiles. Each tile interacts via the subsurface
flow of water with the tiles in its same height band and with all the tiles
in the height bands below and above.
The land fraction of each grid cell in LM4-HB is partitioned into soil,
glacier, and lake components. The soil component in turn is composed of k
characteristic hillslopes; each hillslope has a unique set of attributes
including slope, aspect, convergence, and convexity, among others. As shown
in Fig. , each characteristic hillslope i is
divided into li height above nearest drainage area (HAND) bands (referred
to from now on as height bands). Each height band bi,j is divided into
pi,j clusters (i.e., tiles) to account for the intra-band heterogeneity
in soil and land cover. The total number of soil tiles is given by the sum of
all tiles in each height band for all characteristic hillslopes within a grid
cell. Although not shown in Fig. , the current
model uses a uniform soil depth for all tiles within a characteristic
hillslope. However, the soil properties (e.g., porosity and hydraulic
conductivity) control the effective soil depth of each tile, and thus
variable soil depths as shown in the schematic are effectively represented.
Each soil tile within LM4-HB consists of a model from canopy air down to
impermeable bedrock. The processes captured within the model include
bidirectional diffuse and direct, visible and near-infrared radiation
transfer; photosynthesis and stomatal conductance; surface energy, momentum,
and water fluxes; snow physics; soil thermal and hydraulic physics (including
advection of heat by water fluxes); vegetation phenology, growth, and
mortality; simple plant-functional-type transition dynamics; and simple
soil-carbon dynamics. For further details on the intra-tile processes, see
and .
Within each characteristic hillslope, each tile interacts with the tiles in
its same height band and the tiles in the height bands immediately below and
above via the subsurface flow of water; heat and carbon are advected by the
water fluxes. The tiles adjacent to the channel interact with the stream in
one direction (tile to stream). Each height band is characterized by a
length, width, and height above the nearest drainage area. The effective
width of a tile for a given height band is the corresponding fraction of the
width of the height band. For further details on the tile interactions and
the hillslope model more generally, see .
For all model simulations in this study, LM4-HB is run with a 50 m soil
depth (the same for all tiles) at a 1 h time step for 130 yr by cycling
through the forcing between 2002 and 2014 ten times. The first 117 yr are
used for spin-up, while the final 13 yr of the simulation are used for the
analysis.
Assembling the land model tiles: hierarchical multivariate clustering
To take advantage of LM4-HB's sub-grid representation of land cover, soil,
climate, and topography, the characteristic hillslopes and the
intra-hillslope heterogeneity are parameterized using available continental
and global environmental data. This section provides an overview of the
hierarchical multivariate clustering (HMC) algorithm used to assemble a grid
cell's tiles. Its steps are (1) define the characteristic hillslopes,
(2) discretize the characteristic hillslopes into height bands, and
(3) define the intra-band heterogeneity.
The characteristic hillslopes are defined by (1) delineating the
hillslopes from the elevation data, (2) calculating a suite of properties for
each hillslope from environmental data, and (3) using the hillslope
properties and the k-means clustering algorithm to define k
characteristic hillslopes.
Define the characteristic hillslopes
The characteristic hillslopes for a given grid cell are defined by clustering
a grid cell's fine-scale map of hillslopes. To assemble the map of
hillslopes, the DEM is sink-filled and the channels
are delineated using an area threshold of 100 000 m2. A recursive
algorithm then splits each basin into a maximum of three hillslopes – left
side, right side, and headwaters. Each hillslope's attributes are assembled
from the high-resolution soil, topography, and climate data. These include
each hillslope's average aspect, slope, annual mean precipitation, and annual
mean temperature. Metrics that summarize each hillslope's plan and profile
geometry are derived from the hillslope's binned HAND data. Given each bin's
slope, HAND, and area, the length and width are readily computed. To
summarize these properties, a line is fit to the set of widths for each
hillslope; the slope of this function provides a summary metric of the
hillslope's plan shape (convergence/divergence). For each hillslope's
profile, the function h=H1-1-(x/L)ab is
fit to the binned HAND data; where h is the HAND, H is the maximum HAND,
x is the horizontal position, and L is the hillslope length. The
parameters a and b summarize the concavity of the lower half and upper
half of each hillslope, respectively. This function is chosen due to its
flexibility to reproduce convex, concave, and complex hillslope profiles.
The profile of each characteristic hillslope is constructed by
fitting h=H1-1-(x/L)ab to the combined
profiles of all its corresponding hillslopes. The characteristic profile is
then discretized into ⌈Hi/Δh⌉ height bands. Finally, the
HAND and characteristic hillslope maps are combined with the discretized
profiles to assign a unique height band to each fine-scale grid cell. In the
height bands image, each band of each characteristic hillslope is represented
by a different color.
Assembling all the calculated attributes leads to an n by m array where
n is the number of hillslopes and m is the number of attributes. The
k-means clustering algorithm is then used to partition
the normalized m-dimensional attribute space into k characteristic
hillslopes. Figure provides an overview of the
steps used to define the characteristic hillslopes. The attributes of each
characteristic hillslope are then set to be the arithmetic mean of the
attributes of its corresponding hillslopes.
Discretize the characteristic hillslopes
After assembling the set of characteristic hillslopes, their attributes, and
their corresponding profile and width functions, the next step is to
discretize each profile along the length axis into height bands. The number
of height bands is li=⌈Hi/Δh⌉, where i is the
characteristic hillslope, Hi is the profile's maximum height, and Δh is the fixed height difference between adjacent height bands. Note that
the number of height bands per characteristic hillslope can differ per
characteristic hillslope since each has a unique Hi. Each resulting height
band has a unique mean width, length, and height.
For each height band of each characteristic hillslope, the
corresponding fine-scale grid cells are clustered into p intra-band tiles
according to their land cover and soil properties.
Using the high-resolution maps of characteristic hillslopes and the HAND,
each high-resolution grid cell is assigned a characteristic hillslope and a
height band. This is accomplished by first normalizing the HAND map per
hillslope by dividing all HAND values that belong to a given hillslope by the
given hillslope's maximum HAND value (H). This normalized HAND map is then
combined with the normalized discretized HAND profiles of the characteristic
hillslopes to assign each high-resolution grid cell to its corresponding
height band. This process formally connects the discretized characteristic
hillslopes to the observed landscape. Figure illustrates an example discretization of a
characteristic hillslope and the mapping of the discretized hillslopes to the
high-resolution grid.
As an exploratory simulation, the LM4-HB model is run between 2002
and 2014 using 14 tiles assembled via HMC. The tile simulations are shown for
evapotranspiration (ET), sensible heat flux (SH), baseflow (Rb),
and root zone soil moisture (SM) for 2005; the time series are color-coded to
correspond to the 30 m map of tiles. Each tile has a corresponding id
ti,j,k, where i is the characteristic hillslope, j is the height
band, and k is the intra-band cluster. For each variable, the tile-weighted
average time series (macroscale estimate) is superimposed on the tile
simulations for comparison.
Define the intra-band heterogeneity
The final step is to define the heterogeneity within each height band
bi,j where i is the characteristic hillslope and j is the height
band. For each band bi,j, the collocated fine-scale grid cell values of
the proxies of heterogeneity are extracted. This study uses saturated
hydraulic conductivity, porosity, and a set of binary maps (natural/cropland,
evergreen/deciduous, and grass/tree) derived from the high-resolution land
cover map. These binary maps are used to avoid clustering categorical data.
For each height band bi,j, this leads to an n by m array of
attributes where n is the number of fine-scale grid cells that belong to
bi,j and m is the number of attributes. The k-means algorithm is
then used to cluster the m-dimensional attribute space into pi,j
intra-band clusters (i.e., tiles). In this study, for simplicity, pi,j
is set to be the same for all height bands and characteristic hillslopes.
Therefore, the final number of tiles for the macroscale grid cell is given by
ntiles=p∑i=1kli.
Each tile within a grid cell is assigned an id ti,j,k where i is the
characteristic hillslope, j is the height band, and k is the intra-band
cluster. Figure shows an example of the
intra-band clustering and the tile configuration resulting from the
hierarchical clustering algorithm over this study's domain. For each tile,
the continuous land model parameters are set to be the arithmetic average of
all the parameter values of the fine-scale grid cells that belong to the
given tile. The mode is used instead of the mean for categorical land model
parameters. Each tile is assigned its own meteorology by assigning a weighted
average of all the overlying 4 km PCF grid cells that intersect with the
30 m grid cells that belong to the tile. One of the advantages of having
each 30 m grid cell belong to a tile and the corresponding map of tiles is
that the simulations for each tile can be mapped to the fine-scale grid to
provide a 30 m representation of the model output at each time step.
Model experiments
Exploratory simulation
By clustering the high-dimensional environmental space, HMC explicitly
relates the tiles used in LM4-HB to the observed fine-scale physical
environment while ensuring realistic hydrologic connections between tiles
along characteristic hillslopes. To illustrate the benefits and additional
model information that can be extracted when using HMC, an exploratory
simulation is run using a simple HMC-assembled tile configuration (k=2,
Δh=50 m, p=2) with 14 tiles within LM4-HB. This tile configuration
is among the simplest cases for this domain that are able to illustrate the
roles of all the different drivers of heterogeneity. This exploratory
simulation is then analyzed to illustrate the added information that HMC
provides.
Using the exploratory simulation, the tile-simulated daily
evapotranspiration (ET) values for 16 June 2005 are mapped onto the 30 m
fully distributed grid using the HMC-assembled fine-scale map of tiles.
Through a series of model experiments, the HMC parameters are
adjusted to assess their role in the modeled heterogeneity. ek,Δh,p is the experiment id, k is the number of characteristic hillslopes,
Δh is the difference in height between adjacent height bands, p is
the number of intra-band clusters, and ntiles is the resulting number of
tiles.
ek,Δh,p
k
Δh
p
ntiles
e1,1000,1
1
1000
1
1
e2,1000,1
2
1000
1
2
e10,1000,1
10
1000
1
10
e10,50,1
10
50
1
28
e10,20,1
10
20
1
70
e10,10,1
10
10
1
139
e10,10,2
10
10
2
278
e10,10,3
10
10
3
417
e10,10,5
10
10
5
695
Hierarchical multivariate clustering: parameter sensitivity
The primary objective of HMC is to harness high-resolution environmental data
to efficiently and effectively summarize a macroscale grid cell's underlying
multi-scale spatial structure. To make this possible, HMC relies on a set of
user-defined parameters to control the importance of each hierarchical step
in the algorithm; these parameters include (1) the number of characteristic
hillslopes (k), (2) the elevation difference between adjacent height bands
(Δh), and (3) the number of intra-band clusters per height band
(p). To test LM4-HB's sensitivity to these parameters, nine model
experiments are performed in which the HMC parameters are adjusted to assess
their individual roles. The model experiments are outlined in
Table . Experiments e1,1000,1,
e2,1000,1, and e10,1000,1 increase k from 1 to 10 characteristic
hillslopes, experiments e10,50,1, e10,20,1, and e10,10,1
decrease Δh until 10 m, and experiments e10,10,2,
e10,10,3, and e10,10,5 increase p by up to five intra-band
clusters.
Characterizing the roles of the drivers of heterogeneity
Applying HMC to existing high-resolution environmental data enables a robust
representation of the different drivers of spatial heterogeneity (soil,
topography, meteorology, and land cover) within macroscale environmental
models. However, it does not explicitly characterize the individual role of
each driver – key to advancing our understanding of the relationship between
the physical environment, ecosystem spatial structure, and macroscale
response. To make this analysis possible, another set of model experiments is
explored that investigate the individual role of each driver at the
macroscale. Using an approach similar to , each driver's
sensitivity is explored by turning the heterogeneity of properties associated
with each driver on and off. When “on” the properties associated with the
driver are left as assigned through HMC; when “off” the driver's properties
are set to be the grid cell mean. The different model experiments are
outlined in Table .
Visual comparison of the mapped annual mean (2002–2014) of
simulated evapotranspiration for the nine model experiments in
Table .
Results and analysis
Exploratory simulation
Figure shows the simulated time series for all 14
tiles at a daily time step for baseflow, root zone soil moisture,
evapotranspiration, and sensible heat flux. The fine-scale map of tiles is
also shown and color-coded to relate the spatial location of each tile to the
simulated time series. Each tile is assigned an id ti,j,k, where i is
the characteristic hillslope, j is the height band, and k is the
intra-band cluster. A comparison of the tiles' time series exemplifies the
differences in states and fluxes that are driven by a tile's location,
properties, and meteorology. Explicitly resolving hillslope dynamics leads to
significant differences in the root zone soil moisture; in general, soil
moisture decreases as the tiles are further away from the valley. However,
land cover, soil, hillslope structure, and meteorological differences can
lead to differences in soil moisture between hillslopes and intra-band
clusters (e.g., uplands vs. lowlands). This strong topographic gradient in
soil moisture provides more water for vegetation growth in the valleys than
along the ridges; given that this is a water-limited area, this explains the
appreciable differences in simulated evapotranspiration. Furthermore, this
also explains the strong heterogeneity in sensible heat flux, with tiles with
higher soil moisture having lower sensible heat fluxes. Finally, only the
tiles that are adjacent to the channels (t1,1,1, t1,1,2,
t2,1,1, and t2,1,2) produce noticeable baseflow.
Comparison of the model experiments in Table . For each experiment, the spatial mean and
spatial standard deviation for all water years (1 October–30 September)
between 2003 and 2014 are plotted for a suite of states and fluxes. The
corresponding values for each water year are color-coded according to their
precipitation. The black line shows the annual mean.
For all states and fluxes, the macroscale (tile-weighted average) time series
for each variable is superimposed on the tile simulations to illustrate the
strong differences that can exist between individual tiles and the macroscale
estimate (temporal dynamics and mean). These differences illustrate the
challenge of comparing a macroscale estimate to observations and simulations
at different spatial resolutions, a persistent challenge when aiming to apply
and evaluate macroscale models. However, as will be discussed in
Sect. , being able to connect each 30 m grid cell to
each tile simulation enables a path towards circumventing the scale
discrepancies between macroscale model estimates and in situ observations.
Figure illustrates how the tile simulations can then
be mapped onto the 30 m fully distributed grid. In this example, the daily
averaged simulated evapotranspiration value for each tile on 16 June 2005 is
assigned to each corresponding fine-scale grid cell. Being able to visualize
the assumed heterogeneity of the model enables a more realistic comparison to
fully distributed models. Furthermore, it makes it possible to provide model
output at spatial resolutions at far finer spatial resolutions than the
tile-weighted average (i.e., macroscale estimate).
Visual comparison of the mapped annual mean (2002–2014) of
simulated runoff (R), 2 m root zone soil moisture (SM), leaf area index
(LAI), and soil temperature (Ts) for the five model experiments in
Table .
Comparison of the model experiments in Table
. For each experiment, the spatial mean
and spatial standard deviation for all water years between 2003 and 2014 are
plotted for a suite of states and fluxes. The corresponding values for each
water year are color-coded according to their precipitation. The black line
shows the annual mean.
Hierarchical multivariate clustering: parameter sensitivity
As outlined in Sect. , the experiments in this
section explore the sensitivity of the HMC parameters through a set of model
experiments; these experiments are summarized in Table .
As an initial visual comparison,
Fig. shows the mapped annual
mean evapotranspiration between 2002 and 2014 at a 30 m spatial resolution
for the different model experiments. The baseline experiment is the one-tile
configuration (i.e., no sub-grid heterogeneity). An increase in the number of
characteristic hillslopes leads to the appearance of large-scale spatial
patterns in evapotranspiration (experiments e1,1000,1, e2,1000,1,
and e10,1000,1). This is primarily due to the strong topographic
gradient in precipitation between the lowlands and uplands; this
heterogeneity in evapotranspiration is possible through the disaggregation of
the PCF meteorology among the land model tiles. Decreasing Δh leads
to an increase in the number of height bands per characteristic hillslope;
this makes the role of topographic convergence in subsurface
flow readily apparent –
evapotranspiration is higher in the riparian zones (experiments
e10,50,1, e10,20,1, and e10,10,1). Finally, increasing the
number of intra-band clusters adds to heterogeneity in evapotranspiration due
to land cover and soil heterogeneity (experiments e10,10,2,
e10,10,3, and e10,10,5). Note how after experiment e10,10,2
(278 tiles), the spatial patterns cease to change as much.
Figure formalizes the comparison
between the different model experiments; it shows how the spatial mean and
spatial standard deviation of the annual mean of each year between 2002 and
2014 change as a function of the tile configuration for a suite of states and
fluxes. The primary result is the apparent convergence of all states and
fluxes for both the spatial mean and standard deviation with the increase in
the number of land model tiles. In other words, there is a point at which
further increases in the number of tiles have a limited impact at the
macroscale. For this site, that limit is approximately 300 tiles compared to
the 810 000 fine-scale grid cells in the domain. This result is encouraging;
it illustrates that multi-scale sub-grid heterogeneity can be characterized
effectively and efficiently in large-scale models by taking advantage of the
covariance between environmental properties.
The role that each parameter of HMC has at the macroscale depends on the
prognostic variable; these differing roles are discussed below.
k – an increase in the number of characteristic hillslopes from experiments
e1,1000,1 (1 tile) to e10,1000,1 (10 tiles) leads to noticeable
changes in all the prognostic variables. The most noticeable changes occur
when increasing the number of characteristic hillslopes from one to two. This
is primarily due to the improved representation of land cover heterogeneity;
instead of the grid cell being represented uniformly as evergreen trees, the
lowlands are now grasses while the uplands remain as evergreen trees. This
leads to a decrease in the cell's effective roughness length (i.e., a
decrease in aerodynamic conductance), and thus a decrease in sensible heat
flux and an increase in surface temperature. The decrease in aerodynamic
conductance also contributes to a decrease in transpiration.
Δh – a decrease in Δh from experiments e10,50,1 (28 tiles)
to e10,10,1 (139 tiles) leads to an increase in the number of height
bands in the characteristic hillslopes. This results in an explicit
representation of the role of topographic convergence in subsurface
flow; more soil water is available
in the valleys than along the ridges. The increase in soil water in the
valleys also leads to more frequent saturated excess runoff, to the extent
that during wet years this counteracts the decrease in baseflow. Another
noticeable impact of the increase in the number of height bands is the
decrease in inter-annual variability in transpiration, net primary
productivity, baseflow, and sensible heat flux. As will be discussed in
Sect. , this can have potentially important
implications for the role of spatial structure in ecosystem resilience to
hydrologic extremes.
p – an increase in the number of intra-band clusters from experiments
e10,10,2 (278 tiles) to e10,10,5 (695 tiles) leads to a more robust
representation of soil and land cover heterogeneity throughout the domain.
This leads to differences in most variables. However, these differences are
not as noticeable as those due to changes in k and Δh since most of
the heterogeneity in land cover and soil heterogeneity has already been
represented through these other parameters. This parameter will most likely
play a larger role in regions where the ecosystem spatial structure is not as
strongly tied to topography.
Through a series of model experiments, the heterogeneity of model
parameters and forcing data is turned on (heterogeneous) and off
(homogeneous). For simplicity, the parameters and forcing data are grouped
into soil, hillslope, land cover, and meteorology groups.
eid
Soil
Hillslope
Land cover
Meteorology
B
Off
Off
Off
Off
S
On
Off
Off
Off
SH
On
On
Off
Off
SHL
On
On
On
Off
SHLM
On
On
On
On
Characterizing the roles of the drivers of heterogeneity
Although Sect. provides preliminary insight into the role
of the different drivers of spatial heterogeneity (e.g., topographic
convergence impacts the macroscale soil moisture mean), due to the
interactions of the HMC parameters, it cannot precisely disentangle each
driver's unique role. For this purpose, as introduced in Sect. , another set of experiments is explored that turn
the different drivers of heterogeneity on and off. These experiments are
summarized in Table . Furthermore, the
tile configuration of experiment e10,10,2 in Sect.
(278 tiles) is used for all experiments in this section; this tile
configuration is chosen because, as shown in Sect. , the
model macroscale states and fluxes converge at around 300 tiles.
As an initial visual comparison, the mapped model results of the different
experiments are shown in Fig. for
annual mean runoff, 2 m root zone soil moisture, leaf area index, and soil
temperature between 2002 and 2014 at a 30 m spatial resolution;
Fig. formalizes this
comparison. The roles that each driver plays in the macroscale prognostic
states and fluxes are discussed below.
Baseline (B) – the baseline experiment equates to the homogeneous sub-grid
cell case. However, in this case, there are 278 tiles where each tile has the
same soil properties, hillslope structure (each tile is set to be its own
characteristic hillslope), and land cover properties. Furthermore, each tile
is run using the grid cell mean meteorology. Not surprisingly, there is no
heterogeneity in the plotted maps and the spatial standard deviation for all
prognostic states and fluxes is 0.
Soil heterogeneity (S) – this experiment adds heterogeneity in the soil
properties; this includes porosity and the Campbell retention curve
parameters, among others. It has a relatively small impact on the spatial
mean for all variables. However, there are changes in their spatial standard
deviations. The increase in spatial standard deviation is largest for soil
moisture, which in turn impacts the remaining prognostic variables. For
example, changes in available water impact infiltration excess runoff,
leading to, at times, appreciable heterogeneity in annual runoff production.
In any case, these differences are minor at the macroscale when compared to
the subsequent experiments.
Soil and hillslope heterogeneity (SH) – assigning each tile to its original
corresponding characteristic hillslope and discretizing these hillslopes lead
to significant differences at the macroscale. The strong topographic
gradients caused by the discretized hillslopes lead to strong topographic
gradients in soil moisture and thus explain the sharp increase in the spatial
standard deviation of soil moisture. These topographic gradients in soil
moisture lead to an overall increase in saturated excess runoff during wet
years, thus counteracting the overall decrease in baseflow; this also
explains the increase in the spatial standard deviation of runoff. The most
noticeable role of the topographically driven subsurface flow is the
reduction in inter-annual variability in root zone soil moisture, leaf area
index, baseflow, transpiration, net primary productivity, and sensible heat
flux. This is due primarily to the significant increase in inter-annual
change in storage when including explicit topographic gradients; these
topographic gradients enable the system to be able to release more water
during dry years (uphill deep soil water is made available to the riparian
zone through subsurface flow) and to absorb more water during the wet years
(increase in infiltration capacity due to the heterogeneity of soil
moisture).
Soil, hillslope, and land cover heterogeneity (SHL) – this experiment
adds heterogeneity in land cover. These changes are similar to those seen in
Sect. when increasing the number of characteristic
hillslopes. This is because both cases ensure evergreen forests are
represented in the uplands and grasslands and shrubs are represented in the
lowlands. This leads to a lower effective roughness height, explaining the
decrease in sensible heat flux, net primary productivity, and transpiration.
Land cover heterogeneity also leads to an appreciable increase in the spatial
standard deviation of sensible heat flux, net primary productivity, and
transpiration. Its role is also particularly noticeable in the soil
temperature spatial distribution by creating sharp contrasts between the
lowlands and uplands.
Soil, hillslope, land cover, and meteorological heterogeneity (SHLM) –
this last experiment adds heterogeneity in meteorology by prescribing the
meteorology to each tile from the overlying 4 km PCF grid. This leads to a
net increase in annual mean runoff and a decrease in transpiration. The
spatial heterogeneity of meteorology further enhances the contrast in soil
temperature between the lowlands and uplands. Furthermore, the larger
availability of water in the uplands leads to higher net primary productivity
in this region and thus a higher LAI.
Discussion
Sub-grid redistribution of water: dampening of extremes
Seeking to account for the role of sub-grid redistribution of subsurface
water within macroscale hydrologic and land surface models is not a new
objective; many schemes have been implemented over the past decades to
characterize its influence .
However, these approaches are designed primarily to account for the role of
fine-scale heterogeneity in the macroscale hydrologic response; they are
generally not meant to handle the sub-grid spatial coupling of the water,
energy, and biogeochemical cycles. LM4-HB addresses these limitations by
explicitly modeling the subsurface flow of water via horizontally and
vertically discretized characteristic hillslopes ; this
makes it possible to account for the impact of sub-grid redistribution of
water on the full gamut of land surface states and fluxes (e.g., soil
moisture, sensible heat flux, and net primary productivity). Furthermore, by
harnessing existing environmental information to parameterize sub-grid
heterogeneity, HMC ensures that the properties of the characteristic
hillslopes are formally connected to the observed physical environment.
The results from the model experiments in Sect. show that
upon enabling subsurface redistribution, the most noticeable difference at
the macroscale is the dampening of annual extremes in the water, energy, and
carbon cycles; for example, as shown in
Fig. there is a strong
decrease in the inter-annual variability of baseflow, transpiration, net
primary productivity, and sensible heat flux between the S and SH
simulations. As mentioned in Sect. , this is primarily due to
the significant increase in inter-annual change in storage when including
explicit topographic gradients; these topographic gradients enable the system
to release more water during dry years (uphill deep soil water is made
available to the riparian zone through subsurface flow) and to absorb more
water during the wet years (increase in infiltration capacity due to the
heterogeneity of soil moisture). The influence of topography on infiltration capacity and baseflow
production has been recognized for decades in hydrology
; however, its role in the coupled water, energy,
and biogeochemical cycles remains poorly understood. This study provides
insight into how a robust representation of land heterogeneity in Earth
system models makes it possible to more fully characterize the role of
topography in the coupled system.
These model experiments provide insight into the role of subsurface
redistribution at the macroscale; the underlying physical environment
provides a mechanism for ecosystems with pronounced topography to mitigate
the impacts of seasonal to annual hydrologic extremes. These results suggest
that an improved representation of spatial heterogeneity could improve
projections of ecosystem response to drought, particularly in mountainous
regions. A better representation of biophysical feedbacks to variations in
air temperature and vapor pressure deficit could also improve simulations of
land–atmosphere feedbacks that can intensify droughts and affect macroscale
circulations
. Future
work should explore how these results extend to other regions with different
configurations of the physical environment (e.g., topography). Beyond
understanding the role of the sub-grid subsurface redistribution of water,
these model experiments would also bring to light other impacts that the
fine-scale physical environment has on macroscale response.
Revisiting the application and evaluation of Earth system models
As explored in Sect. and shown in
Fig. , formally connecting the sub-grid tile
configuration to the high-resolution environmental covariates provides a
novel approach to visualize the output of the land components of Earth system
models – the simulation of each land model tile can be mapped to the
fine-scale grid. Using this approach, macroscale models are able to maintain
their existing computational and storage efficiency while providing highly
detailed local information. As discussed below, this has important
repercussions for the evaluation and application of these models.
Evaluation – as shown in Fig. , when
robustly characterized, sub-grid multi-scale heterogeneity can lead to
significant differences between the simulations at the tile (i.e., field
scale) and grid cell levels (i.e., macroscale). This discrepancy in spatial
scale is analogous to using in situ observations to evaluate and validate
Earth system models: the observations are at the field scale, yet the model
estimates are at the macroscale. The approach explored in this study allows
the Earth system modeling community to revisit this persistent challenge.
Since each fine-scale grid cell (∼ 30 m) is assigned to a land model
tile, in situ observations can be readily compared to the simulations of
their collocated model tile. This makes it possible to evaluate these models
using in situ observation networks (e.g., FLUXNET) without having to upscale
the observations or downscale the macroscale estimate. Furthermore, it also
enables the use of very high-resolution satellite products (e.g., Landsat) to
evaluate the modeled fine-scale spatial patterns. This approach enables the
Earth system modeling community to work more closely with field scientists to
further understanding of the Earth system and to accelerate the model
development cycle.
For example, this approach could facilitate improved methods for validating
soil carbon projections in ESMs. Past model–data comparisons of soil carbon
have relied on spatial models to scale soil carbon measurements to the grid
cell scale, as in the Harmonized World Soil Database (HWSD;
). Such scaling techniques can be
problematic for direct comparison with model simulations. First, the spatial
models necessary for scaling have the potential to introduce bias in the
observation-based product, and result in a comparison that is not purely
measurement-based. Second, variability in topography, ecosystem type, and
soil properties within the grid cell scale makes these comparisons
challenging to interpret: failure of a model grid cell to match scaled
observations could be due to process representation, model parameterization,
or the relative spatial coverage of ecosystem or edaphic types within the
model grid cell relative to the scaled observations. Incorrect attribution of
model error could result in inappropriate adjustments to model parameters or
processes, for example reducing the turnover rate of soil carbon pools in
uplands when a model underestimate of carbon stocks is actually due to high
carbon stocks observed in wetlands. HMC could address these issues by
facilitating direct comparison of modeled soil carbon stocks with
measurements grouped by the same properties used in the clustering analysis.
Application – Earth system models are used almost exclusively for
regional to global applications due to their coarse spatial scales, with
limited applicability to local stakeholders (e.g., farmers). Having the
ability to map the tile simulations to the fine-scale grid (∼ 30 m)
allows the community to reevaluate how these models are applied. For example,
accounting for the very high-resolution soil properties in each model tile
leads to more locally relevant soil moisture simulations; providing these
field-scale model estimates in real time can then be used to inform
irrigation requirements. Furthermore, the inherent model efficiency of this
approach facilitates robust ensemble frameworks; this enables a path towards
constraining the unavoidable uncertainty of the model predictions – even
more pervasive at higher spatial resolutions – while still providing local
detail. This novel approach to model application should be explored further
as it requires minimal increases in computational expense with potentially
large societal benefits.
Caution: convergence on the fully distributed simulation
The primary result of this study is that a relatively low number of land
model tiles (∼ 300 tiles in this study) are necessary to converge on
the macroscale response (mean and standard deviation) of the corresponding
fully distributed model. Although this result provides a promising path
forward for a robust representation of multi-scale heterogeneity in
large-scale models, as discussed below, there are limitations that must be
acknowledged and addressed in future implementations of this method.
Model structure uncertainty – the optimal tile configuration is
tied to the underlying process representation of the model. For example,
LM4-HB does not currently represent subsurface or surface interactions among
characteristic hillslopes. For this study, this translates to the model not
transferring water between the uplands and lowlands. This is an obvious
oversimplification as aquifers in the Central Valley in California (i.e.,
lowlands) strongly rely on water recharge via subsurface and surface flows
from the Sierra Nevada (i.e., uplands). Other important missing processes
include surface redistribution of water, water ponding, preferential
flowpaths, and water management, among others. As these key processes are
included, the optimal tile configuration will most likely become more complex
(i.e., more tiles). More generally, this implies that the optimal tile
configuration is model structure dependent and thus its transferability
between land surface models (and even different versions of the same model)
is limited.
Parameter uncertainty – this study uses a single realization of
plausible model parameters per 30 m grid cell over the domain. The
tile-level parameters are then derived from these high-resolution maps as
described in Sect. . All model simulations in the
study and thus the convergence analysis rely on this single realization of
model parameters. Given the known strong sensitivity of the spatial
properties and macroscale response of the water, energy, and carbon
cycles to many of these highly uncertain model
parameters (e.g., saturated hydraulic conductivity and soil depth), it
follows that the optimal tile configuration is most likely strongly dependent
on the prescribed model parameters. As this method is implemented for use in
parameter ensemble frameworks, special care must be taken to ensure that
different tile configurations are used for different parameter realizations
to ensure that the intended goal behind HMC of robustly characterizing the
multi-scale heterogeneity is still fulfilled in each ensemble member.
Model application – the search for an optimal tile configuration in
this study focuses exclusively on converging on the annual spatial mean and
spatial standard deviation of the fully distributed model. Although
appropriate for climate timescales, these coarse metrics will most likely be
insufficient for finer temporal scales (e.g., numerical weather prediction).
For example, if the objective is to model flash flooding using minute-scale
radar rainfall, the characteristic hillslopes in LM4-HB will need to more
robustly represent the spatial grid to appropriately capture the minute-scale
spatial variability of precipitation. This will likely require a large
increase in the total number of characteristic hillslopes to ensure
convergence on the macroscale response. For these types of applications other
approaches to assess convergence would be useful, including comparing runoff
histograms. Another approach would involve a direct comparison of the mapped
30 m simulations. This would provide a more complete assessment of
convergence and would be critical for field-scale applications (e.g.,
predictions of soil moisture at the farm level). In summary, the optimal tile
configuration will also be application dependent; while optimal tile
configurations derived for finer timescale applications will be appropriate
for coarser timescale applications, the opposite will rarely be true.
Proxy heterogeneity vs. process heterogeneity
The implementation of HMC throughout this study assumes that observed
characteristics of the physical environment (e.g., elevation) are robust
proxies of the heterogeneity of the biogeochemical cycles. Although this
assumption is generally adequate, it only indirectly addresses the real goal,
which is to characterize the multi-scale heterogeneity of the water, energy,
and carbon cycles. Moving forward, future development of HMC should move
beyond only clustering proxies of heterogeneity and instead focus on the
processes themselves. Following , one approach would
involve directly clustering the output from the fully distributed model.
Although unfeasible for global applications due to computational constraints,
this would provide the most robust method to ensure a comprehensive
characterization of the heterogeneity captured in the fully distributed model
with a minimal number of land model tiles. Another option would be to use
satellite remote products that directly measure the states and fluxes of the
biogeochemical fluxes at high spatial resolutions. For example, soil moisture
retrievals from Sentinel-1 would provide key
observations of the spatial distribution of saturation within a domain. Other
maps that would be useful include the Leaf Area Index from MODIS
, and evapotranspiration from Landsat
, among others. Although biased, these data would
provide a more formal connection between the model and observed
heterogeneity. It would also open a novel path to assimilate these
high-resolution products into land models at field scales.
Applying HMC over the globe: assembling optimal tile configurations
Section illustrates how a relatively small number of land
model tiles are necessary to explicitly characterize the underlying sub-grid
heterogeneity in this study's domain – this is substantially less than the
810 000 grid cells of a corresponding 30 m fully distributed simulation.
This provides the best trade-off for large-scale models: it explicitly
captures the role of sub-grid fine-scale features while maintaining
computational efficiency. However, although this study does provide a
preliminary exploration of the HMC parameter space (k, Δh, and
p), it does not provide a robust approach to find the optimal HMC
parameters in different regions over the globe. Three approaches that are
outlined below could be used to accomplish this goal.
The most direct approach is to optimize the HMC parameters on all macroscale
grid cells for a given grid size over the globe. This would be accomplished
through parameter optimization techniques
e.g.,. For each parameter set, HMC would be
run to create the model tiles and then LM4-HB would be used to run a
simulation to characterize its long-term macroscale states and fluxes.
Convergence on the fully distributed simulation would be attained at the
parameter set that leads to the fewest number of tiles while ensuring the
macroscale states and fluxes have converged within a user-defined tolerance.
This would lead to a robust representation of sub-grid heterogeneity albeit
requiring substantial computational resources.
A more computationally efficient path forward is to use the first approach on
only a subset of macroscale grid cells or catchments. These domains would be
chosen such that they sample comprehensively from different climate, soil,
land cover, and topographic regimes throughout the globe. The HMC parameters
would be optimized independently for each of these domains. Machine learning
(e.g., random forests) would then be used to regionalize these optimized
parameters by deriving non-linear functional relationships between the
optimized parameters and a suite of summary macroscale metrics (e.g.,
standard deviation of elevation, grid cell area, and average precipitation,
among others). This approach would provide a method to assemble the optimal
HMC parameters for a chosen region without having to resort to optimizing the
parameters for a new domain.
Finally, another option is to rely exclusively on the data that are used
within HMC. The primary goal behind clustering these data is to extract all
the relevant information and minimize redundancies. Assuming that the water,
energy, and biogeochemical fine-scale features are tightly coupled to the
observed environment, then ensuring that the mapped clustered input data
matches the original fine-scale maps would ensure that the model results will
provide a robust representation of the defined sub-grid heterogeneity. This
approach would provide a method to define the optimal number of land model
tiles without having to resort to model simulations, thus making the
parameter selection less model dependent.
Clustering: expanding beyond natural soil systems
The strong covariance between the different proxies of the drivers of spatial
heterogeneity makes it possible to summarize a high-dimensional proxy space
(i.e., environmental data) using a relatively small number of representative
clusters. HMC capitalizes on this covariance to characterize the fine-scale
heterogeneity of natural soil systems. However, the covariance of
environmental properties is not exclusive to natural soil systems and can be
extended to other systems over land, including urban areas, croplands, water
bodies, and glaciers. This section explores both the data that are available
for these types of systems and how clustering can be used to extract their
most representative characteristics.
Lakes and glaciers – Earth system models represent lakes and glaciers
as model tiles over land. Each lake tile is characterized by a set of
properties including depth and area; these can be obtained from existing
global lake inventory databases. These data have information such as
shoreline length, water volume, and average depth, among others (e.g.,
HydroLakes; ). For each macroscale grid cell,
clustering could be used to define a set of characteristic lakes with their
associated representative properties. A similar approach could also be used
to identify a grid cell's characteristic glaciers by clustering the
properties associated with the glaciers within that cell; rich global glacier
inventory databases such as GLIMS could then be
harnessed within ESMs.
Urban areas – recognizing the important role that urban areas play in the
coupled system, the community is actively incorporating them into ESMs
through urban canopy models (UCMs) . UCMs represent
urban areas through a set of characteristics including roof fraction,
building height, and canyon fraction, among others. Although there is
currently no comprehensive database that provides the characteristics for all
urban areas over the globe, there are emerging efforts to make this
information available (e.g., WUDAPT; ). As these
data become accessible, clustering could also be used to distinguish the
characteristic urban features within a model grid cell (e.g., high vs. low
buildings).
Croplands – although Earth system models include croplands, their representation
is generally oversimplistic. For example, in most cases, irrigation practices
are ignored and many different phenological and physiological differences
between crops are disregarded (e.g., rice vs. corn). Although far from
complete, datasets are emerging that are able to provide this information at
moderate to very high spatial resolutions over continental to global extents
.
These data provide metrics that summarize local cropland characteristics
(e.g., C3/C4) and irrigation practices (e.g., irrigated/rainfed). This
information can be summarized per grid cell via clustering to provide a more
complete representation of sub-grid cropland characteristics. Although this
study explores this possibility using the CDL database, further work is
necessary to more adequately account for the sub-grid variability in crop
characteristics and irrigation practices.
Peatland and permafrost landscapes – in wet and high-latitude regions
characterized by organic matter accumulation in peat and permafrost, spatial
heterogeneity can be crucial to understanding regional carbon stocks. For
example, found that peat and lake sediments
account for more than 70 % of carbon stocks despite covering only 33 % of
the land area in a northern Wisconsin landscape. Likewise, in permafrost
systems microtopographic variations driven by ice-wedge polygon formation can
dominate spatial variability of carbon cycling
. While spatially explicit
modeling of these complex landscapes can yield important insights about
ecosystem function and vulnerability to climatic changes
e.g.,, the reduced
computational demands of HMC could facilitate incorporation of these
important dynamics into macroscale simulations.
Conclusions
A robust representation of the influence of the multi-scale physical
environment on the coupled terrestrial water, energy, and biogeochemical
cycles remains a persistent challenge in Earth system models. One of the
principal obstacles is the oversimplification of the observed complex
heterogeneity within these models. This is primarily due to a limited
understanding of how to use the available petabytes of environmental data
effectively and efficiently in macroscale models.
Unsupervised machine learning, and more specifically cluster analysis,
provides a path forward by capitalizing on the observed landscape similarity
to extract the underlying defining features (i.e., clusters) from available
environmental data. The hierarchical multivariate clustering (HMC) approach
presented here takes this a step further by taking advantage of these
clustering techniques while also ensuring physically consistent surface and
subsurface interactions between clusters through discretized characteristic
hillslopes.
A series of different tile configurations computed via HMC is used within the
LM4-HB model to quantify its added benefits to macroscale models. The model
experiments over a 1/4∘ grid cell in central California show that
(1) the observed similarity over the landscape makes it possible to robustly
account for the role of multi-scale heterogeneity in the macroscale states
and fluxes with a relatively minimal number of sub-grid land model tiles;
(2) assembling the sub-grid tiles from the observed high-dimensional
environmental data can lead to important differences in the macroscale water,
energy, and carbon cycles; (3) connecting the fine-scale grid to the model
tiles via HMC makes it possible to circumvent the scale discrepancies between
the macroscale and field-scale estimates – this has significant implications
for how Earth system models are evaluated and applied.
HMC illustrates a path towards improving the representation of land
heterogeneity in ESMs by harnessing the available petabytes of environmental
data. However, HMC only scratches the surface of what is possible. Moving
forward, these approaches could be extended beyond natural systems to managed
systems (urban areas, croplands, reservoirs, and pumping, among others).
Clustering techniques could also be applied for non-soil systems including
lakes and glaciers. Furthermore, although using hillslopes as the governing
hydrologic structures is appropriate for the domain used in this study, this
will not be true everywhere (e.g., flat terrain). In these cases, the
hierarchical approach could be relaxed or extended to include other
structures including stream orders and basins.
Finally, the volume and complexity of data available today pale in comparison
to what will be available in the coming decades .
These data will provide unique opportunities for Earth system science;
however, unless methods are developed that can harness these data, their
intrinsic value to improve our understanding of the Earth system will be
limited. We encourage the Earth system modeling community to pursue the use
of clustering techniques to ensure these data are used effectively and
efficiently in macroscale models.