Adaptive clustering: Reducing the computational costs of distributed (hydrological) modeling by exploiting time-variable similarity of model elements

In this paper we propose adaptive clustering as a new method to reduce the computational effort of distributed modelling. It consists of identifying similarly acting model elements during runtime, clustering them, running the model for just a few representatives per cluster, and mapping their results to the remaining model elements in the cluster. Key requirements for the application of adaptive clustering are the existence of i) many model elements with ii) comparable structural and functional properties and iii) only weak interaction (e.g. hillslopes, sub catchments or surface grid elements in hydrological and land surface models). The clustering of model elements must consider not only their time-invariant structural and functional properties, but also their current state and forcing, as all these aspects influence their current functioning. Joining model elements into clusters is therefore a continuous task during model execution rather than a one-time exercise that can be done beforehand. Adaptive clustering takes this into account by continuously checking the clustering and re-clustering when necessary. We explain the steps of adaptive clustering and provide a proof-of-concept using the example of a distributed, conceptual hydrological model fit to the Attert basin in Luxembourg. The clustering is done based on normalized and binned transformations of model element states and fluxes. Analysing a 5-year time series of these transformed states and fluxes revealed that many model elements act very similarly, and that the degree of similarity varies strongly with time, indicating the potential for adaptive clustering to save computation time. Compared to a standard, full-resolution model run used as a virtual reality ‘truth’, adaptive clustering indeed reduced computation time by 75 percent, while model quality, expressed as Nash-Sutcliffe efficiency of sub catchment runoff, declined from 1 to 0.84.
Based on this proof-of-concept application, we believe that adaptive clustering is a promising tool for reducing the computation time of distributed models. Being adaptive, it integrates and enhances existing methods of static grouping of model elements, such as lumping or grouped response units (GRU). It is compatible with existing dynamical methods such as adaptive time-stepping or adaptive gridding, and unlike the latter, does not require adjacency of the model elements to be joined.

or discharge at a catchment outlet, a spatially lumped representation in a model will suffice. Models designed for such coarse spatial resolutions, such as Topmodel (Beven and Kirkby, 1979) or HBV (Bergström, 1976) are easy to set up and computationally highly efficient, but necessarily conceptualize process patterns and redistribution processes and the underlying controls by means of effective dynamical laws, effective states, effective parameters and effective fluxes.
Often, however, we want to analyse and predict hydrological systems in higher spatial detail, which requires spatially distributed models. In distributed models, spatial variability of hydrological systems is captured by dividing the model domain into sub domains, which are assumed to be internally homogeneous with respect to their main structural and functional properties. Such model elements have been referred to as hydrological response units HRU (Flügel, 1995; Kouwen et al., 1993), representative elementary areas REA (Wood et al., 1988), or representative elementary watersheds REW (Reggiani et al., 1998). Beyond incorporating the spatial variability of the hydrological system, distributed models offer additional advantages: They incorporate distributed forcing, they can be parameterized and validated by distributed observations, and they permit more fundamental process representations (Kouwen et al., 1993). Distributed, physically based models such as MIKE SHE (Abbott et al., 1986), HYDRUS (Šimunek et al., 1999) or CATFLOW (Zehe et al., 2001) therefore have the desirable property of providing physically meaningful, distributed answers based on distributed internal dynamics. The major drawback of distributed models is their large demand for high-resolution data for model setup and operation, and a CPU demand that grows rapidly with system resolution. The question of the optimal balance between spatial resolution and computational burden has therefore been a long-standing issue (not only) in the hydrological sciences (Melsen et al., 2016; Liu et al., 2016; Dehotin and Braud, 2008; Booij, 2003; Gharari et al., 2020). In this context, a range of methods have been proposed to address the computational problem: It can be tackled by massively parallel computing (Kollet, 2010), or reduced by avoiding redundant computations. Redundancy occurs if several model elements act similarly, such that knowing the behaviour of one is a good proxy for the behaviour of the others.
In this context, and throughout the remainder of this text, we define similarity of two model elements as follows: "Two model elements act similarly if they share similar structural and functional properties, are in a similar state and exposed to similar forcing, such that they produce similar responses based on similar internal fluxes and state changes" (see also Zehe et al., 2014). Similarity (and its counterpart, redundancy) among model elements can be considered as a time-invariant (static) or time-variant (dynamic) phenomenon, and methods for redundancy reduction have been proposed on the basis of either of these views. Grouped response units GRU (Kouwen et al., 1993), for example, rely on the static similarity paradigm. GRUs are groups of HRUs close enough to be subject to uniform forcing and negligible differences in routing. All HRUs in a GRU are then treated as a single computational unit. This considerably reduces computational effort, but it comes at the cost of losing spatial detail and spatial positioning. Time-variant (or adaptive) methods do not rely on a single, time-invariant grouping. Instead, groups of model elements are dynamically established and adjusted during model run time by identifying and exploiting patterns of similarity in either time or space. Adaptive time stepping (Minkoff and Kridler, 2006) exploits patterns of similarity in time, and adaptive gridding (Pettway et al., 2010; Berger and Oliger, 1984) exploits patterns of similarity in space; combinations of both approaches are possible (Miller et al., 2006). Due to their generality, adaptive methods have been used to improve distributed modelling of a large variety of systems such as the universe (Teyssier, 2002), the atmosphere (Bacon et al., 2000; Aydogdu et al., 2019), oceans (Pain et al., 2005), and groundwater systems (Miller et al., 2006). While adaptive methods are highly useful, they all require direct adjacency (in either time or space) of the model elements to be joined.
However, similarity, in both nature and models, is not necessarily restricted to contiguous regions. For example, a watershed may contain many non-contiguous, south-facing forested hillslopes with shallow soils which, when in an intermediate wetness state, will act very similarly on a particular sunny day.
In this context, we suggest a new adaptive method for clustering of model elements, which is not limited to contiguous regions. It is motivated by the suggestions of Melsen et al. (2016) "to further investigate and substantially improve the representation of spatial and temporal variability in large-domain hydrological models" and contributes to solving the computational challenges of hydrological modelling as formulated by Clark et al. (2017). It comprises several steps: clustering of model elements, choice of cluster representatives, mapping of results from representatives to recipients, and continuous evaluation of the clustering to decide when re-clustering is needed. We demonstrate adaptive clustering using the example of a distributed, conceptual hydrological model of the Attert basin in Luxembourg. Besides evaluating adaptive clustering in terms of computational gains and related losses of modelling quality, we also discuss how the normalized and binned representations of model states and fluxes that we used for clustering contribute to hydrological system analysis by revealing space-time patterns of similarity in the catchment.
The remainder of the manuscript is structured as follows: In Sect. 2, we first describe the general, application-independent steps of adaptive clustering. This is the key methodological contribution of the paper. We then introduce the SHM hydrological model and its set-up for the Attert basin in Luxembourg. The model constitutes the test environment for the proof-of-concept of adaptive clustering. We then describe how adaptive clustering is implemented in the SHM Attert model, and finally describe our approach and the metrics used for evaluating adaptive clustering and measuring hydrological similarity.
In Sect. 3, we present and discuss results from distributed modelling with and without adaptive clustering, and compare them to a range of benchmark models in terms of computational efficiency and model quality. In the same section, we show results of the hydrological similarity analysis. These results are relevant for adaptive clustering, as the time-varying degree of similarity among model elements directly controls adaptive clustering. In addition, they are useful for hydrological system analysis. In Sect. 4, we summarize the results, draw conclusions, discuss limitations of adaptive clustering and suggest further research.

Adaptive clustering
As explained in the introduction, the main goal of adaptive clustering is to reduce the computational effort of distributed and high-resolution modelling. The main idea is to avoid redundant computations by clustering similarly acting model elements, and then to infer the dynamics of all elements in a cluster from just a few representatives. Key requirements for the successful application of adaptive clustering are the existence of i) many model elements with ii) comparable structural and functional properties and iii) only weak interaction. If there are only few model elements, there will be nothing to cluster; if they are not structurally and functionally similar, it will be impossible to assign results from representatives to the remaining cluster members (recipients); and if there is strong interaction, ignoring it (which is inevitable in adaptive clustering) will cause large modelling errors. It is important to keep in mind that even if two model elements are identical with respect to all time-invariant (structural) properties, they can still act differently when starting from different initial conditions, or when exposed to different boundary conditions. Therefore, while similarity among model elements can have a strong time-invariant component, and static clustering can be beneficial, the full potential of clustering will only be exploited if it is treated as time-variant (Loritz et al., 2018). This is the core idea of adaptive clustering. Its main steps are illustrated in Fig. 1, and we will explain the method along steps 'a' to 'j' in the plot.
Step 'a': Start the model from a fully distributed (non-clustered) initial state. Each model element (depicted by a circle) is in a particular initial state, indicated by the value in the circle. Model elements typically possess several state variables and hence an array of state values, but for simplicity only a single one is shown in the plot.
Step 'b': Based on the similarity of their states, combine the model elements into clusters. In the plot, the clusters are depicted by bold circles and labelled 'A', 'B' and 'C'. The clustering involves two important choices: the choice of a suitable clustering algorithm and the values of its hyperparameters, and the choice of a state variable by which the clustering is done. In the following, we will refer to this variable as the 'clustering control variable'. The clusters, determined with states at the current time step, will be used for all further modelling time steps until a decision to replace them is made (see step 'h'). For these time steps, we refer to clusters determined in the past as 'inherited clusters'.
Step 'c': Select from each cluster a subset of model elements. These serve as cluster representatives. In the plot, the representatives are indicated by blue circles. The number of representatives per cluster controls the performance of adaptive clustering: A large number will ensure high modelling quality but small computational gains, and vice versa.
Step 'd': Execute the model for the next time step, but only for the representatives. From running the model, the representatives obtain updated values for each of their state variables. In the plot, the updated states are indicated by red colour.
Step 'e': The representatives 'donate' their updated states (and fluxes) to all recipients in their cluster by means of a suitable mapping technique. In the plot, this is indicated by arrows. Note that due to the mapping, conservation laws are potentially violated. This is a drawback of adaptive clustering and requires further attention.
Step 'f': Based on the updated states of the clustering control variable, combine the representatives into a new set of clusters.
In the plot, the new clusters are depicted by bold circles labelled 'I' and 'II' in red colour. These clusters may differ from the inherited clusters: When the model is executed in step 'd', each representative is driven by its particular forcing, which potentially leads, even within a cluster, to a divergence of states. Clusters may therefore break apart, unite, or exchange elements as the states of the model elements evolve over time. Inherited clusters may therefore at some point become invalid and must be replaced by an up-to-date version.
Step 'g': Compare the new clusters to the inherited clusters. Note that both the new clustering in step 'f' and the cluster comparison are done only for the representatives. This is much more efficient than considering all model elements.
Comparing clusters involves identifying matching clusters, and then measuring their degree of agreement. In the plot, this is illustrated by a table: Each column represents one of the inherited clusters, each row one of the new clusters. Matching clusters are indicated by cells with blue background colour ('A' and 'I', 'B' and 'II'; 'C' has no match). The larger the number of representative model elements in matching clusters, the higher the agreement of the new and the inherited clusters.
Step 'h': Decide if the agreement of the new and the inherited clusters is sufficiently high. If the answer is 'Yes', the inherited clusters are still valid, and steps 'd' to 'h' are repeated for the next time step. If the answer is 'No', the inherited clusters are replaced. Obviously, the new clusters can be used as a replacement, but they only contain the representatives. Recipients can be assigned to the new clusters based on their current state. The problem, however, is that their states were transferred from the representatives, and depending on the mapping method, they may be more or less averaged, smoothed states. If these values are used for clustering, there is a risk that recipients are always clustered in the same manner, limiting the model's ability to adapt to changing conditions and to represent heterogeneous situations. This risk can be reduced by operating the model in full resolution for some time, as explained in steps 'i' and 'j', allowing the recipient model elements to evolve towards their particular states.
Step 'i': From the current time step, jump back in time. In the plot, this is indicated by a curved arrow extending back over two time steps.
Step 'j': Set the model to a fully distributed (non-clustered) mode. In the plot, this is indicated by all model elements arranged in a row, without surrounding clusters. Starting from the state of the model elements at the jumped-to time, execute the model in full resolution until the current time step is reached again. Generally, the length of the jump back is a trade-off between enabling the recipient model elements to evolve towards their particular states free from cluster constraints, and additional computational expense. Based on the new states of all model elements, continue at step 'b'.
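The steps above can be condensed into a single driver loop. The following Python sketch is a toy illustration under loudly stated assumptions, not the SHM Attert implementation: each model element is a hypothetical linear reservoir (`step`), all elements share one fixed normalization range (`S_MAX`, so mapping normalized states reduces to copying), the first `n_reps` members of each cluster serve as representatives instead of a random pick, clusters are matched greedily rather than with the Hungarian method, and the jump back covers a fixed number of steps (`n_back`) instead of being set by a second similarity threshold. All function and parameter names are hypothetical.

```python
from collections import defaultdict

S_MAX = 100.0  # hypothetical common normalization range [0, S_MAX]


def step(s, k, p):
    """Toy model (hypothetical): one time step of a linear reservoir."""
    return s + p - k * s


def cluster(states, idx, n_bins=64):
    """Steps 'b'/'f': group element indices by the bin of their normalized state."""
    groups = defaultdict(list)
    for i in idx:
        groups[min(int(states[i] / S_MAX * n_bins), n_bins - 1)].append(i)
    return list(groups.values())


def agreement(inherited, new):
    """Step 'g': share of elements in matched clusters (greedy matching)."""
    pairs = sorted(((len(set(a) & set(b)), ia, ib)
                    for ia, a in enumerate(inherited)
                    for ib, b in enumerate(new)), reverse=True)
    used_a, used_b, hits = set(), set(), 0
    for n, ia, ib in pairs:
        if n and ia not in used_a and ib not in used_b:
            used_a.add(ia)
            used_b.add(ib)
            hits += n
    return hits / sum(len(a) for a in inherited)


def run(s0, k, forcing, sim_crit=0.9, n_back=2, n_reps=2):
    hist = [list(s0)]                            # step 'a': full-resolution start
    clusters = cluster(hist[0], range(len(s0)))  # step 'b'
    t_clus, n_calls = 0, 0                       # clustering time, model calls
    for t, p in enumerate(forcing):
        reps = [c[:n_reps] for c in clusters]    # step 'c'
        s = list(hist[t])
        for c, rs in zip(clusters, reps):
            for r in rs:                         # step 'd': representatives only
                s[r] = step(hist[t][r], k[r], p[r])
                n_calls += 1
            for i in c:                          # step 'e': donate to recipients
                if i not in rs:
                    s[i] = s[rs[0]]
        new = cluster(s, [r for rs in reps for r in rs])  # step 'f'
        if agreement(reps, new) >= sim_crit:     # steps 'g' and 'h': keep clusters
            hist.append(s)
            continue
        t0 = max(t_clus, t + 1 - n_back)         # step 'i': bounded jump back
        del hist[t0 + 1:]
        for tt in range(t0, t + 1):              # step 'j': full-resolution rerun
            hist.append([step(x, kk, pp)
                         for x, kk, pp in zip(hist[tt], k, forcing[tt])])
            n_calls += len(s0)
        clusters, t_clus = cluster(hist[-1], range(len(s0))), t + 1
    return hist, n_calls


# Two identical elements receive different rain: the representatives diverge,
# the clustering is rejected, and the step is recomputed in full resolution.
hist, n_calls = run([0.0, 0.0], [0.1, 0.1], [[0.0, 50.0]])
print(hist[-1], n_calls)  # [0.0, 50.0] 4
```

For identical elements under identical forcing, the loop only calls the model for two representatives per time step; the cost of full resolution is paid only when the comparison in step 'g' rejects the inherited clusters.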

Study area and hydrological model
The Attert basin, our test site, is located in the central western part of the Grand Duchy of Luxembourg and partially in eastern Belgium, with a total catchment area of 288 km² up to gauge Useldange (Fig. 2). The landscape shows topographical, geological and pedological diversity, with a small area underlain by sandstones in the south and northeast, a wide area of sandy marls in the central part, and an elevated region underlain by schist in the north, which is part of the Ardennes massif.
The schist region reaches elevations up to 539 m a.s.l. and contains deeply incised river valleys. The Attert basin is situated in the temperate oceanic climate zone, and snow-related processes play a negligible role. Precipitation is mainly associated with westerly synoptic flow regimes and reaches annual amounts of about 850 mm (Pfister et al., 2005; Pfister et al., 2000).
We selected the Attert basin for several reasons: a large body of existing hydrological knowledge (Pfister et al., 2009; Juilleret et al., 2012) including modelling studies (Fenicia et al., 2014; Fenicia et al., 2016), access to a comprehensive data set compiled in the CAOS (Catchments as Organized Systems) project (Zehe et al., 2014), and our own prior modelling studies in the Colpach, a sub basin of the Attert, which revealed pronounced and time-variable similarity of model element behaviour (Loritz et al., 2018).
Instead of using one of the existing hydrological models for the Attert basin, we decided to set up a new one. This was mainly to ensure full code control, which greatly facilitates prototyping and testing of adaptive clustering. We chose a simple, conceptual, yet distributed model architecture tailored to the structure and hydrological function of the Attert basin. It is closely related to established hydrological models such as HBV (Bergström, 1976), and due to its simplicity, we named it 'SHM' (Simple Hydrological Model). Its general structure and process inventory are explained in detail in Appendix A.1.
The setup of SHM for the Attert basin is described in Appendix A.2, and its multi-criteria calibration and validation (based on five years of hourly data) in Appendix A.3. Overall, the model achieves acceptable performance (Nash-Sutcliffe efficiency of 0.73 for validation), making it a suitable test bed for exploring adaptive clustering.

Implementation of adaptive clustering in SHM Attert
In this section, we explain how adaptive clustering is implemented in the SHM Attert model. We do so along the lines of the general steps 'a' to 'j' of adaptive clustering as described in Sect. 2.1 and Fig. 1.
For clustering of model elements (SHM sub catchments in our study; step 'b'), we apply a straightforward yet effective approach based on binning: First, all model states of all sub catchments are normalized to [0,1] values. The state variable- and sub catchment-specific minima and maxima required for normalization were obtained from running the model in full resolution for the entire five years of available data. The [0,1] value range is then subdivided into 64 bins of uniform width.
Choosing the number of bins was guided by the objective of balancing resolution (many bins) and sufficiently populated bins (few bins). All sub catchments with normalized values of the clustering control variable falling into the same bin are assigned to the same cluster. Each non-empty bin therefore defines a cluster. The possible number of clusters is limited to a minimum of one and a maximum of 64, and the number of clusters at a given point in time expresses the degree of similarity among the sub catchments at that time. We selected sub catchment runoff (qcat,out, see Fig. A1 and Table A1) as the single clustering control variable for three reasons: Firstly, for catchment hydrologists, runoff is the main variable of interest; secondly, sub catchment runoff is influenced by all sub catchment states and fluxes, hence similarity of two sub catchments with respect to their runoff is a reasonable single-value indicator of overall similarity; thirdly, using only a single control variable keeps things simple.
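The binning rule can be sketched in a few lines of Python. This is a minimal illustration, not the SHM Attert code: the function names are hypothetical, and clipping to [0,1] as well as the handling of a variable whose minimum equals its maximum are our assumptions. The per-sub-catchment bounds `q_min` and `q_max` stand for the minima and maxima obtained from the full-resolution run.

```python
def normalize(x, x_min, x_max):
    """Min-max normalize x to [0, 1] with element-specific bounds (clipped)."""
    if x_max == x_min:
        return 0.0  # assumption: a constant variable goes to the lowest bin
    return min(max((x - x_min) / (x_max - x_min), 0.0), 1.0)


def bin_index(u, n_bins=64):
    """Map u in [0, 1] to one of n_bins uniform bins; u == 1 joins the last bin."""
    return min(int(u * n_bins), n_bins - 1)


def cluster_by_bins(q, q_min, q_max, n_bins=64):
    """Each non-empty bin of the normalized control variable defines a cluster."""
    clusters = {}
    for i, x in enumerate(q):
        b = bin_index(normalize(x, q_min[i], q_max[i]), n_bins)
        clusters.setdefault(b, []).append(i)
    return clusters


# Three hypothetical sub catchments with different runoff ranges
clusters = cluster_by_bins([0.5, 2.0, 1.0], [0.0, 0.0, 0.0], [1.0, 4.0, 8.0])
print(clusters)       # {32: [0, 1], 8: [2]}
print(len(clusters))  # 2
```

Although sub catchments 0 and 1 have different absolute runoff, they share a cluster because both sit at the middle of their own range; the number of non-empty bins directly gives the number of clusters.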
For each cluster, representatives are selected (step 'c') by random picking, controlled by three parameters (see Table 1): Perc_reps defines the total number of representatives, expressed as a percentage of the total number of sub catchments in the model. Applied to each cluster, it provides a first estimate of how many representatives should be picked from it. We found that besides controlling the total number of representatives, it is also useful to set limits on the minimum and maximum number of representatives per cluster. These are controlled by parameters min_reps_per_clus and max_reps_per_clus.
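One plausible reading of these three parameters is sketched below in Python. The paper does not specify the rounding or the order in which the limits are applied; here we assume that the percentage is applied to each cluster's size and rounded up, then clamped to the per-cluster limits and to the cluster size. The function name is hypothetical.

```python
import math
import random


def pick_representatives(clusters, perc_reps, min_reps_per_clus,
                         max_reps_per_clus, rng=random):
    """Randomly pick representatives for each cluster {label: [member ids]}."""
    reps = {}
    for label, members in clusters.items():
        # first estimate: perc_reps percent of the cluster size (rounded up)
        n = math.ceil(perc_reps * len(members) / 100)
        # enforce per-cluster limits, but never exceed the cluster size
        n = min(max(n, min_reps_per_clus), max_reps_per_clus, len(members))
        reps[label] = rng.sample(members, n)
    return reps


rng = random.Random(0)
reps = pick_representatives({0: list(range(28)), 1: [30], 2: list(range(40, 50))},
                            perc_reps=10, min_reps_per_clus=1,
                            max_reps_per_clus=2, rng=rng)
print({label: len(r) for label, r in reps.items()})  # {0: 2, 1: 1, 2: 1}
```

With 10 percent, the 28-member cluster would get three representatives, but `max_reps_per_clus` caps it at two, while the single-member cluster still gets one thanks to `min_reps_per_clus`.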
Mapping states and fluxes from cluster representatives to recipients (step 'e') uses the normalized values already used for clustering: Recipients are forced to assume the representative's normalized state (or flux), and these normalized states (or fluxes) are then converted back to dimensional values using each recipient's own min-max range. If there is more than one representative in a cluster, a single best one is selected as the representative closest to the median value of the clustering control variable of all representatives. The results of that single best representative are then mapped to all recipients. This method clearly leaves room for improvement, but we considered it good enough for a first proof-of-concept.
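A minimal Python sketch of this mapping for a single state variable of one cluster follows; in practice it would be repeated for every state and flux. The function names are hypothetical, and treating a zero-width range as normalized value zero is our assumption. Representatives other than the best one keep their own simulated values.

```python
import statistics


def best_representative(reps, control_norm):
    """Representative whose normalized control value is closest to the median."""
    med = statistics.median(control_norm[r] for r in reps)
    return min(reps, key=lambda r: abs(control_norm[r] - med))


def map_to_recipients(members, reps, control_norm, state, s_min, s_max):
    """Copy the best representative's normalized state to all recipients and
    rescale it with each recipient's own min-max range (modifies state in place)."""
    best = best_representative(reps, control_norm)
    span = s_max[best] - s_min[best]
    u = (state[best] - s_min[best]) / span if span else 0.0
    for i in members:
        if i not in reps:
            state[i] = s_min[i] + u * (s_max[i] - s_min[i])
    return best


state = [1.0, 5.0, 9.0, 0.0, 0.0]
best = map_to_recipients(range(5), [0, 1, 2], [0.1, 0.5, 0.9, 0.0, 0.0],
                         state, [0.0] * 5, [10.0, 10.0, 10.0, 20.0, 4.0])
print(best, state)  # 1 [1.0, 5.0, 9.0, 10.0, 2.0]
```

Representative 1 sits at half of its range, so recipients 3 and 4 are also placed at half of their (different) ranges; this is how size differences between sub catchments are bridged.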
Comparing inherited clusters with current clusters (step 'g') involves two steps: identifying matching clusters, and then measuring the degree of agreement between them. Note that with respect to the first step, it is not possible to simply define matching clusters as those with the same bin number. For example, assume that the inherited clusters are determined, and that afterwards uniform rainfall falls on the catchment, uniformly shifting all sub catchment states to a higher, 'wetter' bin.
Clustering the sub catchments with the new states will yield clusters identical to the inherited ones, but the cluster labels will have changed. We therefore used the well-known Hungarian method (Kuhn, 1955; Munkres, 1957), which matches clusters by maximizing the agreement of their content instead of comparing their labels. When the cluster matches are established (compare the table in Fig. 1), the degree of similarity between the clusterings is measured by the number of elements in matching clusters divided by the total number of elements (in Fig. 1, this is the number of elements in the blue cells divided by the total number of elements in the table). Clustering similarity is hence expressed by a number between zero (clusterings are incomparable) and one (clusterings are identical).
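The matching and the similarity measure can be illustrated with the following Python sketch. To stay dependency-free, it does not implement the Hungarian method itself: it finds the same optimal one-to-one matching by exhaustive search over all assignments, which is only feasible for a handful of clusters. For realistic cluster numbers, an assignment solver such as `scipy.optimize.linear_sum_assignment` (with `maximize=True`) would be the practical choice. The function names are hypothetical; inputs are per-element cluster labels (bin numbers) of the representatives.

```python
from itertools import permutations


def contingency(labels_old, labels_new):
    """Table of element counts per (inherited cluster, new cluster) pair."""
    rows, cols = sorted(set(labels_old)), sorted(set(labels_new))
    table = [[0] * len(cols) for _ in rows]
    for a, b in zip(labels_old, labels_new):
        table[rows.index(a)][cols.index(b)] += 1
    return table


def clustering_similarity(labels_old, labels_new):
    """Share of elements in optimally matched clusters (between 0 and 1).

    Exhaustive search over one-to-one matchings: same optimum as the
    Hungarian method, but only feasible for a handful of clusters.
    """
    t = contingency(labels_old, labels_new)
    n_rows, n_cols = len(t), len(t[0])
    if n_rows <= n_cols:  # assign each inherited cluster a distinct new one
        best = max(sum(t[i][j] for i, j in enumerate(p))
                   for p in permutations(range(n_cols), n_rows))
    else:                 # assign each new cluster a distinct inherited one
        best = max(sum(t[i][j] for j, i in enumerate(p))
                   for p in permutations(range(n_rows), n_cols))
    return best / len(labels_old)


# Uniform rain shifts every bin label by one: clusters are unchanged
print(clustering_similarity([3, 3, 5, 5, 7], [4, 4, 6, 6, 8]))  # 1.0
# One element moved to another cluster
print(clustering_similarity([0, 0, 0, 1, 1], [0, 0, 1, 1, 1]))  # 0.8
```

The first call reproduces the uniform-rainfall example: a label-based comparison would find no agreement at all, whereas content-based matching correctly reports identical clusterings.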
The inherited clusters are replaced (step 'h') if clustering similarity falls below an acceptance limit set by sim_crit (Table 1). The jump back in time (step 'i') is controlled by parameter sim_uncrit, which, like sim_crit, is a similarity threshold: The jump goes back to the last time at which this threshold was still exceeded. Depending on the prevailing hydro-meteorological situation, the jump can be shorter or longer, but it will never extend beyond the time at which the inherited clusters were established.
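The interplay of the two thresholds can be sketched as a small decision function in Python. This is an illustration under assumptions: the function name and the similarity log `sim_history` (one clustering similarity value per time step) are hypothetical, and we assume that sim_uncrit is set to a stricter (higher) value than sim_crit, so that the jump-back target lies before the time of rejection.

```python
def check_clustering(sim_history, t_established, sim_crit, sim_uncrit):
    """Decide on re-clustering (step 'h') and the jump-back target (step 'i').

    sim_history[t] is the clustering similarity at time step t (hypothetical
    log). Returns None if the inherited clusters remain valid, otherwise the
    time step from which to rerun the model in full resolution.
    """
    t_now = len(sim_history) - 1
    if sim_history[t_now] >= sim_crit:
        return None  # inherited clusters still acceptable
    # last time sim_uncrit was still exceeded, but never before the inherited
    # clusters were established
    for t in range(t_now, t_established - 1, -1):
        if sim_history[t] > sim_uncrit:
            return t
    return t_established


history = [1.0, 1.0, 0.97, 0.93, 0.88, 0.79]
print(check_clustering(history, 1, sim_crit=0.8, sim_uncrit=0.9))  # 3
```

In the example, similarity drops below sim_crit at step 5; the last step above sim_uncrit is step 3, so steps 3 to 5 would be rerun in full resolution.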
All parameters controlling adaptive clustering are summarized in Table 1. For the SHM Attert application, we determined their values by manual, iterative trial-and-error with the objective of maximizing computational savings while minimizing quality loss.

Experimental design and evaluation criteria
The existence of time-variant similarity among model elements is a precondition for a useful application of adaptive clustering (see the related discussion in Sect. 2.1). We therefore precede the analysis of adaptive clustering performance with an analysis of space-time patterns of similarity among SHM Attert sub catchments. The approach and related metrics are explained in Sect. 2.4.1. In Sect. 2.4.2, we describe SHM Attert model variants used as benchmarks for adaptive clustering, and we introduce the evaluation criteria for measuring both computational effort and simulation quality of the competing models.

Entropy as a measure of hydrological similarity
How can the similarity among sub catchments, and its variation with time, be measured? Sub catchments differ in size, and many of their states and fluxes are size-dependent. Therefore, instead of directly comparing their values, we use the (0,1)-normalized and binned time series of states and fluxes of all sub catchments as described in Sect. 2.3, step 'b'. At each point in time, the occupations of the 64 bins together form a histogram, which can be normalized to a discrete probability distribution by dividing the bin populations by the total number of sub catchments. The overall degree of similarity among sub catchments can then be measured (in the same manner for any state or flux of interest) by Shannon information entropy in the universal unit of 'bit' (Eq. 1):
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i \qquad (1)$$
where $p_i$ is the fraction of sub catchments in bin $i$, $n$ is the number of bins, and empty bins contribute zero. We adopted this approach from Loritz et al. (2018); a more detailed introduction to concepts, measures and applications of information theory is given in Neuper and Ehret (2019), Singh (2013), and Cover and Thomas (1991). As a basis for the similarity analysis, we operated the SHM Attert model in full resolution (no clustering) for the entire 5-year period of available data and then converted all sub catchment states and fluxes to normalized and binned values. Spatial maps of these states are then used to analyze spatial patterns of similarity, and time series of entropy to analyze temporal patterns.
It is an interesting property of Shannon entropy that for a discrete distribution with a given number of bins, there exist an upper and a lower bound: If all elements fall into a single bin, the entropy of the distribution takes its minimum value, zero. If the elements are uniformly distributed over the bins, entropy takes its maximum value $H_{\max} = \log_2(n)$, where $n$ is the number of bins. As we use the same 64 bins for all variables, the same lower bound of zero and the same upper bound of $\log_2(64) = 6$ bit apply to all of them, which facilitates comparison.
In terms of similarity, entropies close to zero indicate a high degree of similarity among model elements, and entropies close to six bit indicate low similarity.
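The entropy of Eq. (1) for one variable at one time step can be computed from the bin counts as sketched below in Python. The function name and input values are hypothetical; the sum is written in the equivalent form $\sum_i p_i \log_2(1/p_i)$, and empty bins simply do not appear in the counts.

```python
from collections import Counter
from math import log2


def binned_entropy(u_values, n_bins=64):
    """Shannon entropy (bit) of the bin occupation of normalized values in [0, 1]."""
    counts = Counter(min(int(u * n_bins), n_bins - 1) for u in u_values)
    n = len(u_values)
    # sum of p * log2(1/p) over occupied bins; empty bins contribute nothing
    return sum(c / n * log2(n / c) for c in counts.values())


print(binned_entropy([0.3] * 173))                                    # 0.0
print(round(binned_entropy([(i + 0.5) / 64 for i in range(64)]), 2))  # 6.0
print(round(binned_entropy([0.1, 0.5, 0.9]), 2))                      # 1.58
```

The three calls reproduce the bounds discussed above: all 173 sub catchments in one bin give zero, one element per bin gives the 6-bit maximum, and three distinct values (as for three rain gauges) give log2(3).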

Evaluation criteria and benchmark models for adaptive clustering
With respect to adaptive clustering, two aspects are important: computational savings, and the related losses of modeling quality. We measure the savings by overall model run time, as it captures the entire modeling effort, i.e. the effort for operating the actual hydrological model and the effort for the adaptive clustering overhead. To make run times comparable, we performed all model runs on the same machine with no additional processes active. We verified the reproducibility of the results by repeating the runs many times: The observed spread was less than one percent of total run time and was therefore considered negligible. We measure modeling quality by the Nash-Sutcliffe efficiency (NSE) of sub catchment runoff qcat,out, as sub catchment runoff is a comprehensive single-valued indicator of overall sub catchment state, and NSE is the best-known quality measure in hydrology. We calculate NSE in a distributed manner, i.e. separately for the runoff of each sub catchment, and then take its mean, weighted by the area of each sub catchment. Unlike directly calculating NSE of discharge at the basin outlet, this avoids potential compensation of under- and overestimations in particular sub catchments when aggregating their discharge in the river network.
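The area-weighted, distributed NSE can be sketched as follows in Python, with the 'reference' run taking the role of the observations. Function names and the example series are hypothetical, and a constant reference series (zero variance) is not handled.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency of a simulated vs. a reference series."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    return 1.0 - sse / sum((o - mean_obs) ** 2 for o in obs)


def area_weighted_nse(sim_by_cat, ref_by_cat, areas):
    """Mean of per-sub-catchment NSE values, weighted by sub catchment area."""
    scores = [nse(s, r) for s, r in zip(sim_by_cat, ref_by_cat)]
    return sum(a * e for a, e in zip(areas, scores)) / sum(areas)


ref = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
sim = [[1.0, 2.0, 3.0], [2.0, 2.0, 2.0]]  # perfect, and mean-only
print(area_weighted_nse(sim, ref, areas=[3.0, 1.0]))  # 0.75
```

The perfectly reproduced sub catchment (NSE 1) and the one that only hits the mean (NSE 0) are combined with weights 3:1; their errors cannot offset each other, which is the intended advantage over an outlet-based NSE.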
As for the similarity analysis, we use the results of a fully distributed model run for the entire 5-year time series of available data as a virtual reality benchmark. This 'reference' run was created by operating SHM Attert in a standard mode, i.e. without any adaptive clustering functionality implemented. In addition to the 'reference' run, we established further benchmark cases: For the 'static' benchmark, we implemented the adaptive clustering functionality into SHM Attert, but set its parameters such that throughout the entire model run, each sub catchment was treated separately and any clustering was suppressed. This means that adaptive clustering was in action, causing its computational overhead, but the model was nevertheless operated in the same fully distributed manner as the 'reference' run. We also established a 'static optimal' benchmark based on an offline, prior similarity analysis of the sub catchments: All sub catchments with identical structural properties (except size) and identical forcing were joined into a set of time-invariant clusters (see Table 2). As we set up the SHM Attert model in a straightforward manner, with sub catchment parameters varying only between different geology and land use classes, and sub catchment forcings varying only between rain gauges, the 173 sub catchments could be grouped into only 24 time-invariant, yet optimal clusters. 'Optimal' here means that there is no within-cluster variability (except size, which vanishes due to the (0,1)-normalization), and any single sub catchment picked from a cluster is a perfect representative of all others. This is of course a simplified and idealized case due to the simplified setup of the model. Adding further structural properties and forcings, or representing them at higher resolution, would result in more clusters. Nevertheless, we used the 'static optimal' benchmark to evaluate the merits of advance knowledge about time-invariant sub catchment similarity.
Its model setup and operation were equal to the 'static' case, but this time the 24 static optimal clusters were used instead of treating each sub catchment as a single cluster.
Table 2. Sub catchments of SHM Attert, grouped into 24 time-invariant clusters by agreement in the attributes geology, land use and meteorological forcing. The clusters are used in the 'static optimal' benchmark model. Sub catchment locations are shown in Fig. 2. The possible number of unique combinations of geology, land use, and rain gauge is 3 * 5 * 3 = 45. The cluster sizes range from 1 to 28, and the average number of elements in a cluster is 7.2.
[Table 2 body not recoverable from the extracted text.]
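The offline grouping behind the 'static optimal' benchmark amounts to collecting sub catchments with identical attribute tuples. A minimal Python sketch follows; the function name and all attribute labels are hypothetical, not the actual SHM Attert classes.

```python
from collections import defaultdict


def static_optimal_clusters(attributes):
    """Group sub catchments with identical (geology, land use, rain gauge)."""
    groups = defaultdict(list)
    for cid, key in sorted(attributes.items()):
        groups[key].append(cid)
    return dict(groups)


# Hypothetical attribute labels for four sub catchments
attrs = {1: ("schist", "forest", "gauge_1"), 2: ("marl", "grassland", "gauge_2"),
         3: ("schist", "forest", "gauge_1"), 4: ("marl", "forest", "gauge_2")}
clusters = static_optimal_clusters(attrs)
print(len(clusters), clusters[("schist", "forest", "gauge_1")])  # 3 [1, 3]

# Figures from Table 2: 3 * 5 * 3 = 45 possible combinations, 173 / 24 on average
print(3 * 5 * 3, round(173 / 24, 1))  # 45 7.2
```

Only combinations that actually occur form clusters, which is why the 173 sub catchments fill just 24 of the 45 possible attribute combinations.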

Results and discussion
We first show results from analysing sub catchment similarity, as it is a precondition for adaptive clustering, and then results from comparing model runs with adaptive clustering to benchmark cases. We also discuss whether, and to what degree, space-time patterns of similarity apparent in the fully distributed 'reference' model run are preserved by adaptive clustering.

Results for hydrological similarity
We discuss hydrological similarity in terms of three aspects: time-variant behavior, time-averaged values, and spatial patterns. For the first, time series of Shannon entropy for selected variables of the fully distributed 'reference' model run are shown in Fig. 3, panel 'a'. For better visibility, the plot is restricted to a single year. High entropies indicate little similarity among sub catchments, low entropies indicate high similarity. It is apparent that for all variables, entropies remain well below the benchmark maximum entropy (shown as a red line), and often they are close to zero. This indicates that many of the sub catchments act similarly, leading to redundancies in modeling which can potentially be reduced by adaptive clustering. Also, entropies vary with time. While the variability differs among variables (e.g. high for interflow storage si, and low for base flow storage sb), it is present for all of them, which emphasizes that clustering should be done in a time-variant manner. The entropy of the clustering control variable, qcat,out, shows a high correlation with discharge magnitude, as shown in panel 'b': In times of rising and high discharge, entropies are high, which is likely due to i) the interplay of spatially distributed precipitation and catchment states, and ii) the onset of fast runoff components, which may differ among sub catchments.
Figure 3. [...] Table A1). 'Uniform' indicates the benchmark maximum entropy of 6 bit for a 64-bin distribution. Black vertical lines indicate times for which spatial maps are shown in Fig. 4. Panel b: Discharge time series at catchment outlet gauge Useldange. 'Observed' are observations, 'reference' are results from the fully distributed 'reference' model run, and 'adap-c' are results from an adaptive clustering run with optimized parameters as shown in Table 1.
Time-averaged entropies are listed in Table 3. Like in Fig. 3, the values differ quite substantially among the variables, with precipitation 'p' showing the lowest entropy of only 0.1 bit, and the base flow related variables sb and qb,out showing the highest entropy of 2.88 bit. The low value of precipitation entropy can be explained by two effects: Firstly, during the frequent times of no rain, precipitation entropy is zero, as all stations show the same value, and secondly, even if it rains, at most three different bins of the distribution can be occupied, as precipitation is measured by only three stations. This limits precipitation entropy to a possible maximum of log2(3) = 1.58 bit. The high values for base flow can be explained by the pronounced, geology-induced differences of base flow behavior across the catchment (compare the values of kb in Table A3), and by the fact that, due to the slowly changing nature of base flow, these differences persist for a long time, keeping entropies high throughout the year (see Fig. 3, panel 'a'). Interestingly, the entropies of several variables are identical (qi,in, qb,in, and qu,out; si and qi,out; sb and qb,out). This is not a coincidence, but a consequence of how they are related: qi,in and qb,in are percentages of qu,out, and runoff from both the interflow and the base flow reservoir is a linear function of the respective storage (see Fig. A1 and Table A1). All of these relations are entropy-preserving transformations: Because each relation is linear with a positive coefficient for a given sub catchment, the sub catchment-specific min-max normalization maps the related variables to identical normalized values and hence to identical bins, so the entropies of all variables involved are necessarily equal.
Fig. 4 shows spatial patterns of normalized and binned values of the clustering control variable qcat,out for selected points in time. Plots in the left column ('a'-'e') are based on the 'reference' run, and we will focus on these in the following. We selected the times so as to cover a wide range of different hydrological situations (compare the black vertical lines in Fig. 3).
Plots 'a' and 'b' are both in spring and related to the same rainfall-runoff event, the last in a sequence of three, with plot 'a'

Time-averaged entropies for all SHM state and flux variables are shown in
showing the values just before the onset of precipitation, and plot 'b' at the time of peak runoff. Comparing the plots it is obvious that the general magnitude of runoff has increased, indicated by the main colours shifting from red (low values) to yellow (intermediate values). Additionally we see that the spatial pattern of similarity also shifted from a geology-dominated pattern, reflecting the geology-based parametrization of sub catchments, to a pattern reflecting the joint influence of both geology and the spatial distribution of rainfall (see geological map and rain gauge areas of influence in Fig. 2). Interestingly, while the grouping of the sub catchments into clusters obviously changed, the overall number of clusters did only increase by two, from 12 to 14, indicating that the overall degree of similarity among sub catchment remained largely constant. The next 5 plot, 'c', shows a very different situation at the end of a long summer drought: Most sub catchments show very low runoff, only the sand stone areas, where groundwater flow dominates, maintain runoff above their absolute minimum. Overall, the entire catchment is in a very homogeneous state and sub catchments group into only three clusters. This state of high similarity comes to a sudden end with the onset of precipitation (plot 'd'), increasing the diversity of sub catchment runoff to 18 clusters and the development of a spatial pattern which is mainly influenced by rainfall spatial distribution and only to a 10 lesser degree by geology: the sand stone area is not as clearly separated from the other geologies as usual. Finally, after a period of extended rainfall (plot 'e'), a spatial pattern similar to the initial one in plot 'a' has re-established, but overall runoff magnitudes are still lower; a heritage of the long dry summer.
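The entropy measure used throughout this section can be illustrated with a short sketch. It is not the SHM Attert implementation: the function name `binned_entropy` is hypothetical, and we assume min-max normalization with fixed bounds `lo`/`hi` (e.g. the range of the variable over the simulation period) followed by 64 equal-width bins, which matches the 6-bit 'uniform' benchmark in Fig. 3.

```python
import math
from collections import Counter


def binned_entropy(values, lo, hi, n_bins=64):
    """Shannon entropy (bit) of one variable across all model elements at one time step.

    Assumption: values are min-max normalized with the bounds lo/hi and sorted
    into n_bins equal-width bins; the entropy of the bin frequencies is returned.
    """
    counts = Counter()
    for v in values:
        x = (v - lo) / (hi - lo)                                # normalize to [0, 1]
        counts[min(max(int(x * n_bins), 0), n_bins - 1)] += 1   # bin index, clipped to range
    n = len(values)
    return sum(c / n * math.log2(n / c) for c in counts.values())


# 64 elements spread over all 64 bins reach the 'uniform' maximum of log2(64) bit
print(binned_entropy([(i + 0.5) / 64 for i in range(64)], 0.0, 1.0))  # 6.0

# A linear storage-outflow relation q = k*s leaves the bin occupation unchanged
s = [0.12, 0.5, 0.5, 0.77, 0.31]
k = 0.3
print(binned_entropy(s, 0.0, 1.0) == binned_entropy([k * v for v in s], 0.0, k))  # True
```

The second example mirrors why si and qi,out (or sb and qb,out) share the same entropy: normalization removes the proportionality factor before binning.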
Altogether, the analysis of time-variant behavior, time-averages, and spatial patterns of similarity reveals several important points: i) for most variables, there is pronounced hydrological similarity among sub catchments, ii) similarity is time-variant, and iii) the spatial patterns of similarity and their variation with time are in accordance with hydrological reasoning, which increases our confidence that expressing similarity among sub catchments by entropies is reasonable. In the next section we discuss to which degree, and at which price, adaptive clustering can capitalize on this.

Figure 4. […] Values in the left column are from the 'reference' run, values in the right column are from an adaptive clustering run with optimized parameters as shown in Table 1.

Results for adaptive clustering
As explained in Sect. 2.4.2, we evaluate adaptive clustering for both computational effort and associated quality losses against several benchmarks. Fig. 5 shows the results as a two-dimensional plot. The black square indicates the 'reference' run, which corresponds to the standard case of running the model in full resolution and without any adaptive clustering functionality. It took 816 seconds, and as the 'reference' run is our virtual reality 'truth', it shows perfect simulation quality, indicated by an NSE of 1. When adaptive clustering functionality is integrated into the model, but a fully distributed run is enforced by the choice of its parameters (the 'static' benchmark case), the model still shows perfect simulation quality, but the overhead of adaptive clustering increases computation time by 707 seconds to a total of 1523 seconds (black triangle in Fig. 5). This is almost a doubling compared to the 'reference' case, an extra computational cost which clustering needs to more than compensate in order to be worth the effort. This is indeed the case, even for the simple 'static optimal' benchmark case (red triangle in Fig. 5): Representing 173 sub catchments by 24 representatives (one per cluster, see Table 2) reduced computation time to 233 seconds, despite the overhead, at no loss of quality. How does time-variable, adaptive clustering compare to that? The blue dots in Fig. 5 depict results for selected parameter choices of adaptive clustering (we tested many, but only show the Pareto-optimal results), and reveal a general trade-off between effort and quality: The higher the computational effort, the higher the modelling quality, and vice versa. The red dot indicates the (in our eyes) optimal trade-off, based on the optimized parameter set shown in Table 1. The related computation time is 207 seconds, and NSE is 0.84. This means that, compared to the 'reference' case, computation time is reduced by 75% at the price of worsening NSE by 0.16.
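The Pareto filtering and the quoted savings can be reproduced in a few lines. The sketch below is illustrative only: it uses just the four runs whose execution times and NSE values are quoted in the text (the individual 'adap-c variations' are not listed here), and the helper `pareto_front` is a hypothetical name, not part of the SHM code.

```python
def pareto_front(runs):
    """Return the runs not dominated by any other run.

    Each run is (name, seconds, nse); lower seconds and higher NSE are better.
    A run is dominated if another run is at least as good in both criteria
    and strictly better in at least one.
    """
    front = []
    for name, t, q in runs:
        dominated = any(
            t2 <= t and q2 >= q and (t2 < t or q2 > q)
            for _, t2, q2 in runs
        )
        if not dominated:
            front.append((name, t, q))
    return front


# Execution time (s) and NSE as quoted in the text
runs = [
    ("reference", 816, 1.00),
    ("static", 1523, 1.00),
    ("static optimal", 233, 1.00),
    ("adap-c optimal", 207, 0.84),
]
print([name for name, _, _ in pareto_front(runs)])  # ['static optimal', 'adap-c optimal']
print(f"time saved vs. reference: {1 - 207 / 816:.1%}")  # time saved vs. reference: 74.6%
```

With only these four points, 'static optimal' dominates both full-resolution runs, while 'adap-c optimal' remains non-dominated solely because it is faster.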
The effect of adaptive clustering on the quality of discharge simulations at the catchment outlet is shown in Fig. 3, panel 'b': Differences between the 'reference' and the 'adap-c' optimal adaptive clustering run are visible, but they are generally much smaller than the differences between the 'reference' simulation and the observed discharge. This is encouraging. But when comparing the 'adap-c optimal' run (207 seconds) to the 'static optimal' benchmark (233 seconds), we may ask whether the small reduction in computation time, at the cost of a decrease in model quality, makes adaptive clustering worth the effort. For the given model, our answer will likely be 'No'. However, as discussed in Sect. 2.4.2, SHM Attert is extremely well suited for static clustering due to its simple set-up, and for most distributed models it will not be possible to group model elements into an equally small number of time-invariant yet optimal clusters. In such cases, the relative gains of adaptive clustering are potentially more pronounced.

Figure 5. Performance of model runs with respect to effort, measured by execution time, and quality, measured by mean Nash-Sutcliffe efficiency of sub catchment runoff qcat,out. 'Reference': Full resolution, no adaptive clustering overhead; 'static': Full resolution, but with adaptive clustering overhead; 'static optimal': Time-invariant optimal clustering, with clusters shown in Table 2; 'adap-c variations': adaptive clustering with various parameter settings; 'adap-c optimal': optimal adaptive clustering with parameters shown in Table 1.

Fig. 6 is based on the 'adap-c optimal' model run and gives some insight into the behavior of adaptive clustering. The blue line in panel 'a' shows for each time step the agreement between the inherited and the current clustering (step 'g' in Sect. 2.1 and Sect. 2.3).
The related thresholds for starting and ending jumps back in time are indicated by horizontal lines: sim_crit by the lower line, sim_uncrit by the upper line (see Table 1). Each time the clustering agreement falls below sim_crit, a jump back in time is triggered (red circle in the plot). It goes back to the closest time when the agreement was still above sim_uncrit (red dot in the plot). From this point in time, the model is operated in full distribution (no clusters) until the time of the jump back is reached again. Fig. 6 reveals that the occurrence and the length of jump back periods vary with the hydrological situation. During times of rapidly changing catchment conditions, such as in April, frequent but short jumps back appear, indicating that the inherited clusters are only valid for short periods of time. For periods of low flow, such as in August, no jumps back occur at all: Apparently, the clusters determined in mid-June remain valid until rainfall events at the beginning of September terminate the long period of synchronized drying of all sub catchments. This indicates that updating of the clusters is controlled by changes in the hydro-meteorological situation. For the entire five-year simulation period, overall 165 re-clusterings occur, i.e. on average one every eleven days.
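The jump back rule can be sketched offline on a given series of clustering agreements. This is a simplified illustration under stated assumptions: the agreement values and the thresholds `sim_crit = 0.75` and `sim_uncrit = 0.85` are hypothetical (the calibrated values are listed in Table 1), the function names are invented, and in the actual model each re-clustering after a jump changes all subsequent agreements, which a precomputed series cannot capture.

```python
def jump_back_target(agreement, t, sim_uncrit):
    """Closest earlier time step at which the clustering agreement was still above sim_uncrit."""
    for t0 in range(t - 1, -1, -1):
        if agreement[t0] > sim_uncrit:
            return t0
    return 0  # assumption: fall back to the start of the record if no such step exists


def jump_back_periods(agreement, sim_crit, sim_uncrit):
    """(start, trigger) pairs: from start up to the trigger step the model re-runs in full resolution."""
    periods = []
    for t, a in enumerate(agreement):
        if a < sim_crit:  # agreement dropped below the critical threshold: trigger a jump back
            periods.append((jump_back_target(agreement, t, sim_uncrit), t))
    return periods


# Hypothetical agreement between inherited and current clustering, one value per time step
agreement = [1.0, 0.95, 0.9, 0.8, 0.7, 0.98, 0.97, 0.6]
print(jump_back_periods(agreement, sim_crit=0.75, sim_uncrit=0.85))  # [(2, 4), (6, 7)]
```

Using two thresholds creates a hysteresis: a jump is triggered only by a clear loss of agreement, but it reaches back to a time when the clustering was still clearly valid.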
Adaptive clustering increases modeling efficiency by restricting computations to the representatives, but it also comes at a computational cost. This is illustrated by the blue line in Fig. 6, panel 'b', which shows the number of sub catchments for which hydrological processes were calculated at each time step. Normally, this number corresponds to the number of cluster representatives set by parameter perc_reps (see Table 1). For the jump back periods it is different: They are visited twice, once when the model is in normal, clustered, forward mode, and once in full resolution when the model is in jump back mode. As a consequence, the total number of sub catchments processed at these times is high, and it can even exceed the total number of sub catchments of the model (173, indicated by the red line). Overall, however, the savings prevail: For the entire five-year simulation period, on average 34 of 173 sub catchments were processed per time step, which means a reduction of 80% compared to the 'reference' model run.

Figure 6. Panel 'a': […] Black horizontal lines indicate the agreement thresholds given by parameters sim_crit and sim_uncrit (see Table 1). A red circle indicates when a jump back in time was triggered. The jump goes back to the next red dot to the left of each circle. Panel 'b': Number of sub catchments per time step for which hydrological processes were calculated. The red horizontal line indicates the total number of sub catchments (173) in the SHM Attert model.
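The per-time-step workload shown in Fig. 6, panel 'b' follows from simple bookkeeping, sketched here under assumptions: a constant, hypothetical number of 20 representatives per time step (the actual number follows from perc_reps in Table 1), and jump back periods treated as half-open intervals [start, trigger) that are re-run for all 173 sub catchments in addition to the clustered forward pass. The function name is hypothetical.

```python
def processed_per_step(n_steps, n_reps, n_elements, jump_periods):
    """Number of sub catchments computed at each time step."""
    counts = [n_reps] * n_steps              # clustered forward mode: representatives only
    for start, trigger in jump_periods:
        for t in range(start, trigger):      # revisited in full resolution during a jump back
            counts[t] += n_elements
    return counts


counts = processed_per_step(10, 20, 173, [(3, 6)])
print(counts)             # [20, 20, 20, 193, 193, 193, 20, 20, 20, 20]
print(max(counts) > 173)  # True: steps inside a jump back period exceed the model size
# Reduction reported for the 'adap-c optimal' run: 34 of 173 sub catchments on average
print(f"{1 - 34 / 173:.0%}")  # 80%
```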

Adaptive clustering restricts the execution of hydrological processes to a few representative sub catchments. While this seems sufficient to preserve the time-variant behavior of all sub catchments (as indicated by the high NSE values of the adaptive clustering model runs, and the good agreement of the discharge hydrographs in Fig. 3, panel 'b'), the question remains whether spatial patterns of similarity are also preserved. To address this question, spatial maps of sub catchment discharge from the optimized adaptive clustering run are plotted in the right column of Fig. 4. The dates of each plot are identical to those of the 'reference' run in the left column. Comparing the associated maps shows that they largely agree.
Further, the main characteristics of sub catchment similarity seem to be preserved by adaptive clustering: Geology is the main and precipitation a secondary control, and the degree of similarity varies over time. But there are also differences: For example, comparing plots 'b' and 'g' reveals a smaller influence of the rainfall pattern in the adaptive clustering case, while for plots 'd' and 'i' the opposite is the case. Generally, the overall degree of similarity is larger for the adaptive clustering case, i.e. the number of clusters is always smaller than for the 'reference' case. This is a consequence of mapping states from a few representatives to many recipients in a cluster, which involves averaging and hence an artificial increase of similarity among sub catchments.
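Why mapping increases similarity can be seen in a toy example. The sketch assumes, for illustration only, that each recipient receives the mean state of its cluster's representatives; the function name, element ids and state values are hypothetical. After mapping, the number of distinct states cannot exceed the number of clusters, however different the members were before.

```python
from statistics import mean


def map_from_representatives(states, clusters, reps):
    """Give every member of a cluster the mean state of that cluster's representatives."""
    mapped = dict(states)
    for cid, members in clusters.items():
        rep_state = mean(states[e] for e in reps[cid])
        for e in members:
            mapped[e] = rep_state
    return mapped


states = {"a": 1.0, "b": 1.2, "c": 1.4, "d": 5.0, "e": 5.5}  # hypothetical element states
clusters = {0: ["a", "b", "c"], 1: ["d", "e"]}
reps = {0: ["a", "c"], 1: ["e"]}
mapped = map_from_representatives(states, clusters, reps)
print(len(set(states.values())), len(set(mapped.values())))  # 5 2
```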
To summarize: We have tested many variants of adaptive clustering against several benchmark models in terms of computational effort and model quality. For the best variant, computation time was reduced by 75% compared to the full-resolution 'reference' model run, and model quality, expressed by NSE, decreased by 0.16. Compared to the 'static optimal' benchmark, which uses time-invariant clusters, computational savings were much smaller, as the SHM Attert model, due to its simplicity, lends itself well to time-invariant clustering. Analysing the time-variant behaviour of adaptive clustering revealed that re-clustering is linked to changes of the hydro-meteorological situation. Analysing spatial patterns of sub catchment similarity from model runs with and without adaptive clustering revealed that adaptive clustering preserves their main characteristics but tends to exaggerate similarity.

Summary and conclusions
In this paper we proposed and described adaptive clustering as a new way to reduce computational efforts of distributed modelling, while largely maintaining modelling quality. This is done by identifying, in a time-variant manner, similarly acting model elements, clustering them and inferring the dynamics of all model elements from just a few representatives per cluster.
We started from the observation that hydrological systems generally exhibit spatial variability of their properties, and that this variability is non-negligible if distributed dynamics are of interest, which then requires distributed modelling. We further hypothesized that despite this variability, there is also similarity, i.e. many model elements exist with similar properties, which will exhibit similar internal dynamics and produce similar output when in similar initial states and when exposed to similar forcing. Similarity among model elements is hence not a static but rather a time-variable property dependent on the interplay of these factors, and similarity is also not necessarily limited to contiguous model elements.
Based on these premises we developed adaptive clustering, and provided a proof-of-concept using the example of a distributed, conceptual hydrological model (SHM) fit to the Attert basin in Luxembourg. Adaptive clustering comprises several steps: clustering of model elements, choice of cluster representatives, mapping of results from representatives to recipients, and comparison of clusterings over time to decide when re-clustering is required. We explained these steps in general, and their implementation in the SHM Attert model in particular. We used normalized and binned transformations of model states both for clustering and for measuring overall similarity among model elements by Shannon information entropy. Analysing time series of entropy of model states and fluxes revealed that i) for most variables, there is pronounced hydrological similarity among sub catchments, ii) similarity is time-variant, and iii) the spatial patterns of similarity and their variation with time are in accordance with hydrological reasoning. We then evaluated adaptive clustering with respect to both computational gains and losses of model quality against several benchmark models. Compared to a standard, full-resolution model run used as a virtual reality 'truth', computation time could be reduced by 75% when a decrease of Nash-Sutcliffe efficiency by 0.16 was accepted. Re-clustering of model elements was linked to changes of the hydro-meteorological situation, and was on average carried out once every eleven days.
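The first two steps, clustering and choice of representatives, can be sketched as follows. This is a hypothetical illustration, not the code used for SHM Attert: it assumes that a cluster is the set of elements whose normalized clustering control variable (here qcat,out) falls into the same bin, and it interprets perc_reps as the fraction of each cluster's members to be picked (at least one); the function names and values are invented for the example. Representatives are picked at random, as in the present study.

```python
import math
import random
from collections import defaultdict


def cluster_by_bins(values, lo, hi, n_bins=64):
    """Group element ids whose normalized value falls into the same bin."""
    clusters = defaultdict(list)
    for elem, v in values.items():
        b = min(max(int((v - lo) / (hi - lo) * n_bins), 0), n_bins - 1)
        clusters[b].append(elem)
    return dict(clusters)


def pick_representatives(clusters, perc_reps, rng):
    """Randomly pick ceil(perc_reps * size) members (at least one) per cluster."""
    return {
        cid: rng.sample(members, max(1, math.ceil(perc_reps * len(members))))
        for cid, members in clusters.items()
    }


q = {"sc1": 0.10, "sc2": 0.11, "sc3": 0.50, "sc4": 0.51, "sc5": 0.90}  # hypothetical qcat,out
clusters = cluster_by_bins(q, 0.0, 1.0, n_bins=10)
reps = pick_representatives(clusters, perc_reps=0.5, rng=random.Random(42))
print(clusters)  # {1: ['sc1', 'sc2'], 5: ['sc3', 'sc4'], 9: ['sc5']}
```

A targeted selection, such as picking the member closest to the cluster centre, would replace `rng.sample` by a `min` over each member's distance to the cluster mean.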
Our tests and analyses were conducted in the virtual reality of a fully distributed model run, due to a lack of equally comprehensive observations. However, due to the good overall agreement of the model with the available multi-variate observations, we are confident that our main conclusion, namely that adaptive clustering is a promising tool for accelerating distributed modelling of hydrological and other dynamical systems, also holds for real-world systems.
Additionally, adaptive clustering yields spatial and temporal patterns of similarity among model elements, which can be used for hydrological system analysis. Adaptive clustering integrates and enhances existing methods of time-invariant grouping of model elements, such as lumping or grouped response units (GRU), and it can be applied together with existing methods of exploiting time-variable similarity, such as adaptive gridding or adaptive time-stepping. A limitation of the method lies in the potential violation of conservation laws when mapping results from cluster representatives to recipients.
What's ahead? For the study we selected sub catchment runoff as the single variable for both clustering control and model evaluation. This was mainly based on hydrological reasoning, and clearly other and/or additional variables for clustering control should be tested, and the effect of adaptive clustering on all model states and fluxes should be evaluated. Also, so far cluster representatives were simply chosen by random picking. We expect better performance from a targeted selection of model elements, e.g. those close to the cluster centre. Finally, we have tested adaptive clustering using the example of a relatively simple, conceptual hydrological model with limited internal variability. The performance of adaptive clustering in more advanced models such as MIKE SHE (Abbott et al., 1986), HydroGeoSphere (Brunner and Simmons, 2011; Davison et al., 2018), Noah-MP LSM (Niu et al., 2011), or the Community Land Model CLM (Lawrence et al., 2019), where computation times are indeed a challenge, remains to be demonstrated. On the one hand, the potential savings by adaptive clustering will increase with the level of process detail in a model. On the other hand, its implementation will become more difficult. For example, existing code may be structured in a way unfavourable for integrating the adaptive clustering functionality; it can be a challenge to combine massively parallel processing with adaptive clustering; and it will be difficult to integrate it into models where many processes act simultaneously at a hierarchy of model elements. However, the same could be said about adaptive gridding and adaptive time stepping, and yet these have been implemented very successfully in many advanced earth science models.

… with the assumptions $k = 0$ for $s_u \le 0$; $k = 1$ for $s_u \ge 0.8\,s_{u,max}$; $k = s_u / s_{u,max}$ for $0 < s_u < 0.8\,s_{u,max}$

A.2 SHM Attert model setup
Setting up the SHM model for a catchment starts by GIS-based delineation of sub catchments and river elements using a digital elevation model. For the Attert, a 5 m digital elevation model based on LIDAR scans provided by the Luxembourg Institute of Science and Technology (LIST) was used. Each sub catchment was assigned a single land use, based on the Corine Land Cover map provided by the European Environment Agency, and a single geology, based on the 'Carte géologique détaillée 1:25000-1:50000' provided by the Luxembourg geological survey. In the catchment, altogether five different land use classes and three geological classes occurred. For an overview of geology and land use classes assigned to each sub catchment, please see Table 2.
In the Attert basin, hydrological function is strongly controlled by geology (Fenicia et al., 2016). Therefore, all soil-related model parameters (β, su,max, perc, ki, kb) were kept equal for all sub catchments sharing the same geology. The parameter values were determined by calibration (see Appendix A.3). Likewise, all parameters related to evapotranspiration (kc, kθ) were kept equal among all sub catchments sharing the same land use class. These were directly inferred from the land use, without calibration (see Eq. A4 in Table A1). As we set all river elements to be approximately one kilometre in length, we could assign to all 147 of them the same value for kr, which we determined by calibration.
Running the SHM model requires observed time series of precipitation, air temperature, air relative humidity, wind velocity and global radiation. For precipitation, data from three stations were available. While this is clearly not enough to represent the full spatial variability of precipitation, it nevertheless captures some of it. Each sub catchment was assigned precipitation from a single station using a nearest-neighbour approach (see Fig. 2 and Table 2). As the remaining hydro-meteorological variables typically exhibit less spatial variability than precipitation, we used observations from only a single station each (see Table A2 and Fig. 2).
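A nearest-neighbour assignment of precipitation stations can be sketched as below. The station names and coordinates are hypothetical placeholders, and measuring the distance from each sub catchment's centroid is our assumption; the assignment in Table 2 may have been derived differently, e.g. from the rain gauge areas of influence shown in Fig. 2.

```python
import math


def assign_nearest_station(centroids, stations):
    """Map each sub catchment id to the station closest to its centroid (Euclidean distance)."""
    return {
        sc: min(stations, key=lambda name: math.dist(xy, stations[name]))
        for sc, xy in centroids.items()
    }


stations = {"P1": (0.0, 0.0), "P2": (10.0, 0.0), "P3": (0.0, 10.0)}   # hypothetical gauges, km
centroids = {"sc1": (1.0, 1.0), "sc2": (9.0, 2.0), "sc3": (2.0, 8.0)}  # hypothetical centroids, km
print(assign_nearest_station(centroids, stations))  # {'sc1': 'P1', 'sc2': 'P2', 'sc3': 'P3'}
```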

A.3 SHM Attert model calibration and validation
We applied a multi-criteria calibration approach to ensure good overall performance of SHM Attert, and not just with respect to discharge at the catchment outlet. For calibration we used data from the four-year period 2011/11/01 00:00 - 2015/10/31 23:00; for validation we used the remaining one-year period 2015/11/01 00:00 - 2016/10/31 23:00. We started by joint calibration of all sub catchment parameters on catchment scale, i.e. against observed discharge at the catchment outlet […] storage in the SHM unsaturated zone reservoir (su). While this did not permit quantitative conclusions, it was nevertheless informative in terms of the timing of relative minima and maxima, and the overall shape of the time series. As direct observations of catchment-scale evapotranspiration rates were not available, we used satellite-based estimates provided by EUMETSAT (Trigo et al., 2011) instead and compared them to catchment-averaged evapotranspiration rates of SHM. For each variable we measured model performance by Nash-Sutcliffe efficiency (Nash and Sutcliffe, 1970). To measure overall model performance in a single number, we merged the three efficiencies for discharge, soil moisture and evapotranspiration into a single, multi-criteria objective function according to Eq. (A12). The weights assigned to each component were subjectively chosen, mainly based on our evaluation of the quality of the underlying observations. To a lesser degree, the weights also reflect our evaluation of the relative importance of each component for overall model evaluation. After the first catchment-uniform and multi-criteria estimation of parameters, we refined the estimates of all soil-related model parameters by calibration against three gauges, each representative of a particular geology: Colpach for schist, Wollefsbach for marls, and Platen for sandstone. These parameters (see Table A3) were then assigned to all sub catchments sharing the same geology.
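For reference, the Nash-Sutcliffe efficiency and a weighted multi-criteria combination can be written as below. The combination is only an assumption about the form of Eq. (A12), namely a weighted mean of the three efficiencies; the criterion names, efficiency values and weights are hypothetical placeholders.

```python
def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 minus summed squared errors over summed squared deviations of obs from its mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((s - o) ** 2 for s, o in zip(sim, obs))
    var = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / var


def combined_objective(efficiencies, weights):
    """Weighted mean of per-criterion efficiencies (assumed form of Eq. (A12))."""
    total = sum(weights.values())
    return sum(w * efficiencies[k] for k, w in weights.items()) / total


obs = [1.0, 2.0, 3.0, 4.0]
print(nse(obs, obs))        # 1.0: a run compared with itself, as for the 'reference' run
print(nse([2.5] * 4, obs))  # 0.0: no better than the observed mean
eff = {"discharge": 0.8, "soil_moisture": 0.6, "evapotranspiration": 0.5}
w = {"discharge": 0.5, "soil_moisture": 0.25, "evapotranspiration": 0.25}  # hypothetical weights
print(round(combined_objective(eff, w), 3))  # 0.675
```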
After a few iterations of catchment-scale and geology-specific calibration, we determined the final, distributed parameter sets as shown in Table A3. The main differences among geology-specific parameters appear for the retention behaviour of the interflow and the base flow reservoir (ki and kb, respectively), which reflects the geology-specific hydrological functioning of the Attert basin as described by Fenicia et al. (2016): In the schist, dynamics are governed by a combination of two subsurface flow paths; in the marl, fast responses governed by near-surface flow paths prevail; while the sandstone areas are characterized by delayed responses governed by groundwater flow.

The catchment-scale performance measures for both the calibration and the validation period are shown in Table A4, and the performance at the gauges used for geology-specific and catchment-wide calibration in Table A5. The model achieves a catchment-scale, multi-objective Nash-Sutcliffe efficiency of 0.73 in the validation period; gauge- or criteria-specific efficiencies range from 0.61 for gauge Wollefsbach to 0.77 for gauge Useldange at the catchment outlet.

Table A4. Catchment-scale performance measures (discharge: Nash-Sutcliffe efficiency at gauge Useldange; soil moisture and evapotranspiration: Nash-Sutcliffe efficiency of catchment averages) of the SHM Attert in the 4-year calibration period (2011/11/01 00:00 - 2015/10/31 23:00) and 1-year validation period (2015/11/01 00:00 - 2016/10/31 23:00). 'Combination' refers to the joint objective function according to Eq. (A12).

Table A5. Gauge-specific performance measures (Nash-Sutcliffe efficiency of discharge) of the SHM Attert in the calibration and validation period. Gauge locations are shown in Fig. 2, catchment sizes in Table A2.

Competing interests. The authors declare that they have no conflict of interest.