We propose and provide a proof of concept of a method to analyse, classify
and compare dynamical systems of arbitrary dimensions by the two key features
uncertainty and complexity. It starts by subdividing the system's time trajectory into a number of time slices. For all values in a time
slice, the Shannon information entropy is calculated, measuring within-slice
variability. System

In the earth sciences, many systems of interest are dynamical; i.e. their states are ordered by time and evolve as a function of time. The theory of dynamical systems (Forrester, 1968; Strogatz, 1994) therefore has proven useful across a wide range of earth science systems and problems such as weather prediction (Lorenz, 1969), ecology (Hastings et al., 1993; Bossel, 1986), hydrology (Koutsoyiannis, 2006), geomorphology (Phillips, 2006) and coupled human–ecological systems (Bossel, 2007).

Key characteristics of dynamical systems include their mean states (e.g.
climatic mean values in the atmospheric sciences), their variability (e.g.
annual minimum and maximum streamflow in hydrology) and their complexity
(e.g. population dynamics in ecological predator–prey cycles).
Interestingly, despite its importance and widespread use, there is to date
neither a single agreed-upon definition and interpretation of complexity nor an
agreed-upon base set of features characterizing a complex system. Gell-Mann
(1995), Lloyd (2001), Prokopenko et al. (2009) and Ladyman et al. (2013)
provide interesting overviews of the topic. Gell-Mann (1995) points out that
while measures of complexity for entities in the real world are to some
degree always context-dependent, they have in common that “

Characterizing dynamical systems by few and meaningful statistics representing the above-mentioned key features is important for several reasons: system classification, intercomparison and similarity analysis are pre-conditions for the transfer of knowledge from well-known to poorly known systems or situations (see e.g. Wagener et al., 2007, Sawicz et al., 2011, and Seibert et al., 2017, for applications in hydrology). Further, dynamical system analysis helps in detecting and quantifying nonstationarity, a key aspect in the context of global change (Ehret et al., 2014), and it is important for evaluating the realism of dynamical system models and for guiding their targeted improvement (Moriasi et al., 2007; Yapo et al., 1998).

In this paper, we address the task of parsimonious yet comprehensive characterization of dynamical systems by proposing a method based on concepts of information theory. It comprises both variability and complexity and adopts the view that the overall variability (or uncertainty) of a time series is the mean of its variabilities in subperiods and that the complexity of a time series is the overall variability of these variabilities. We use examples from hydrology, as due to the multitude of subsystems and processes involved, most hydrological systems are classified as variable and complex systems (Dooge, 1986). Hydrological systems and models thereof have been analysed in terms of predictive, model structural and model parameter uncertainty by Vrugt et al. (2003), Liu and Gupta (2007) and Vrugt et al. (2009). Hydrological systems have been classified in terms of their complexity by Jenerette et al. (2012), Jovanovic et al. (2017), Ossola et al. (2015), Bras (2015), Engelhardt et al. (2009), Pande and Moayeri (2018), Sivakumar and Singh (2012), Sivakumar et al. (2007) and Ombadi et al. (2021). Following early attempts by Jakeman and Hornberger (1993), Pande and Moayeri (2018) investigated how the relation between the information content and complexity of hydrological systems can guide the selection of adequate models thereof and vice versa.

In particular, concepts from information theory have been applied for hydrological system analysis and classification by Pachepsky et al. (2006), Hauhs and Lange (2008), Zhou et al. (2012), Castillo et al. (2015) and recently by Dey and Mujumdar (2022). Information-based approaches rely on log-transformed probability distributions of the quantities of interest and are thus independent of the units of the data. Compared to methods relying directly on the data values, this is an advantage in terms of generality and comparability across disciplines. Being rooted in information theory, the method we propose in this paper makes use of this advantage. The same applies to the method of multiscale entropy (MSE) proposed by Costa et al. (2002) in the context of physiological time series and the method suggested by López-Ruiz et al. (1995) for physical systems. Both share similarities with the complexity–uncertainty curve (c-u-curve) method but also differ in some important aspects, which will be discussed in Sect. 2.3 after the c-u-curve method has been introduced in Sect. 2.1. The MSE method has been applied to a wide range of complex systems, such as biological signals (Costa et al., 2005), ball-bearing fault measurements (Wu et al., 2013) and seismic (Guzmán-Vargas et al., 2008) and hydro-meteorological time series. For the latter, Li and Zhang (2008) analysed long time series of Mississippi River flow data, and Chou (2011) used MSE in combination with wavelet transformation to analyse properties of station-based rainfall time series. Brunsell (2010) also applied entropy measures on various temporal scales to assess the spatial–temporal variability of daily precipitation, similar to the MSE method, but refers to this as “a multiscale information theory approach”.

The remainder of the text is organized as follows. In Sect. 2, we present all the steps of the method, describe its properties, and compare it to existing methods. In Sect. 3, we apply the method to both synthetic time series and observed hydrological data to demonstrate uses and interpretations of the c-u-curve method. We summarize the method, discuss its limitations, and draw conclusions in the final Sect. 4.

Please note that in what follows, for clarity we introduce the method with the example of univariate time series with deterministic values, and we calculate discrete entropy based on a uniform binning approach.

The mathematical variable names used in this section and throughout the
paper were chosen with the goal of straightforward interpretation. The names
were constructed by a combination of the following base “alphabet”:

Applying the method to a given time series with a total of nt time steps
consists of a number of steps and related choices. First, for each
variate involved, a suitable discretization (binning) scheme is chosen. The
bins must cover the entire value range, and their total number (nvb) can be
chosen according to the user's demands regarding data resolution. Next, the time
series is divided into ns time slices. The slices must be
mutually exclusive and together must cover the entire time series. The slices are
preferably, but not necessarily, of uniform width. Next, separately for each
slice, a discrete probability distribution (histogram) is calculated using
the data in the slice and the chosen binning scheme. From the histogram obtained in this way, the Shannon information entropy

As entropy values may differ between slices, an overall uncertainty estimate
for all slices is calculated as the expected value of all slice entropies.
For equal-width slices, this is mean entropy according to Eq. (2).
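
The slicing, histogram and averaging steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the code accompanying the paper: the function names (`shannon_entropy`, `slice_entropies`, `uncertainty`) are hypothetical, equal-width slices are assumed, and any trailing time steps that do not fill a complete slice are simply dropped.

```python
import numpy as np

def shannon_entropy(values, bin_edges):
    """Discrete Shannon entropy (in bit) of `values` under the given binning."""
    counts, _ = np.histogram(values, bins=bin_edges)
    if counts.sum() == 0:
        raise ValueError("no values fall inside the bins; bins must cover the value range")
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))

def slice_entropies(series, slice_width, bin_edges):
    """Entropy of each equal-width time slice (assumption: remainder is dropped)."""
    series = np.asarray(series, dtype=float)
    ns = len(series) // slice_width                      # number of complete slices
    slices = series[: ns * slice_width].reshape(ns, slice_width)
    return np.array([shannon_entropy(s, bin_edges) for s in slices])

def uncertainty(series, slice_width, bin_edges):
    """Mean of the slice entropies, i.e. the expected entropy for equal-width slices."""
    return float(slice_entropies(series, slice_width, bin_edges).mean())

# Toy example: one constant slice and one slice spread evenly over four bins.
edges = np.linspace(0.0, 1.0, 11)                        # nvb = 10 uniform bins
toy = [0.05, 0.05, 0.05, 0.05, 0.05, 0.15, 0.25, 0.35]
print(slice_entropies(toy, 4, edges))                    # entropy per slice
print(uncertainty(toy, 4, edges))                        # mean over both slices
```

In the toy example, the first slice occupies a single bin and the second spreads evenly over four bins, so the slice entropies are 0 and 2 bit and their mean is 1 bit.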

Next, we consider the variability of entropy across all the slices, and as before we
measure variability by entropy. In order to calculate this higher-order
“entropy of entropies”, a suitable binning scheme for entropy values must
be chosen, which can be based on the same criteria as outlined above. It is
then used to calculate a histogram of the ns entropy values. We thus
define

The entire procedure of calculating uncertainty and complexity is repeated
for many different choices of ns (time-slicing schemes). For each choice
of ns, for equal-width slices the width of a time slice is

For example, for a time series with

In this section, we briefly summarize some general properties of the c-u-curve and discuss its limitations and possible generalizations.

For the c-u-curve, both the

The lower bound for uncertainty is always zero, which is reached if, for all time slices, all values within a time slice fall into the same value
bin. The upper bound is dependent on the choice of nvb (the number of bins
resolving the value range). Its value,

As with uncertainty, the lower bound of complexity is always zero. It
is reached if the entropy values calculated for all time slices all fall
into the same entropy bin. The upper bound is dependent on the choice of
neb (the number of bins resolving the entropy range). Similar to
uncertainty, its value,

The shape and values of the c-u-curve remain invariant under prior normalization of the data if the binning scheme is also transformed. Normalization can therefore be applied for convenience to use the same binning scheme for all time series. Likewise, for better comparability among time series of different lengths, normalization of the time domain is also possible. As a consequence, the time slice widths sw will be expressed in units of “length relative to the length of the time series” rather than in the original time units. However, this potentially comes at the cost of losing interpretability, e.g. in detecting the effect of diurnal or seasonal cycles in the c-u-curve.
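
The invariance under normalization can be checked numerically. The following sketch uses made-up integer-valued data in arbitrary units and a hypothetical helper `histogram_entropy`; it illustrates that the histogram, and hence the entropy, is unchanged when the data and the bin edges are rescaled by the same transformation.

```python
import numpy as np

def histogram_entropy(values, bin_edges):
    """Return (bin counts, entropy in bit) for the given binning."""
    counts, _ = np.histogram(values, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()
    return counts, float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(42)
flow = rng.integers(0, 1000, size=5000).astype(float)   # made-up data, arbitrary units
edges = np.linspace(0.0, 1000.0, 11)                     # 10 uniform bins on the raw scale

# Normalize data AND bin edges with the same transformation (assumed range 0..1000).
value_range = 1000.0
flow_norm = flow / value_range
edges_norm = edges / value_range

counts_raw, h_raw = histogram_entropy(flow, edges)
counts_norm, h_norm = histogram_entropy(flow_norm, edges_norm)
print(h_raw, h_norm)   # identical, because every value stays in the same bin
```

If only the data were normalized while the bins stayed on the raw scale, all values would collapse into the first bin and the entropy would change, which is why the binning scheme has to be transformed as well.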

The values of the bounds and all uncertainty and complexity values of the curve depend on the chosen binning for the values and the entropies. For
direct comparison of the c-u-curve, the binnings should therefore agree. If
this is for some reason not feasible, comparability can be established by
normalizing values to a

For better visibility, we connected the c-u-curve points calculated for different time slice widths sw in Figs. 2 and 3 with a line. However, there is no theoretical argument guaranteeing the continuity of the c-u-curve, and the lines should not be interpreted in this manner. Nevertheless, test runs with many different data sets and many time slice widths suggest that the c-u-curve is generally smooth.
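
The individual points of a c-u-curve, one per time slice width, can be generated with a simple loop. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper's code: the data are already normalized to the range 0 to 1, the entropy bins span 0 to log2(nvb), incomplete trailing slices are dropped, and the names `entropy_bits`, `cu_point` and `cu_curve` are hypothetical.

```python
import numpy as np

def entropy_bits(values, bin_edges):
    """Discrete Shannon entropy (in bit) of `values` under the given binning."""
    counts, _ = np.histogram(values, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def cu_point(series, slice_width, value_edges, entropy_edges):
    """Return (uncertainty, complexity) for one equal-width slicing scheme."""
    series = np.asarray(series, dtype=float)
    ns = len(series) // slice_width                      # complete slices only
    slices = series[: ns * slice_width].reshape(ns, slice_width)
    h = np.array([entropy_bits(s, value_edges) for s in slices])
    # Guard against round-off pushing an entropy just outside the entropy bins.
    h = np.clip(h, entropy_edges[0], entropy_edges[-1])
    uncertainty = float(h.mean())                        # mean of slice entropies
    complexity = entropy_bits(h, entropy_edges)          # entropy of slice entropies
    return uncertainty, complexity

def cu_curve(series, slice_widths, nvb=10, neb=10):
    """List of (slice width, uncertainty, complexity); data assumed to lie in [0, 1]."""
    value_edges = np.linspace(0.0, 1.0, nvb + 1)
    entropy_edges = np.linspace(0.0, np.log2(nvb), neb + 1)
    return [(sw, *cu_point(series, sw, value_edges, entropy_edges))
            for sw in slice_widths]

# Toy series: a constant first half and a varying second half.
toy = np.concatenate([np.full(8, 0.05), np.tile([0.05, 0.15, 0.25, 0.35], 2)])
for sw, u, c in cu_curve(toy, slice_widths=[4, 16]):
    print(sw, u, c)
```

Because each slice width is evaluated independently, the result is a set of discrete points; connecting them with a line is a plotting choice, as noted above.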

For short time series with highly variable data, different splits of the
time series into time slices might return quite different results. In other
words, the default splitting scheme starting at the first time step (e.g.
“1-2-3”, “4-5-6”, etc., for time slices of width

Without formal proofs, we briefly discuss here the effect of errors or
trends in the data on the values and shape of the c-u-curve. In the case of

We introduced the c-u-curve method with a univariate and deterministic example. However, the method is also applicable to multivariate and/or probabilistic data. When moving from univariate to multivariate data, the entropy within a time slice simply changes from univariate to multivariate entropy. When moving from deterministic to probabilistic variables, for each time step in a time slice, a value distribution rather than a crisp value is used to populate the distribution of all values in the time slice; the result is still a single distribution with a single entropy value, which can be plotted as before in the c-u-curve. In Ehret (2022), we provide multivariate and probabilistic application examples and the related generalized code.

Also, in the method description in Sect. 2.1, we calculated discrete entropy based on a uniform binning approach. We did so because it has some useful properties (ease of interpretation is one of them) compared to calculating continuous entropy. Nevertheless, the method can also be used with non-uniform binning or continuous representations of data distributions as long as entropy can be calculated from the data distribution. For a detailed discussion of discrete vs. continuous entropy, see Azmi et al. (2021) and references therein.

Please also note that, strictly speaking, the c-u-curve method does not measure the uncertainty and complexity of an entire dynamical system, but only those of its signals (time series) that are available for analysis. For cases where the signals do not completely cover the system's state space, we should therefore refer to the results as “signal uncertainty” and “signal complexity”. However, as this distinction is usually not made in the literature on dynamical system analysis, we also stick to the term “system” rather than “signal” throughout this paper.
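
The multivariate and probabilistic generalizations mentioned in this section can be illustrated with a short sketch. It is not the generalized code referenced in Ehret (2022): the function names are hypothetical, the multivariate case uses a joint histogram via `numpy.histogramdd`, and the probabilistic case assumes that each time step is represented by the same number of equally likely ensemble members, so that pooling all members of a slice yields the slice distribution.

```python
import numpy as np

def joint_entropy_bits(samples, edges_per_variate):
    """Joint entropy (bit) of a slice with shape (n_steps, n_variates)."""
    counts, _ = np.histogramdd(np.asarray(samples, dtype=float),
                               bins=edges_per_variate)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def ensemble_entropy_bits(ensemble_slice, bin_edges):
    """Entropy (bit) of a slice with shape (n_steps, n_members), members pooled."""
    pooled = np.asarray(ensemble_slice, dtype=float).ravel()
    counts, _ = np.histogram(pooled, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Two variates, two bins each: four equally populated joint bins.
edges_2d = [np.linspace(0.0, 1.0, 3), np.linspace(0.0, 1.0, 3)]
independent = [[0.05, 0.05], [0.05, 0.95], [0.95, 0.05], [0.95, 0.95]]
print(joint_entropy_bits(independent, edges_2d))

# Two time steps with two ensemble members each, pooled into one histogram.
ensemble = [[0.05, 0.95], [0.05, 0.95]]
print(ensemble_entropy_bits(ensemble, np.linspace(0.0, 1.0, 3)))
```

In both cases the slice is still reduced to a single entropy value, so the subsequent uncertainty and complexity calculations remain unchanged.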

Two methods similar to the c-u-curve have been proposed in the literature,
which in the following we will briefly explain and discuss. The first,

The MSE method calculates the entropy of a time series for various
coarse-grained (

We discuss the properties of the c-u-curve with the example of six time series
as shown in Fig. 1a–f. Time series a–c are synthetic time series: a straight
line, random uniform noise and the famous Lorenz attractor (Lorenz, 1963). We chose them for their simple, exemplary and well-known behaviour. The
straight line (Fig. 1a) contains no variability whatsoever and should
therefore show very little uncertainty and complexity. The random noise
(Fig. 1b) contains very high but constant variability and should therefore
show high uncertainty and low complexity. The Lorenz attractor (Fig. 1c) is
a prime example of complex behaviour arising from feedbacks in dynamical
systems. We used the code as provided by Moiseev (2022) with standard
parameters to produce a time series of the Lorenz attractor. From its three
variates, for clarity, only the first one is shown and discussed; the results
from jointly considering all three variates are similar. All synthetic time
series consist of

Figure 1. Synthetic and hydro-meteorological time series used for
demonstration of the c-u-curve. Time series for subplots

Time series d–f are hydro-meteorological observations taken from the CAMELS
US data set (Newman et al., 2015). The first (Fig. 1d) comprises daily
precipitation observations for the South Toe River, NC (short: STR), basin,
and the second (Fig. 1e) is the corresponding time series of daily streamflow
observations. The basin size is 113.1

For convenience, we normalized all six time series to a

In this section, we present and discuss the c-u-curves of all six time series. We start by discussing the three artificial time series, followed by the three hydro-meteorological time series. All the c-u-curves are shown in Fig. 2, and their key characteristics are summarized in Table 1. For clarity, Fig. 3 additionally shows only the hydro-meteorological time series in a subregion of Fig. 2. For further illustration, selected histograms of time series streamflow in GR are shown in Appendix A.

Figure 2. The c-u-curves for synthetic (dotted) and hydro-meteorological (no marker) time series as shown in Fig. 1. The time series length is 30 000 for the synthetic data and 12 418 for the
hydro-meteorological data. The number of value bins and entropy bins is 10,
and the maximum uncertainty limit and maximum complexity limit are at

Table 1. Key characteristics of the c-u-curve for both the synthetic and hydro-meteorological time series.

Figure 3. The c-u-curve for all hydro-meteorological time series as shown in Fig. 1d–f. All the time series comprise 12 418 time steps, the number of
value bins and entropy bins is 10, and the maximum uncertainty limit and maximum
complexity limit are at

The overall shape of each c-u-curve contains key characteristics of the
underlying time series. We start by discussing the c-u-curve plot of the

The

The

Next, we discuss the c-u-curves of the hydro-meteorological time series. In Fig. 2, they are indicated by the lines without markers. It is immediately obvious that they all possess low uncertainty, much lower than the theoretical maximum (indicated by the vertical “maximum uncertainty” limit) and the random noise and also lower than the Lorenz attractor. This is in accordance with our expectations and a consequence of the typically high temporal autocorrelation of hydro-meteorological time series, which clearly separates them from purely random time series. For a better view of the details, we re-plotted the hydro-meteorological time series in a subregion of the uncertainty limits in Fig. 3, which we will refer to in the following.

Despite the generally low uncertainties, the

Interestingly, the corresponding

This is different for the second

In this paper we presented a method to analyse and classify dynamical
systems by the two key features

The c-u-curve method has several useful properties: independence from the units of the data (both uncertainty and complexity are expressed in bit), existence of upper and lower bounds for both uncertainty and complexity as a function of the chosen data resolution, and bounded behaviour when approaching the upper and lower limits of time slicing. For a single time slice containing all data, uncertainty equals the time series entropy and complexity is zero; for time slices containing single values, both uncertainty and complexity are zero. The c-u-curve method is applicable to univariate and multivariate data sets as well as to deterministic and probabilistic value representations (ensemble data sets), making it suitable for a wide range of tasks and systems. The main limitation of the method arises from the requirement of sufficiently populating distributions, which sets bounds on both the minimum and maximum widths of time slices.
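
The bounds and limiting cases summarized above can be restated compactly. The symbols U (uncertainty), C (complexity) and H (discrete entropy of the full time series) are shorthand assumed here for illustration and are not necessarily the variable names used in the paper; nvb, neb, ns and sw denote the number of value bins, the number of entropy bins, the number of time slices and the slice width as in Sect. 2.1.

```latex
\begin{align}
  & 0 \le U \le \log_2(\mathrm{nvb}), \qquad 0 \le C \le \log_2(\mathrm{neb}) \\
  & \mathrm{ns} = 1 \text{ (one slice containing all data)}: \quad U = H, \quad C = 0 \\
  & \mathrm{sw} = 1 \text{ (one value per slice)}: \quad U = 0, \quad C = 0
\end{align}
```

For example, with nvb = neb = 10, both upper limits equal log2(10), which is approximately 3.32 bit.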

We provided a proof of concept with the example of six time series, three of them artificial and three of them from hydro-meteorological observations. The artificial time series (straight line, random noise, Lorenz attractor) were chosen for their very different, exemplary and well-known behaviour and with the goal of demonstrating that the c-u-curve successfully reveals this behaviour, i.e. to demonstrate the general applicability of the method across a wide range of time series types. The observed time series (precipitation and streamflow from a mainly rainfall-dominated basin and streamflow from a basin where additionally snow processes influence the hydrological function) were chosen with the goal of demonstrating that the c-u-curve method reveals characteristics of real-world time series that are in accordance with the general knowledge of hydrological system functioning. We were able to show that the c-u-curve properties differ distinctly among the time series, which indicates that the method has discriminative capabilities useful for system classification, and that these properties are in accordance with expectations based on system understanding. This indicates that the method captures relevant time series properties and expresses them in terms of uncertainty and complexity.

While the range of applications presented in this paper is small and mainly intended as a proof of concept, the results encourage further studies. Particularly for hydro-meteorological applications, we suggest that the c-u-curve method can be used for hydrological classification, as an objective function in hydrological model training, and for hydrological system analysis. For classification, we suggest using large hydro-meteorological data sets such as those from Addor et al. (2017) or Kuentz et al. (2017) to analyse whether the c-u-curve distinguishes between catchments with known differences, such as groundwater- and interflow-dominated, pristine and regulated, snow-free and snow-influenced, and arid and humid. In the same context, classifications by the c-u-curve can be compared to existing hydrological classifiers and signatures (such as the flow-duration curve and others as discussed in Jehn et al., 2020, Addor et al., 2018, and Kuentz et al., 2017) in terms of classification similarity and strength. The clear differences in c-u-curve properties between the two streamflow time series investigated in this paper encourage further research in this direction. In terms of hydrological model training, we suggest that the c-u-curve and its characteristic values can be used as an additional objective function. While standard hydrological objective functions such as Nash–Sutcliffe efficiency guide models towards point-by-point agreement of model output and observations, c-u-curve characteristics can guide models towards correct representations of short- and long-term variability patterns. Supported by the (dis-)similarities of the c-u-curve properties of the precipitation and streamflow time series presented in this paper, we also suggest that by analysing and comparing c-u-curve properties of input, internal states and output of hydrological systems, valuable insights into the functioning of these systems can be gained, e.g. whether they increase or decrease the uncertainty and complexity of the signals propagating through them. Further work on these topics is in progress. Finally, we propose the combination of the multiscale entropy (MSE) and c-u-curve approaches as discussed in Sect. 2.3 as a very promising avenue for future work.

As an illustration of how time series values within a time slice translate into
histograms and entropy values, we show, for streamflow GR, in Fig. A1 the
streamflow hydrographs and the corresponding histograms for three time slices.
All the time slices have a width of 60 d, which is the slice width for which
the series shows the highest complexity (compare Table 1 and Fig. 3). Overall,
the time series (12 418 time steps) splits into

Figure A1. Normalized streamflow hydrographs and corresponding
histograms of three time slices from time series streamflow GR. Each time
slice comprises 60 d. For the histograms, the value range of the
normalized streamflow was split into 10 bins of uniform width. Subplots

Histogram of entropies from normalized time series
streamflow GR split into 206 time slices, each with a width of 60 d.
The entropy for each time slice was calculated from histograms (see Fig. A1).
For the histogram, the possible range of entropy values (

For the convenience of the reader, we repeat Theorem 5.12 from Conrad (2022)
here and some related explanation in slightly modified and shortened form,
but for the full proof, the reader is referred to the original publication.
In the following,

This number is between

The code and data used to conduct all the analyses in this paper are publicly available at

UE developed the c-u-curve method and wrote all the related code. UE and PD designed the study together and wrote the manuscript together.

The contact author has declared that neither of the authors has any competing interests.

We gratefully acknowledge support by the Deutsche Forschungsgemeinschaft (DFG) and the Open Access Publishing Fund of the Karlsruhe Institute of Technology (KIT). We thank Philipp Reiser from the University of Stuttgart for pointing us to Conrad (2022).

This research has been supported by the INSPIRE Faculty Fellowship, Department of Science and Technology, Government of India (grant no. DST/INSPIRE/04/2022/001952, Faculty Registration No.: IFA22-EAS 114). The article processing charges for this open-access publication were covered by the Karlsruhe Institute of Technology (KIT).

This paper was edited by Jim Freer and reviewed by Jasper Vrugt and one anonymous referee.