Understanding the complexity of natural systems, such as climate
systems, is critical for various research and application purposes. A range
of techniques have been developed to quantify system complexity, among which
the Grassberger–Procaccia (G-P) algorithm is the most widely used. However,
the use of this method is still not adaptive and the choice of scaling
regions relies heavily on subjective criteria. To this end, an improved G-P
algorithm was proposed, which integrated the normal-based

There is increasing interest in understanding the complexity of systems ranging from natural phenomena to social behaviors (Bras, 2015; Lin et al., 2015; Wang et al., 2016). As open systems with random external forcings and nonlinear dissipation, climate systems are highly complex (Nicolis and Nicolis, 1984; Jayawardena and Lai, 1994; Rind, 1999; Wang et al., 2015). Owing to nonlinear interactions among the atmosphere, hydrosphere, and biosphere, climatic variables exhibit highly nonlinear and dynamic characteristics, which reflect the complexity of climate systems (Palmer, 1999; Rial et al., 2004; Sivakumar, 2005; Wu et al., 2010). It is thus imperative to quantitatively measure the complexity of climatic variables in order to understand the underlying processes. However, no common definition of system complexity exists in scientific communities, particularly from a mathematical perspective (Carbone et al., 2016). To resolve this issue, numerous concepts and methods, including chaos theory, wavelet analysis, and dynamical analysis, have been proposed to describe the complexity of climate systems (Lorenz, 1963; Di et al., 2014; Feldhoff et al., 2015; Sivakumar, 2017; Meseguer-Ruiz et al., 2017). For instance, chaos theory has been extensively used to characterize the chaotic and nonlinear features of climate systems (Sivakumar, 2001). Overall, previous studies based on chaos theory revealed that time series of air temperature and precipitation are nonstationary and contain abundant information. The complexity of rainfall and temperature dynamics has been widely used to indicate the extent of the complexity of climate systems (Dhanya and Kumar, 2010; Gan et al., 2002).

One of the important parameters in chaos theory is the correlation dimension, which can be used to measure the complexity and chaotic properties of variables, including precipitation and streamflow (Sivakumar et al., 2002; Dhanya and Kumar, 2011; Kyoung et al., 2011; Lana et al., 2016). Conceptually, the correlation dimension of a variable indicates the number of primary controls of the variable and thus determines the degrees of freedom of the underlying process (Sivakumar and Singh, 2012). Despite its wide application in various scientific fields, the use of the correlation dimension method is still hindered by certain limitations. For instance, the dimension method proposed by Grassberger and Procaccia (1983b) (denoted as the G-P method hereafter) is commonly used in the fields of hydrology and atmospheric science; however, its calculation procedures are still problematic (Ji et al., 2011). Specifically, the G-P method utilizes phase space reconstruction (Packard et al., 1980) and the embedding theorem (Takens, 1981) to compute correlation dimensions, which requires selection of an appropriate scaling region. The scaling region is a domain over which an object exhibits self-similarity across a range of scales. However, the G-P method relies on visual inspection for choosing scaling regions, which is subject to human error (Sprott and Rowlands, 2001). To tackle this problem, alternative methods have been developed to improve the original G-P method (Maragos and Sun, 1993). For example, Jothiprakash and Fathima (2013) utilized empirical equations to calculate the upper limit of scaling regions, and Ji et al. (2011) applied the clustering analysis technique to determine scaling regions. However, these existing methods for identifying scaling regions are still not adaptive, and the choice of scaling regions relies heavily on subjective criteria. Moreover, the least squares method used for fitting straight lines to determine correlation exponents can be distorted by outliers (Cantrell, 2008) and is thus not optimal. Therefore, studies are still warranted to seek more objective and adaptive algorithms for identifying scaling regions to obtain more accurate estimates of correlation dimensions.

The primary aims of this study were twofold. First, a new algorithm was
proposed to improve the original G-P method, which integrated the methods of
normal estimation,

Correlation dimensions can be used to identify the complexity of dynamical
systems with varying complexity degrees (e.g., low-dimensional vs.
high-dimensional systems). A wealth of algorithms have been developed for
computing correlation dimensions, among which the G-P algorithm is the most
widely used and is also adopted in this study. The G-P algorithm uses the concept of
phase space reconstruction (Packard et al., 1980) from a single-variable time
series. Here, the method of delays (Takens, 1981) was employed for
reconstructing phase space. Given a time series

For the
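As a minimal sketch of the two ingredients named above, the following Python code builds delay vectors from a scalar time series and evaluates a correlation integral. It uses the standard textbook form of both steps, not necessarily the exact equations of this paper; the function names, the Euclidean norm, and the brute-force pairwise distance computation are assumptions made for illustration.

```python
import numpy as np


def delay_embed(x, m, tau):
    """Build delay vectors (x[i], x[i + tau], ..., x[i + (m - 1) * tau]).

    m is the embedding dimension and tau the time delay, in samples.
    """
    x = np.asarray(x, dtype=float)
    n_vectors = len(x) - (m - 1) * tau
    if n_vectors < 2:
        raise ValueError("time series too short for this m and tau")
    return np.column_stack([x[i * tau: i * tau + n_vectors] for i in range(m)])


def correlation_integral(vectors, radii):
    """Fraction of distinct vector pairs closer than each radius.

    Assumption: Euclidean distance; other norms are also used in practice.
    Memory grows as O(n^2), so this is only suitable for short series.
    """
    vectors = np.asarray(vectors, dtype=float)
    n = len(vectors)
    diff = vectors[:, None, :] - vectors[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_dist = dist[np.triu_indices(n, k=1)]  # each pair counted once
    return np.array([(pair_dist < r).mean() for r in radii])
```

Within a scaling region, the slope of ln C(r) against ln r gives the correlation exponent for the chosen embedding dimension, so the output of `correlation_integral` is what the line-fitting step operates on.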

To overcome the limitation of the original G-P algorithm for selecting
scaling regions, we propose an adaptive identification algorithm of scaling
regions, which utilizes the normal-based

Comparison of the fitted lines obtained from the RANSAC algorithm and the least squares method.

The proposed procedure for calculating correlation dimensions, shown as a flow chart in Fig. 2, consists of five major steps:

For the time series

The normals of the scatter
points on the ln

The

The RANSAC algorithm is used to fit a straight line through the set of remaining scatter points.

The slope of the line obtained from the RANSAC method is computed to
acquire the correlation dimension

Flow chart of the proposed algorithm for computing correlation dimensions (the details are listed in the text).
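The line-fitting step can be illustrated with a small, self-contained RANSAC sketch. This is a generic random sample consensus line fit written for illustration, not the authors' implementation; the function name and the values of `threshold`, `n_trials`, and `seed` are hypothetical choices that would need tuning for real ln C(r) versus ln r data.

```python
import numpy as np


def ransac_line(x, y, threshold=0.05, n_trials=200, seed=0):
    """Fit y = slope * x + intercept while ignoring outliers.

    threshold (maximum vertical residual of an inlier), n_trials, and seed
    are illustrative values, not parameters taken from the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    best_inliers = None
    for _ in range(n_trials):
        # Draw two distinct points and build the candidate line through them.
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # a vertical candidate has no finite slope
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        inliers = np.abs(y - (slope * x + intercept)) < threshold
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    if best_inliers is None:
        raise ValueError("no valid candidate line found")
    # Refit by least squares on the largest consensus set only.
    slope, intercept = np.polyfit(x[best_inliers], y[best_inliers], 1)
    return slope, intercept, best_inliers
```

Unlike an ordinary least squares fit over all points, the final slope here is estimated only from the points that agree with the dominant linear trend, which is why a few stray points at the edge of a scaling region have little effect on the estimated correlation dimension.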

To test the effectiveness of the proposed algorithm, the classical chaotic
models of Lorenz (1963) in Eq. (4) and Henon (1976) in Eq. (5) were used.
The Lorenz and Henon systems, whose theoretical correlation dimensions are
known, have been studied extensively and are thus widely used to analyze
chaotic behavior in climate systems and to test the effectiveness of
algorithms for computing climate system complexity (e.g.,
Grassberger and Procaccia, 1983a; Lai and Lerner, 1998; Ji et al., 2011).
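For readers who wish to reproduce such a test, the sketch below generates scalar time series from both systems. Because Eqs. (4) and (5) are not reproduced here, the code assumes the commonly used textbook forms and parameter values (Henon map with a = 1.4 and b = 0.3; Lorenz system with sigma = 10, rho = 28, and beta = 8/3). The initial conditions, the step size, the fixed-step Runge–Kutta integrator, and the number of discarded transient points are likewise illustrative assumptions.

```python
import numpy as np


def henon_series(n, a=1.4, b=0.3, x0=0.0, y0=0.0, discard=100):
    """Return n values of the x component of the Henon map.

    Assumed form: x' = 1 - a * x**2 + y, y' = b * x.
    """
    x, y = x0, y0
    out = np.empty(n)
    for k in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if k >= discard:  # skip the initial transient
            out[k - discard] = x
    return out


def lorenz_series(n, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0,
                  state=(1.0, 1.0, 1.0), discard=1000):
    """Return n samples of the x component of the Lorenz system.

    Integrated with a fixed-step fourth-order Runge-Kutta scheme.
    """
    def rhs(s):
        x, y, z = s
        return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

    s = np.array(state, dtype=float)
    out = np.empty(n)
    for k in range(n + discard):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        s = s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        if k >= discard:  # skip the initial transient
            out[k - discard] = s[0]
    return out
```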

Correlation integral as a function of

According to the autocorrelation function, the time delay

Illustration of using the normal-based

The final fitted lines and the correlation dimension of
the Lorenz system:

Figure 5a shows the final fitted lines through the scaling regions using the
RANSAC method. The slope of the fitted line is the correlation dimension for
each corresponding

To verify the accuracy of our algorithm for computing correlation
dimensions, the results derived from the proposed algorithm were compared
with the ones obtained from the IJM and PKC methods. The IJM method was
based on visual inspections to determine scaling regions (Jothiprakash and
Fathima, 2013), while the PKC method integrated the

Comparison of the correlation dimensions derived from
different methods. TCD: theoretical correlation dimension; IJM: intuitive judgment method;
PKC: point-based

The correlation dimension method is an important diagnostic tool for understanding the complexity of natural systems with chaotic characteristics. In this section, a case study is presented to illustrate the use of the newly developed algorithm for studying the complexity of climate systems. Specifically, the algorithm was first utilized to compute the correlation dimensions of precipitation and air temperature using time series obtained from the Hai River basin (HRB). Afterwards, the regional patterns of correlation dimensions for precipitation and air temperature in the HRB were analyzed.

Locations of meteorological stations in the Hai River basin.

Variation of correlation dimension vs. embedding
dimension of climate variables:

The HRB is located in northeastern China (112–120

The correlation dimensions of precipitation and air temperature at all 40 meteorological stations were computed using the algorithm proposed in this study. Figure 7 shows the relationships between correlation dimension and embedding dimension for precipitation and air temperature at five representative stations across the HRB (i.e., Beijing, Fengning, Shijiazhuang, Xinxiang, and Zhangbei). The embedding dimensions of precipitation and air temperature for the five stations varied between 10 and 12. It is evident that the relationship between correlation dimension and embedding dimension for precipitation and air temperature differed among the selected stations. In general, the correlation dimensions for precipitation gradually saturated, with saturation values of 2.378, 2.407, 3.055, and 2.550 for the Beijing, Fengning, Shijiazhuang, and Zhangbei stations, respectively (Fig. 7a), indicating chaotic dynamical characteristics of precipitation. By comparison, the correlation dimension for precipitation at the Xinxiang station kept increasing with increasing embedding dimension, suggesting random characteristics of precipitation. For air temperature, the correlation dimensions at the five stations also gradually saturated (Fig. 7b), suggesting low-dimensional chaotic characteristics for air temperature.
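The distinction drawn above between saturating and steadily increasing curves can be expressed as a simple numerical check. The following sketch is only one possible criterion and is not the procedure used in this study; the function name, the tolerance `tol`, and the window length `window` are hypothetical.

```python
import numpy as np


def saturation_value(corr_dims, tol=0.05, window=3):
    """Return the saturated correlation dimension, or None if none is reached.

    corr_dims holds correlation dimensions ordered by increasing embedding
    dimension. The curve is treated as saturated when the last `window`
    increments are all smaller than `tol` (both values are illustrative).
    """
    d = np.asarray(corr_dims, dtype=float)
    if len(d) < window + 1:
        raise ValueError("need at least window + 1 values")
    tail_steps = np.abs(np.diff(d[-(window + 1):]))
    if np.all(tail_steps < tol):
        return float(d[-window:].mean())
    return None  # still growing: consistent with stochastic behavior
```

A curve that levels off returns its plateau value, whereas a curve that keeps rising with the embedding dimension returns `None`, mirroring the chaotic versus random interpretation given in the text.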

The spatial distribution of the correlation dimension
values for all the 40 stations:

Figure 8 presents the spatial distributions of the saturated correlation dimensions at the 40 meteorological stations for precipitation and air temperature in the HRB. For both precipitation and air temperature, the correlation dimensions varied markedly across the area. The correlation dimension for precipitation ranged from less than 3 to more than 6, while the correlation dimension was much lower for temperature (i.e., less than 2). Overall, the ranges of the correlation dimensions for precipitation and air temperature were comparable to previously reported values in other regions with similar climatic conditions (Kyoung et al., 2011; Sivakumar and Singh, 2012; Sivakumar et al., 2014). More importantly, the considerable spatial variations in the dimensionality for both climatic variables suggest regional differences in the complexity of the climate system in the HRB. Specifically, the correlation dimension for precipitation tended to be smaller in the northwestern mountainous area, with values less than 2.5. In the central area, the correlation dimension for precipitation became larger, with values greater than 3, while precipitation in the southeastern plain area showed very high correlation dimensions, with values greater than 6. Given that correlation dimensions indicate the number of controls on the underlying process (Sivakumar and Singh, 2012), Fig. 8a suggests that precipitation processes become progressively more complex from the mountainous area to the plain area in the HRB. Interestingly, the regional pattern of the correlation dimension for air temperature showed an opposite trend, with larger values mainly located in the northern HRB, indicating more complex temporal dynamics of air temperature in that area.

The spatial pattern of the correlation dimension for precipitation in the HRB may be largely attributed to the regional flow pathway of moisture flux, which is mainly controlled by the East Asian Summer Monsoon (EASM). The HRB is located in a monsoon-dominated region, where the EASM plays a leading role in the regional meteorological system. Chen et al. (2013) showed that the EASM had significant impacts on the spatiotemporal distribution of precipitation in eastern China. Li et al. (2017) further suggested that there was a significant correlation between precipitation and the EASM index in the HRB. Wang et al. (2011) revealed that large-scale atmospheric circulations had close relationships with precipitation patterns in the HRB by analyzing the moisture flux derived from NCAR/NCEP reanalysis data. Influenced by the large-scale atmospheric circulation, precipitation in the middle and southeast parts of the HRB is more sensitive to climate variability due to their locations closer to the ocean. This leads to the decreasing trend of precipitation from the southeast to the northwest in the HRB, suggesting that the supply of moisture for precipitation in the region mainly comes from the ocean.

Partly owing to their closer proximity to the ocean (Fig. 8), the southern and central areas of the HRB are more strongly affected by the EASM than the northern part. Furthermore, at the northern corner of the HRB, the westerlies primarily affect the hydrometeorological system and thus weaken the impact of the EASM on precipitation (Li et al., 2017). In addition, other factors (e.g., topography, vegetation distribution, and human activity) may also affect the regional patterns of climate variables. In particular, the Yan and Taihang mountain ranges located in the northwestern HRB obstruct the vapor transport driven by the EASM, resulting in lower spatiotemporal variability in precipitation in the northern part of the HRB. As a result, precipitation had higher degrees of complexity in the southern HRB, while its complexity was lower in the mountainous area in the northwestern HRB. As for air temperature, the orographic effect might be stronger in the mountainous area (Chu et al., 2010b), resulting in the higher complexity of temperature in this area. However, it should be noted that the range of the correlation dimension for air temperature from 1.0 to 2.0 suggests that two primary controls on temperature exist at all stations across the region.

In this study, the original G-P algorithm for calculating correlation
dimensions was modified by incorporating the normal-based

The effectiveness of the proposed method for calculating correlation
dimensions was illustrated using the classical Lorenz and Henon chaotic
systems. The results showed that the new method outperformed the traditional
intuitive judgment and point-based

Except for a few stations in the northern region, precipitation at most of the meteorological stations in the HRB showed chaotic behavior. Specifically, the correlation dimension for precipitation showed an increasing trend from the mountainous region in the northwest to the plain area in the southeast, indicating that precipitation processes became progressively more complex from the mountainous area to the plain area. The spatial pattern of the complexity of precipitation reflected the influence of the dominant climate system in the region. Meanwhile, air temperature at all meteorological stations showed chaotic characteristics. In contrast to precipitation, the complexity of air temperature exhibited an opposite trend, with less complexity in the plain area.

The modified G-P algorithm proposed in this study can be used to characterize the complexity of climate systems (as well as hydrological variables, such as streamflow, soil moisture, and groundwater) more objectively and thus provide a more reliable estimate of the number of dominant factors governing these systems. Theoretically, it can provide valuable information for optimizing the number of parameters in climate models to reduce computational demands and model parameter uncertainties. Furthermore, the findings of this study can be used for the regionalization of hydrometeorological systems in the HRB, which is important for prediction in ungaged areas (Lebecherel et al., 2016). It should be noted that more studies are still required to verify the present results using other nonlinear techniques, such as the Lyapunov exponent (Wolf et al., 1985) and approximate entropy (Pincus, 1995), which might provide additional insights into climate complexity analysis.

The data sets used in this study are publicly available. The monthly
precipitation and temperature data can be downloaded from the China
Meteorological Administration Network (

CD implemented the improved method and carried out the experiments and algorithm coding. CD and TW conceived the framework and analysis of the paper. XY contributed to data collection. SL took part in the data analysis. All the authors contributed to writing and revising the paper.

The authors declare that they have no conflict of interest.

The work was supported by the National Key R&D Program of China (no. 2016YFA0601002, no. 2016YFC0401305, and no. 2017YFC0506603) and by the National Natural Science Foundation of China (no. 51679007 and no. U1612441). The authors would also like to acknowledge the financial support from Tianjin University, and Tiejun Wang also acknowledges the financial support from the Thousand Talent Program for Young Outstanding Scientists for this study.

Edited by: Dimitri Solomatine

Reviewed by: Nagesh Kumar Dasika and one anonymous referee