Analysis of rainfall relationships in East Asia using a complex network

Concurrent floods in multiple locations pose systemic risks to the interconnected economy of East Asia through supply chains. Despite these significant economic impacts, the interconnection between rainfall patterns in the region is still poorly understood. Here, we analyzed the spatial dependence of rainfall patterns among 24 megacities in the region by means of complex network theory and discussed the applicability of the technique. Each city was represented by a node and each correlation coefficient by a link. Vital node identification and clustering analysis were conducted using adjacency information entropy and multiresolution community detection. The resulting groups reflected the spatial characteristics of the climate. In addition, the climate links between the groups were identified through cross-mutual information, which accounts for the time delay between groups. We conclude that complex network analysis can be a useful method for analyzing spatial relationships between climate factors.


Introduction
East Asia accounts for 54% of the global supply chain, providing a wide range of services and products across the world (Foley, 2020). However, East Asia is prone to major floods. According to the disaster database of the Centre for Research on the Epidemiology of Disasters (CRED), an annual average of 165 flood disasters occurred worldwide between 2000 and 2020, resulting in 5,278 deaths and economic damage of up to US $29 million. While more than 22% of these flood disasters occurred in East Asia, more than 60% of global flood deaths and economic damage occurred in the region. For instance, Thailand recorded 813 deaths and US $40 million worth of damage from floods in 2011 (Haraguchi and Lall, 2015), while China recorded 300 fatalities and US $4.5 million in damage from floods in 2019 (CRED). These floods struck several areas of East Asia simultaneously. Even when floods occur at distant places, their impacts propagate through supply chains and incur economic losses across the entire region. In this sense, concurrent flooding causes severe loss of life and economic losses in multiple countries at the same time, disrupting the global economy more severely than isolated floods. For example, in 2020, concurrent floods in East Asia inundated automobile factories in Thailand, disrupting automobile supply, and adversely affected China's rare-earth and fertilizer industries along the Yangtze River, affecting the global rare-earth industry (AON, 2020).
https://doi.org/10.5194/hess-2021-343 Preprint. Discussion started: 14 July 2021. © Author(s) 2021. CC BY 4.0 License.
Changes in rainfall characteristics caused by climate change are some of the primary causes of concurrent floods in East Asia. These changes occur across all regions, and the changed characteristics affect each other, resulting in even larger changes (Zhili et al., 2020). Therefore, it is important to investigate the relationships among rainfall patterns in each region. Many studies have been conducted to identify rainfall relations in East Asia. Most of them investigated the relationship between major East Asian countries using statistical techniques (Jeong et al., 2008, Kosaka et al., 2011, Deng et al., 2014), and some demonstrated connections between weather factors or indicators, sea level temperature, monsoons, etc. (Wu et al., 2003, Lau and Kim, 2006, Li et al., 2010, Sun and Wang, 2012, Wu, 2017). Researchers have also used teleconnection methods to discover relationships between precipitation in East Asia and other parts of the world (Kripalani and Kulkarni, 2001, Sahai et al., 2003, Lu and Lin, 2009, Lin, 2014, Preethi et al., 2017, Maity et al., 2020). The results of these studies have been used to anticipate rainfall in East Asia and to aid in preparing for flood disasters.
In this study, we investigated the usefulness of complex network concepts for relationship analysis. Complex network theory, which builds on the graph theory introduced by Leonhard Euler in the 18th century, expresses and analyzes a subject or phenomenon as a graph. In the late 1990s, Watts et al. (1998) and Barabási and Albert (1999) extended the analytical technique, making the theory fundamental in network science. A complex network can display a complicated phenomenon as a simple graph. Information obtained from the methodology can be used to identify the characteristics of subjects, their physical behavior, and the roles and relationships of the phenomenon's components. Complex network analysis has also been used in various fields because of its high applicability.
For example, researchers have applied it to social networks (Michael et al., 2010), world trade (Brret et al., 2007), air transportation networks (Alessio et al., 2013), and patterns in human migration (Davis et al., 2013), among others. The method has also been used in hydrology and meteorology to discover new patterns and relationships (Donges et al., 2009, Stefania et al., 2013, Boers et al., 2015, Hong et al., 2020, Wolf et al., 2020). In particular, it has been used to analyze extreme rainfall patterns around the world (Boers et al., 2019), track rainfall events caused by typhoons (Ozturk et al., 2018), and study the spatial connectivity of rainfall (Naufan et al., 2018) to determine new information or characteristics.
Building on the encouraging results of previous rainfall-related studies, this study applied complex network theory to rainfall in East Asia to understand the relationships between the rainfall patterns in each region. A complex network defines connectivity using correlation methods. Therefore, characteristics can be analyzed from the relationships. In addition, for clustering analysis, complex network-based methods consider the entire network rather than each region independently, unlike many traditional methods. This feature results in more accurate clustering (Xudong et al., 2020). Despite this advantage, one of the challenges in complex network theory is to identify the thresholds that determine whether links exist. While no methodology fully resolves this challenge, new methodologies are constantly being proposed. In this study, we assumed that each region (node) is connected with all the other regions (nodes) in the network, and that each connection (link) has a correlation coefficient as a weight. Using the weights as input data in each analysis allowed the relationships between regions to be reflected in the network and analyzed. We assessed the effects of each region through centrality analysis and grouped the regions according to clustering analysis. Subsequently, mutual information (MI) was calculated with a time lag (i.e., cross-mutual information) to identify the relationships between the groups.
The remainder of this paper is organized as follows. Section 2 describes the study area and the data used in this study. The complex network theory and related indicators are detailed in Section 3. Section 4 presents the results of the complex network analysis of East Asia and a discussion of these results. Section 5 presents the conclusions of this study.

Study area and materials
In this study, the major cities in East Asia were studied (Fig. 1). We started from the cities selected in the World Bank Group report "East Asia's Changing Urban Landscape" (The World Bank, 2015). However, we excluded Surabaya, Jakarta, and Badung (Indonesia) because the locations of their rainfall observations have changed since 2007. Instead, we included Ho Chi Minh City, Hai Phong (Vietnam), and Cebu (Philippines), which are economically active. Thus, a total of 24 cities were selected.
This study used daily precipitation data from the Asian Precipitation - Highly Resolved Observational Data Integration Toward Evaluation (APHRODITE) gridded precipitation dataset (Akiyo et al., 2012). The APHRODITE data contain long-term, high-resolution daily rainfall data for the Asian continent obtained from a dense network of precipitation observations (Fig. 2). The data were obtained from the APHRODITE Water Resources project conducted by the Research Institute for Humanity and Nature (RIHN) and the Meteorological Research Institute of the Japan Meteorological Agency (MRI/JMA) and have been used in many studies because of their high resolution. The daily rainfall data for each city cover January 1, 1981, to December 31, 2015. The basic statistics of each city's rainfall data are listed in Table 1. Taipei has the largest average rainfall, approximately six times that of Bangkok, which has the lowest. Bangkok has the largest variation, and Kuala Lumpur has the least.

Complex network analysis
Complex network analysis represents a subject or phenomenon as a network and analyzes its characteristics, its components, and the relationships among the nodes in the network. To apply complex network analysis, nodes and links must be defined. A node is a fixed element that serves as a point of intersection or joining within a network. For example, in the global airways network, airports are the nodes. A link is an element that connects two nodes. In the global airways network, airways are the links. Defining these two elements is crucial in the analysis because even networks with the same number of nodes and links can assume various forms (Fig. 3).
In a complex network, the links are the most influential aspect of the network because the type and characteristics of the graph vary depending on the type of link used and how it is defined. Based on the directionality and weight of the links, the network can be undirected or directed, and unweighted or weighted. In physical systems such as transportation systems or the Internet, the links are given and need not be defined. However, if the connections are uncertain, researchers must define them. The most widely used methodology is the correlation coefficient. Depending on the value of the correlation calculated between two nodes, the researcher can define whether a link exists. While various previous studies used the Pearson correlation coefficient for links, it tends to yield inaccurate values when applied to nonlinear data. To address this problem, some researchers have utilized MI as an alternative (e.g., Donges et al., 2009, Kim et al., 2019, Ghorbani et al., 2021). MI is based on information theory and probability theory. For two variables (A and B), it quantifies the amount of information about B contained in variable A.
$$I(A;B) = \sum_{a \in A} \sum_{b \in B} p(a,b) \log \frac{p(a,b)}{p(a)\,p(b)}$$
Here, $p(a)$ and $p(b)$ are the marginal probability distributions of the variables, and $p(a,b)$ is their joint probability distribution. MI values range from 0 to ∞, and an MI of 0 indicates that the two variables are independent of each other. MI can account for nonlinearity in the data and has the advantage of being able to calculate the correlation between data of different sizes (Goyal, 2014).

Vital node identification using adjacency information entropy
The adjacency information entropy method is applicable to all types of networks and gives better results than the existing methods. The procedure of the method is as follows:
First, calculate the adjacency degree of each node. The adjacency degree of a node is computed from the degrees (weights) of the nodes that form links with it.
Second, calculate the selection probability.
Third, calculate the adjacency information entropy of each node. After the entropies of all nodes are compared, the importance of the nodes is ranked in descending order of entropy.
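The three steps above can be sketched for a weighted, undirected network as follows. The sketch assumes one commonly cited form of the measure: the adjacency degree $A_i = \sum_{j \in \Gamma_i} k_j$, the selection probability $P_{ij} = k_i / A_j$, and the entropy $E_i = -\sum_{j \in \Gamma_i} P_{ij} \log_2 P_{ij}$, with the node strength (sum of link weights) standing in for the degree $k$. These formulas, the symbols, and the function name `adjacency_information_entropy` are assumptions made for illustration and may differ in detail from the formulation used in this study.

```python
import numpy as np


def adjacency_information_entropy(weights):
    """Return the adjacency information entropy of every node.

    weights: symmetric (n, n) array of non-negative link weights
             with a zero diagonal (weighted, undirected network).
    Assumed formulas (illustrative, see the accompanying text):
        A_i  = sum of k_j over the neighbours j of node i
        P_ij = k_i / A_j
        E_i  = -sum over neighbours j of P_ij * log2(P_ij)
    """
    w = np.asarray(weights, dtype=float)
    linked = w > 0                                # neighbour sets Gamma_i
    strength = w.sum(axis=1)                      # k_i as weighted degree
    adj_degree = linked.astype(float) @ strength  # A_i
    with np.errstate(divide="ignore", invalid="ignore"):
        # P[i, j] is defined only for linked pairs; other entries are 0.
        p = np.where(linked, strength[:, None] / adj_degree[None, :], 0.0)
        terms = np.where(p > 0, -p * np.log2(p), 0.0)
    return terms.sum(axis=1)
```

Under these assumptions, a ranking in descending order of entropy is obtained with `np.argsort(-adjacency_information_entropy(weights))`.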

Multiresolution community detection in weighted complex networks
A complex network consists of many nodes and links. Nodes with strong relationships or similar characteristics can be clustered together. These clusters have several features and perform specific network functions. However, the clustering results depend on the level of analysis. Therefore, multiresolution community detection can be a useful method for understanding complex networks (Newman, 2012). Several cluster analysis methods have been used for complex networks, but they require intensive computation for complicated network shapes and focus only on graphical properties (Long and Liu, 2019). To address these problems, Hao and Xiao (2020) proposed a clustering methodology using an intensity-based community detection algorithm (ICDA) for weighted networks. The procedure of the method is as follows:
First, calculate the link intensity of each link.
Here, $\sigma(\mathrm{path}_p(i,j))$ is the sum of link weights along a path of p links from node i to node j, and P is the path parameter. The link intensity also involves a polygonal effect parameter and, for the edge between node i and node j, the respective strengths of the two nodes.
Second, identify the links whose link intensity is greater than the selected threshold t (0 < t ≤ 1) and create a group of nodes from the identified links.
Third, calculate the belonging coefficient of the nodes in each group.
This method has the advantages of forming groups more accurately and computing faster than other methods.
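The second step of the procedure (keeping the links whose intensity exceeds the threshold t and grouping the nodes they connect) can be sketched as follows. The link intensity matrix is taken as given here, the function name `threshold_groups` is hypothetical, and the belonging coefficient of the third step is not computed, so this is only a partial illustration and not the ICDA itself.

```python
import numpy as np


def threshold_groups(intensity, t):
    """Group nodes joined by links whose intensity exceeds threshold t.

    intensity: symmetric (n, n) array of link intensities, assumed to be
               precomputed and scaled so that 0 < t <= 1 is meaningful.
    Returns a list of sorted node-index lists (connected components of
    the thresholded network).
    """
    s = np.asarray(intensity, dtype=float)
    unvisited = set(range(s.shape[0]))
    groups = []
    while unvisited:
        start = min(unvisited)
        unvisited.remove(start)
        group, stack = [start], [start]
        while stack:
            i = stack.pop()
            # Iterate over a sorted copy so the set can be modified safely.
            for j in sorted(unvisited):
                if s[i, j] > t:
                    unvisited.remove(j)
                    group.append(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups
```

Raising t splits the network into more and smaller groups, which is the sense in which the result depends on the resolution of the analysis.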

Construction of East Asia rainfall network
In this study, the rainfall network was weighted and undirected. Each node represented one of the 24 major cities, and the link weights represented the MI between nodes. Each node had different link weights. Table 2 compares the link weights of the nodes. According to Table 2, the ranges of the average, maximum, and minimum link weights were 0.22-0.37, 0.27-1.67, and 0.13-0.24, respectively. The average and minimum values had a narrow range, whereas the maximum values had a relatively wide range.
We observed that the city giving the maximum link weight for each node was located close to that node. This is because rainfall characteristics are similar in cities located in similar areas; thus, the MI is high. The maximum weights were spread over various cities, whereas the minimum weights were restricted to a few cities. Beijing and Tokyo were the cities with the lowest MI value eight and seven times, respectively. This was because the two cities are on the outskirts of East Asia.
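A minimal sketch of how such a fully connected, MI-weighted network could be assembled from daily rainfall series is given below. The histogram estimator, the number of bins, and the function names `mutual_information` and `mi_network` are assumptions for illustration; the estimator used in this study is not specified in the text.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = p_xy > 0                      # skip empty cells (0 * log 0 = 0)
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))


def mi_network(series, bins=10):
    """Build the weighted, undirected network of pairwise MI values.

    series: (n_days, n_cities) array of daily rainfall.
    Returns a symmetric (n_cities, n_cities) weight matrix, zero diagonal.
    """
    data = np.asarray(series, dtype=float)
    n = data.shape[1]
    weights = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            weights[i, j] = weights[j, i] = mutual_information(
                data[:, i], data[:, j], bins
            )
    return weights
```

Row-wise summaries such as `weights.sum(axis=1) / (n - 1)` then give the average link weight of each node, comparable in spirit to the quantities summarized in Table 2.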

Vital node identification by adjacency information entropy
For the network, we applied vital node identification (VNI) to determine the influence of nodes. VNI can be used to analyze all types of networks and determines the effects of nodes more accurately. The results for the 24 nodes are shown in Table 3.
The cities with high-ranking nodes were located around the South China Sea (Fig. 4). In addition, they had a large average MI. Cities with low-ranking nodes were in the northeast outskirts, except for Taipei, and had a low mean MI. From these results, we can deduce that the location of a node affects its influence. High-ranking nodes were in the center of the map and had many neighboring nodes. The characteristics of low-ranking nodes were diametrically opposed to those of high-ranking nodes.
However, location is not the only factor affecting vital node identification. Despite its proximity to the South China Sea, Taipei had a low ranking because its link weights were the smallest on average (0.220).

Clustering analysis using multiresolution community detection
In the clustering analysis, multiresolution community detection was applied to the 24 nodes to create groups. After calculating the belonging coefficient, we determined the groups based on a threshold value. To form groups of nodes with strong relationships, the threshold was set to the 95th percentile of the calculated belonging coefficients, 0.06. Nodes in close proximity formed a group (Table 4 and Fig. 5). The cities of Seoul (South Korea) and Kuala Lumpur (Malaysia) were not clustered with the others. Because Seoul is located far from other nodes, it had low MI values; thus, its belonging coefficients were low. For the Kuala Lumpur node, the belonging coefficients calculated with other nodes were between 0.03 and 0.05, below the threshold.
After the cluster analysis, we selected a representative from each group (Table 5). The node with the largest sum of belonging coefficients with the nodes in the same group was considered the group's representative. However, we could not select representatives from groups with only two nodes because their sums of belonging coefficients were the same. As a result, G5, G7, and G8 were excluded.

A comparison of the average link weights showed that the representative nodes of each group had high average link weights and were ranked high in the vital node identification.
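The threshold choice and the representative selection described in this section can be sketched as follows. The sketch assumes that the belonging coefficients are available as a symmetric node-by-node matrix; that layout, the function names `percentile_threshold` and `group_representative`, and the toy values in any usage are hypothetical.

```python
import numpy as np


def percentile_threshold(belonging, q=95):
    """Return the q-th percentile of the pairwise belonging coefficients."""
    b = np.asarray(belonging, dtype=float)
    upper = b[np.triu_indices_from(b, k=1)]   # each node pair counted once
    return float(np.percentile(upper, q))


def group_representative(belonging, group):
    """Return the node with the largest within-group coefficient sum.

    belonging: symmetric (n, n) array of belonging coefficients (assumed).
    group:     list of node indices forming one group.
    Returns None for groups with fewer than three nodes, because the two
    sums of a two-node group are equal and no representative can be chosen.
    """
    if len(group) < 3:
        return None
    b = np.asarray(belonging, dtype=float)
    idx = np.asarray(group)
    sub = b[np.ix_(idx, idx)].copy()
    np.fill_diagonal(sub, 0.0)                # ignore self-coefficients
    totals = sub.sum(axis=1)
    return int(idx[np.argmax(totals)])
```

Returning `None` for two-node groups mirrors the exclusion of G5, G7, and G8 from the representative selection.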

Relationship between node groups
Nodes were grouped based on their belonging coefficients (Section 4.3). The relationships between the groups were determined using cross-mutual information analysis. Cross-mutual information is a methodology for calculating MI by adding time lags between targets. It can estimate an appropriate correlation by considering the time intervals between geographically distant points. In this study, the time lag ranged from −10 to 10 days, and we checked the maximum cross-mutual information value and the corresponding time lag of each group (Table 6).
As Fig. 6 shows, most of the groups have their maximum cross-mutual information values with G5 or G6, indicating strong relationships with these groups. G5 and G6 have a maximum cross-mutual information value with each other, and this value is larger than the other cross-mutual information results. This result indicates that the two regions have a comparatively strong relationship. A comparison of the lag times that produce the maximum cross-mutual information indicated that they were all less than five days. Therefore, East Asian regions can affect each other within five days.
The relationships in Fig. 6 and Table 6 were derived from the characteristics of East Asian rainfall. Indian and East Asian monsoons are major factors affecting rainfall in East Asia. The Indian monsoon extends to East Asia and can reach the northern part of China (Renguang and Yang, 2017). If the Indian monsoon is strong, a large amount of rainfall can occur in India and northern China. This characteristic was observed in the relationship between G5 and G4 (G5 is the first place in East Asia affected by the Indian monsoon, and G4 is the one in northern China). The Indian monsoon moves northeast from the Bay of Bengal, passing mainland China into the Sea of Okhotsk. G5, G6, and G7, which lie on this pathway, are related to each other by the Indian monsoon. The effect of the South China Sea, which supplies vapor to the mainland, was contained in the relationship between G1 and G6. In the summer, vapor from the South China Sea causes heavy rainfall in southern China and arrives in the mainland (Kanaly et al., 1996 and Wei et al., 2019).
Similar to the Indian monsoon, the East Asian monsoon affects East Asian rainfall. The East Asian monsoon begins in the Western Pacific, moves eastward through Indonesia, and ends in Japan and South Korea. If it is strong, it affects southern Vietnam and Thailand (Rehe and Akimasa, 2002). This was observed in the relationship between G8 and G5. In the summer, there is an anomalous anticyclone between China and Korea. The anomalous anticyclone located in the western sea forms a clockwise wind circulation throughout China, Korea, and Japan (Rengquang, 2017). This circulation transports vapor from Japan to eastern and central China. This phenomenon formed the relationships between G2 and G6 and between G3 and G6.
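The lag scan described at the start of this section (MI evaluated for lags from −10 to 10 days, keeping the maximum and its lag) can be sketched as follows. The histogram MI estimator, the bin count, the sign convention for the lag, and the function names are assumptions for illustration; `x` and `y` stand for the daily rainfall series chosen to represent two groups, which is also an assumption.

```python
import numpy as np


def mutual_information(x, y, bins=10):
    """Histogram estimate of the mutual information (in nats)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nonzero = p_xy > 0
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))


def cross_mutual_information(x, y, max_lag=10, bins=10):
    """Scan lags in [-max_lag, max_lag] and return the best one.

    A positive lag pairs x[t] with y[t + lag], i.e. y follows x.
    Returns (best_lag, best_mi, mi_by_lag).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    mi_by_lag = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: n - lag], y[lag:]
        else:
            a, b = x[-lag:], y[: n + lag]
        mi_by_lag[lag] = mutual_information(a, b, bins)
    best_lag = max(mi_by_lag, key=mi_by_lag.get)
    return best_lag, mi_by_lag[best_lag], mi_by_lag
```

With this sign convention, a positive best lag would suggest that rainfall in the second series tends to follow the first by that many days.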

Discussion
Complex network analysis has the advantage of reducing complex phenomena or systems to a graph form, making it easier to determine their characteristics. In addition, it can be used to analyze the effects of network components, perform clustering analysis, etc. In this study, we used these merits to examine the relationships between major cities in East Asia.
To create a rainfall network, we first calculated the MI between nodes and used it as the link weight. Thus, the network reflected the correlation of rainfall between cities, and the link weight was the most important factor in the subsequent analyses. To check the effects of nodes in the network, the adjacency information entropy was calculated and compared. The results indicated that nodes surrounding the South China Sea were highly ranked, and the location of a node was an important factor in identifying vital nodes. The South China Sea is one of the main vapor providers in East Asia, and the two major monsoons pass through it. Vapor from the South China Sea first affects coastal cities and then moves to other cities on the continent. Thus, rainfall in some cities affects the neighboring cities. Because of this phenomenon, cities located around the South China Sea ranked high.
As described in Section 4.3, the belonging coefficient of each node was calculated using the link weights. Each group consisted of nodes located nearby, and their coefficients were significantly higher than those of the other nodes. The correlation values of the group representatives were higher than those of the other nodes in the same group. We observed that a city with a high correlation with nearby cities had a significant influence and could be used to represent a group. After clustering, we applied cross-mutual information analysis to determine the relationships between groups. During the analysis, the lag time was considered because the groups were geographically separated. The cross-mutual information results were interpreted using the rainfall characteristics of East Asia. Two monsoons (the Indian and East Asian monsoons) and anomalous anticyclones affected the group relationships. An interesting result was the strong relationship between G5 and G6. Even with G7 between them, they have a strong correlation. Previous research has primarily focused on the relationship between southern China (G1) and the regions surrounding the East China Sea (G5, G7, and G8) (Tie and Xiushu, 2008, Wenting et al., 2014, Zhifei et al., 2016). These studies analyzed the effects of monsoons in the East China Sea but did not expand the region to G6. Therefore, research into the physical interpretation of the link between the G5 and G6 regions is required.
The complex network facilitated a simple analysis of the relationships between East Asian cities. We observed new relationships and several characteristics of rainfall in East Asia. The analysis results confirmed that the complex network methodology is an effective method for studying the relationships between weather phenomena and indicators.

Conclusions
Concurrent floods in East Asia inundate firms' production facilities at multiple locations simultaneously, causing supply chain disruptions at the global level. In this study, we analyzed the spatial relationships between major cities in East Asia using a complex network. The East Asia rainfall network was composed of major cities (nodes) and correlation coefficients (links). After the network was created, vital node identification and clustering analysis were conducted using adjacency information entropy and multiresolution community detection. Cross-mutual information defined the relationships between cluster groups in East Asia. The results revealed that the network reflected the rainfall characteristics of East Asia and that the relationships significantly affected the vital nodes and the clustering analysis. In addition, we observed that Southeast Asia and northwest China have a strong relationship. Although the computational burden of implementing complex network analysis is modest, the method accurately reflects the relationships between regional rainfall and can be used to analyze the relationships between various weather factors. In a subsequent study, we will evaluate the applicability of the complex network methodology to interpreting key climate factors, such as ENSO, IOD, and NAO, which have complex interconnection characteristics.

Code/Data availability
The Asian Precipitation - Highly Resolved Observational Data Integration Toward Evaluation (APHRODITE) gridded precipitation dataset is available online at APHRODITE's Water Resources (http://aphrodite.st.hirosaki-u.ac.jp/).