Köppen versus the computer: an objective comparison between the Köppen-Geiger climate classification and a multivariate regression tree



Introduction
The goal of a global climate classification is to divide the world's land surface into areas with homogeneous climate conditions. Intra-annual variability in temperature, precipitation, and other climate variables should be as similar as possible within a given class.

The fact remains that the Köppen-Geiger classification has stood the test of time, despite rapid improvements in computer processing power and the attendant expansion of global climate observing networks. Reasons for the longevity of the Köppen-Geiger classification have been explored by a number of authors (see, for example, Peel et al., 2007 and references therein). In short, these stem from the system's "historical inertia", which is primarily due to its use in teaching first courses in physical geography, meteorology, and climatology; its definition in terms of simple climate variables; and its rule-based nature.
The Köppen-Geiger climate classification is defined in terms of rules applied to variables derived from long-term monthly mean temperatures and precipitation amounts (see Table 1). Other manual classifications, for example the "rational classification" of Thornthwaite (1948), follow a similar rule-based approach, although the climate variables are not as conceptually simple or as widely monitored as those in the Köppen-Geiger system. Thornthwaite, for example, includes potential evapotranspiration as a key classification variable. With the advent of the computer, however, climate classifications have typically been performed using automated clustering algorithms (see, for example, Fovell and Fovell, 1993). While a cluster analysis may define a set of well-separated, homogeneous climate regions, the simplicity and interpretability of traditional rule-based methods are, to an extent, sacrificed. Determining rules that define climate classes obtained using cluster analysis techniques is often not possible, and, as a result, clear interpretation of the results can be somewhat challenging. This is the case because most automated clustering methods, including the majority of hierarchical and partitioning cluster analysis techniques, are polythetic: i.e., cases are assigned to clusters based on information from all variables (Kaufman and Rousseeuw, 1990). In contrast, the use of simple rules in traditional climate classifications makes their interpretation relatively straightforward.
While less common than polythetic algorithms, monothetic algorithms have also been used for cluster analysis. Monothetic algorithms separate cases via a hierarchy of decision rules. Each new rule is defined using a single variable and adds a cluster to the analysis; the combined set of rules can be used to assign each case to exactly one cluster. This style of cluster analysis is much closer in spirit to traditional climate classifications than polythetic methods. The most common monothetic algorithm is, however, restricted to clustering binary data: Williams and Lambert (1959) designed association analysis, a divisive hierarchical algorithm, to cluster ecological data based on binary variables indicating the presence or absence of species at sites. Classification and regression trees (Breiman et al., 1984), more generally referred to as recursive partitioning trees, work in a similar manner but are more flexible and are not restricted to binary data. Despite being conceptually similar to association analysis, recursive partitioning models have generally been restricted to supervised classification or regression tasks, in which variability in a predictand variable is explained by splitting cases according to rules defined using a separate set of predictor variables. Multivariate regression trees (MRTs), recursive partitioning models with multiple predictands, have also been proposed and applied to a diverse set of prediction problems (Segal, 1992; Yu and Lambert, 1999; De'ath, 2002), including those in synoptic climatology (Cannon et al., 2002; Cannon, 2008).
As pointed out by Kaufman and Rousseeuw (1990, Sect. 5.3) and Ripley (1996, p. 321), a MRT, when used in an auto-associative fashion (i.e., without distinction between predictors and predictands), is an unsupervised clustering algorithm. Following the same reasoning, Chavent (1998) developed auto-associative MRTs for use in cluster analysis. The MRT is, in theory, very well suited to climate classification. It is automated, which removes the need for a practitioner to define the regions; it is hierarchical, which allows a series of nested classes to be defined; and it is rule-based, which allows climate classes to be unambiguously defined and easily interpreted. The goals of this study are (i) to develop a rule- and computer-based climate classification using the MRT algorithm, and (ii) to compare the performance of the MRT climate classification against the Köppen-Geiger climate classification.
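The auto-associative idea is easy to demonstrate. In the sketch below (not the author's implementation, which uses custom code and the R "mvpart" package), scikit-learn's multi-output `DecisionTreeRegressor` is fitted with the same matrix as predictors and predictands; its terminal nodes then act as monothetic clusters:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Toy "climate" data: two well-separated groups in two variables
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(50, 2)),
               rng.normal([5.0, 5.0], 0.5, size=(50, 2))])

# Auto-associative MRT: the same matrix serves as both predictors and
# predictands, so the tree's decision rules partition cases into
# homogeneous clusters rather than predicting a separate target.
tree = DecisionTreeRegressor(max_leaf_nodes=2, random_state=0).fit(X, X)

leaf_id = tree.apply(X)          # terminal node (cluster) for each case
clusters = np.unique(leaf_id)
print(len(clusters))             # -> 2
```

Because the tree is monothetic, the single rule it learns (a threshold on one variable) is directly readable from `tree.tree_`, unlike the cluster definitions produced by polythetic methods.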


K öppen-Geiger climate classification and data
The rules that define the Köppen-Geiger climate classification are applied to variables derived from long-term climate normals of monthly mean temperatures and precipitation amounts. Version 1.4 (release 3) of the WorldClim 1950-2000 climate normal dataset is used in this study. Station observations and methods used to develop the dataset are described by Hijmans et al. (2005). Although WorldClim consists of globally gridded climate normals at a resolution of 30 arc seconds, the 10 arc minute version of the dataset (>500 000 grid points) is used here to lighten the computational burden of running the classifications. The Köppen-Geiger climate classification used in this paper follows the rules given by Peel et al. (2007) with the order of application recommended by Kottek et al. (2006).
Table 1 lists the classification rules and definitions of the 10 climate variables (MAP, MAT, T_hot, T_cold, T_mon10, P_dry, P_sdry, P_wdry, P_swet, and P_wwet) used in the classification system. The system is hierarchical: the first level has 5 classes, the second has 13 classes, and the third has 30 classes.
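The top of this hierarchy can be expressed as a short rule function. The sketch below follows the level-1 rule order described above (class E first, then B, then A, C, and D) using the thresholds summarized in Table 1; the function names, the `winter_frac` argument, and the simplified seasonality handling are this sketch's own conventions, not part of the formal definition:

```python
def p_threshold(mat, winter_frac):
    # P_threshold varies with the seasonal distribution of MAP (Table 1)
    if winter_frac >= 0.7:
        return 2 * mat
    if winter_frac <= 0.3:          # i.e. >= 70% of MAP falls in summer
        return 2 * mat + 28
    return 2 * mat + 14

def koppen_level1(mat, map_, t_hot, t_cold, winter_frac):
    """First-level Koppen-Geiger class; rules applied E, then B, then A/C/D."""
    if t_hot < 10:
        return "E"                                  # Polar
    if map_ < 10 * p_threshold(mat, winter_frac):
        return "B"                                  # Arid
    if t_cold >= 18:
        return "A"                                  # Tropical
    if t_cold > 0:
        return "C"                                  # Temperate
    return "D"                                      # Cold

print(koppen_level1(mat=27, map_=2500, t_hot=30, t_cold=24, winter_frac=0.5))  # -> A
```

Note how applying the E and B rules before the temperature-based rules matters: a location with T_hot below 10 °C is Polar regardless of its precipitation regime.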

Multivariate regression tree
A MRT is structured as a binary tree with nodes defined by simple decision rules applied to predictors X_il, where i and l are indices over cases and predictor variables respectively. All cases start out assigned to a single node. Cases in this top-level node are divided into two groups by a decision rule defined by one of the predictors (e.g., MAT < 12 °C). Depending on the outcome of the decision, cases follow one of the two branches from the node. New decision rules are created at nodes by a splitting algorithm until one or more stopping criteria are met. Cases are assigned to classes based on the terminal (unsplit) nodes they reach in the tree.
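A minimal sketch of this structure, with made-up variables and thresholds, shows how cases follow branches until they reach a terminal node:

```python
# Each decision node is (variable index, threshold, left subtree, right subtree);
# a terminal node is just a class label. The rules here are illustrative only.
TREE = (0, 12.0,                              # MAT < 12 ?
        (1, 500.0, "cool-dry", "cool-wet"),   # then split on MAP
        "warm")

def assign(case, node=TREE):
    """Send a case down the binary tree; return its terminal-node class."""
    if isinstance(node, str):                 # terminal (unsplit) node
        return node
    var, thresh, left, right = node
    return assign(case, left if case[var] < thresh else right)

print(assign([5.0, 300.0]))   # MAT=5 < 12, MAP=300 < 500 -> cool-dry
```

The combined set of rules assigns every case to exactly one class, which is what makes the resulting classification unambiguous and easy to interpret.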
A splitting algorithm selects the decision variables and their thresholds. At each step it is responsible for determining the decision rule that will best partition the remaining cases into classes that are as homogeneous as possible. Homogeneity is measured with respect to a set of predictands Y_ij, where j is an index over the predictand variables. This requires a quantitative measure of error (or heterogeneity) for nodes in the tree to be defined. Each node is characterized by the multivariate predictand mean, i.e., the centroid, of its assigned cases. The error measure for the kth node is the within-cluster sum of squared deviations from the centroid

$$\mathrm{WSS}_k = \sum_{j=1}^{J} \sum_{i=1}^{N_k} \left( Y_{ijk} - \bar{Y}_{\cdot jk} \right)^2, \qquad (1)$$

where Y_ijk is the value of the jth of J predictand variables for the ith of N_k cases assigned to the node; the dot indicates the mean value for that subscript. The overall error for a tree is calculated by summing the WSS_k values over the K terminal nodes

$$\mathrm{WSS} = \sum_{k=1}^{K} \mathrm{WSS}_k. \qquad (2)$$

To build the tree, splits are chosen to maximize the decrease in WSS between the existing parent node WSS_A and the new child nodes WSS_B and WSS_C

$$\Delta \mathrm{WSS} = \mathrm{WSS}_A - \left( \mathrm{WSS}_B + \mathrm{WSS}_C \right). \qquad (3)$$

The search for the best split is exhaustive, considering each value of the predictor variables as a potential threshold. Data are split until each terminal node contains a preset minimum number of cases, until no further splits can be made because the error measure has been minimized, or until a desired level of complexity has been reached. A hierarchy of nested sub-trees is created through this procedure. Once a tree has been built, the proportion of explained predictand variance EV can be calculated as

$$\mathrm{EV} = 1 - \mathrm{WSS} / \mathrm{WSS}_T, \qquad (4)$$

where WSS_T is the total within-cluster sum of squared deviations for the top-level node. Splits in a MRT can be used to gain insight into the importance of predictor variables. Direct analysis of the splits present in a tree may not, however, be informative, as variables that happen to be highly correlated with the splitting variables may not appear in the tree at all. Instead, Breiman et al. (1984) suggest another measure of variable importance calculated using the best splits and a set of alternate splits defined at each node. Once the best split has been found for a given node, surrogate splits using each of the remaining predictor variables are also investigated. Surrogate splits are chosen to mimic the behavior of the best split (i.e., send cases to the same nodes as the best split). Candidates are compared against a naive rule that sends data to the node that received the majority of cases from the best split. Splits that are better predictors than the naive rule are ranked, and the best is selected as the surrogate split for a given variable. The ΔWSS value associated with the surrogate split is then used as the measure of variable importance at that node. Variable importance for the tree is taken as the sum of the ΔWSS values for surrogate splits and best splits at all nodes. Surrogate splits also allow cases to be assigned to clusters when variables involved in the primary splits are missing: the best surrogate rule for the non-missing variables is substituted for the primary rule, allowing cases with missing information to progress down the tree and be assigned to classes.
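The error and explained-variance measures above can be checked numerically on a toy partition; the data below are arbitrary:

```python
import numpy as np

def wss(Y):
    """Within-cluster sum of squared deviations from the centroid (Eq. 1)."""
    return float(((Y - Y.mean(axis=0)) ** 2).sum())

# Four cases, two predictand variables, forming two obvious groups
Y = np.array([[1.0, 10.0], [2.0, 11.0], [8.0, 2.0], [9.0, 1.0]])

wss_parent = wss(Y)                              # top-level node: WSS_T
left, right = Y[:2], Y[2:]                       # a candidate binary split
delta = wss_parent - (wss(left) + wss(right))    # decrease in error (Eq. 3)
ev = 1 - (wss(left) + wss(right)) / wss_parent   # explained variance (Eq. 4)

print(round(delta, 2), round(ev, 3))             # -> 130.0 0.985
```

Splitting the four cases into their two natural groups removes almost all of the within-cluster error, which is exactly the behavior the exhaustive split search exploits.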
Once the tree has been created, new cases can be assigned to classes using the splits defined at the decision nodes. As mentioned above, each node is characterized by the centroid of cases assigned to it during fitting. These values serve as predictions for new cases, and estimates of predictive error, explained variance, etc. can be estimated for the model. This predictive aspect of the MRT is shared by some partitioning techniques, for example k-means clustering. Unlike these methods, however, the MRT only requires values to be measured for predictor variables involved in the decision rules. This ability to identify a subset of important predictor variables is another advantage of the MRT approach to clustering.


MRT climate classification
The MRT algorithm is used to define a hierarchical, rule-based climate classification modelled after the Köppen-Geiger climate classification. To this end, the 10 climate variables from the Köppen-Geiger climate classification are used as predictor variables in the MRT model. For the sake of simplicity and for consistency with the Köppen-Geiger rules, MAT, T_hot, and T_cold are rounded to the nearest °C; MAP is rounded to the nearest 100 mm; and P_dry, P_sdry, P_wdry, P_swet, and P_wwet are rounded to the nearest 10 mm.
As the ultimate goal is to separate the globe into climate classes with distinct intra-annual temperature and precipitation patterns, predictand variables are the 12 monthly mean temperature normals and the 12 monthly precipitation normals. For locations in the northern (southern) hemisphere, predictands are specified starting from January (July) and proceeding to December (June). To account for the poleward decrease in geographical area of grid points in the WorldClim dataset, the Mollweide equal area projection is applied prior to classification. Finally, predictands are rescaled to zero mean and unit standard deviation so that temperature and precipitation are given approximately equal weight in the calculation of WSS. (Note that while MRT models are fitted using computer code written by the author, results are verified using the "mvpart" package (Therneau et al., 2010) for the "R" programming language (R Development Core Team, 2009), which is free and open source software.) The MRT algorithm is terminated after making 29 splits. While the MRT provides a complete hierarchy of solutions, in this case from 2 to 30 classes, three hierarchical levels (5, 13, and 30 classes) are defined for consistency with the Köppen-Geiger climate classification. The resulting MRT climate classification is summarized as a binary tree diagram in Fig. 1, locations of climate classes are shown in Fig. 2, and values of EV for each number of classes are shown in Fig. 3. The 5, 13, and 30 class levels explain, respectively, 67.1%, 79.7%, and 85.6% of predictand variance. Little improvement would accrue with more classes (e.g., 35 classes results in 86.3% EV), suggesting that the 30 class level is a reasonable level of complexity for the classification system. All variables except for T_mon10 are featured in primary splits of the 30 node MRT. When accounting for both primary and surrogate splits, all variables are assigned non-zero values of predictor importance, with MAP, T_cold, and P_wwet ranked as the three most important predictors in the MRT (Fig. 4).
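A rough Python analogue of this workflow can be sketched with scikit-learn's multi-output regression tree on synthetic stand-in data. This is not the author's code (which is custom, verified against the R "mvpart" package); the variable constructions below are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
n = 300
# Synthetic stand-ins for the WorldClim monthly normals (illustrative only)
monthly_t = rng.normal(10, 8, size=(n, 12))     # 12 monthly temperature normals
monthly_p = rng.gamma(2.0, 40.0, size=(n, 12))  # 12 monthly precipitation normals

# Predictands: standardized so temperature and precipitation receive
# roughly equal weight in the WSS calculation, as in the paper
Y = np.hstack([monthly_t, monthly_p])
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0)

# Predictors: a few derived variables, rounded as in the paper
mat = np.round(monthly_t.mean(axis=1))          # nearest degree C
map_ = np.round(monthly_p.sum(axis=1), -2)      # nearest 100 mm
t_hot = np.round(monthly_t.max(axis=1))
t_cold = np.round(monthly_t.min(axis=1))
X = np.column_stack([mat, map_, t_hot, t_cold])

# Grow the tree out to 30 terminal nodes (29 splits)
mrt = DecisionTreeRegressor(max_leaf_nodes=30, random_state=0).fit(X, Y)
classes = mrt.apply(X)          # class (terminal node) for each grid point
ev = mrt.score(X, Y)            # average R^2, closely related to EV (Eq. 4)
print(len(np.unique(classes)))
```

With standardized predictands each column has unit variance, so the uniform-average R² reported by `score` coincides with the pooled EV measure.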

Comparison of climate classifications
Before attempting to objectively compare the classification performance of the Köppen-Geiger and MRT climate classifications, it is instructive to consider global maps (Fig. 5), monthly distributions of temperature and precipitation (Fig. 6), and a cross-tabulation of classes (Table 2) at the simplest 5 class level. The two classifications bear some subjective similarities. When looking across classes, the largest cross-tabulation counts match for Köppen-Geiger class A (Tropical) and MRT class 2; classes B (Arid) and 3; and classes D (Cold) and 4. Monthly mean temperatures and precipitation amounts for these cases are, as expected, quite similar. On the other hand, while Köppen-Geiger class C (Temperate) matches best with MRT class 3, it also shows modest association with MRT classes 1, 2 and 4. Köppen-Geiger class E (Polar) is strongly associated with MRT class 5, but class 5 shows slightly better correspondence with Köppen-Geiger class D. What is driving these differences? Of the 5 primary Köppen-Geiger classes, only class B is decided by a rule involving precipitation. The remaining 4 classes are discriminated in terms of temperature. Contrast this with the MRT, in which rules leading to 3 of the 5 classes (classes 1, 2, and 3) feature precipitation variables and where the measure of predictor importance indicates that MAP is the most important discriminatory variable (Fig. 4). Are there then systematic differences in performance of the classification systems with respect to temperature and precipitation? If so, do these differences persist from the 5 class solution to the 13 and 30 class solutions?
To answer these questions, values of EV and their 95% confidence intervals are calculated for the monthly temperature and precipitation variables at the 5, 13, and 30 class levels. The effective number of samples used to estimate the confidence intervals

is reduced based on the spatial autocorrelation of nearest neighbours (Bretherton et al., 1999). Results are shown in Fig. 7. Averaged over the year, the EV for monthly temperatures is 76.7% (5 classes), 79.0% (13 classes), and 90.3% (30 classes) for the Köppen-Geiger classification, versus 83.1% (+6.4% difference), 90.8% (+11.8% difference), and 93.9% (+3.6% difference) for the corresponding MRT solutions. Values of EV for monthly precipitation are 40.8%, 62.5%, and 63.7% for the Köppen-Geiger classification, versus 51.2% (+10.4%), 68.6% (+6.1%), and 77.4% (+13.7%) for the MRT. On a month by month basis, results for the MRT are generally significantly better than those for the Köppen-Geiger classification. In 60 of the 64 cases in which confidence intervals do not overlap, results indicate better performance by the MRT. Notable exceptions include better discrimination of Northern Hemisphere July (Southern Hemisphere December) temperatures (5 and 30 classes) and Northern Hemisphere August and September (Southern Hemisphere January and February) precipitation amounts (5 classes) for the Köppen-Geiger classification.
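One common way of reducing the effective sample size for autocorrelated data, in the spirit of the estimators discussed by Bretherton et al. (1999), uses the lag-one autocorrelation; the paper's exact spatial procedure is not reproduced here, so treat this as a generic sketch:

```python
def effective_n(values):
    """Effective sample size n(1-r)/(1+r), with r the lag-1 autocorrelation."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    var = sum(d * d for d in dev)
    r = sum(dev[i] * dev[i + 1] for i in range(n - 1)) / var
    return n * (1 - r) / (1 + r)

# A slowly varying (positively autocorrelated) series behaves as if it
# contained far fewer independent samples than its raw length suggests.
series = [float(i % 10) for i in range(100)]
print(effective_n(series) < 100)   # -> True
```

Using the reduced sample size widens the confidence intervals on EV, making the significance tests between the two classification systems appropriately conservative.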
One potential criticism of the analysis above is that the values of EV are calculated over the monthly variables used as predictands in the MRT. As a result, the MRT may hold an unfair advantage over the Köppen-Geiger classification. While this is somewhat difficult to avoid, one alternative is to calculate values of EV for the suite of 10 variables used to define the classification rules for both the Köppen-Geiger and MRT classifications. Since these variables are derived from the monthly variables, the problem, to an extent, remains. Nevertheless, results are shown in Fig. 8. The Köppen-Geiger classification performs worse than the MRT in all 26 instances in which significant differences are found. To illustrate the consequences of differences in EV between the two classification systems, Fig. 9 shows scatterplots of observed MAT and MAP versus predicted class centroids at the 30 class hierarchical level. As expected, spread about the one-to-one line is similar between the classification systems for MAT, but the MRT does a much better job at discriminating wet from dry climates than does the Köppen-Geiger classification. For example, the 99.9th percentile of observed MAP exceeds 4800 mm (maximum of 8000 mm), while MAP for the wettest Köppen-Geiger

class centroid is less than 2700 mm. The corresponding value for the wettest MRT class exceeds 4300 mm. Finally, to provide a more level playing field, restricted MRT climate classifications (ones in which a continent is removed during model fitting) are also developed. Following model fitting, withheld points are classified according to the MRT rules and predictions are made for monthly temperatures and precipitation amounts. Values of EV are then calculated and compared against those for the classifications based on all grid points. As rules and centroids for both the Köppen-Geiger and original MRT classes are calculated based, in part, on data from the withheld continents, the restricted MRT models are at a substantial disadvantage in this "leave-one-continent-out" validation. Mean results for temperature and precipitation at the 30 class level are shown in Fig. 10 for North America, South America, Africa, and Australia. While, as expected, the restricted MRT models perform worse than the original MRT, they typically explain more variance than the Köppen-Geiger classification. For temperature, differences in EV between the restricted MRT and the Köppen-Geiger classification are +2.7%, −2.8%, +13.1%, and +0.9% for North America, South America, Africa, and Australia respectively; for precipitation, the respective differences are +17.3%, +6.9%, +14.8%, and +1.9%.
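The "leave-one-continent-out" design can be sketched generically: fit on all grid points except one group, then score the withheld group. The continent labels and synthetic data below are placeholders, not the WorldClim grids:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
# Hypothetical grid points, each tagged with a continent label
continent = np.repeat(["NA", "SA", "AF", "AU"], 100)
X = rng.normal(size=(400, 4))                          # predictor variables
Y = X @ rng.normal(size=(4, 6)) + rng.normal(scale=0.3, size=(400, 6))

scores = {}
for held_out in np.unique(continent):
    train = continent != held_out
    tree = DecisionTreeRegressor(max_leaf_nodes=30, random_state=0)
    tree.fit(X[train], Y[train])
    # EV-style score for the withheld "continent" under the restricted model
    scores[held_out] = tree.score(X[~train], Y[~train])

print(sorted(scores))   # -> ['AF', 'AU', 'NA', 'SA']
```

Because the withheld points play no role in choosing the splits or centroids, the restricted-model scores give a pessimistic but fairer estimate of out-of-sample performance.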

Summary and conclusions
A global climate classification is defined using a MRT model, which is an example of an automated, hierarchical, and rule-based clustering algorithm. Climate variables used to define the Köppen-Geiger climate classification are also used to construct the MRT classification. An objective analysis of classification performance shows that the MRT performs significantly better than the Köppen-Geiger classification, in particular for measures of precipitation.
For simplicity, the MRT climate classifications in this paper only consider binary splits based on a single boolean operator. Linear combinations of predictor variables can be

used in MRT splits, although this would tend to lead to a less easily interpreted tree structure. The trade-off between interpretability and performance has not been fully explored. Similarly, the "greedy" optimization of splitting rules in a MRT can lead to a sub-optimal classification. As each split in the tree is chosen only to minimize the squared error at a particular node, the final tree is not guaranteed to reach the global minimum value of WSS. One way of improving the final solution is to revise previous splits as new splits are added. Chavent (1998) introduced a single-level revision step that adjusts the previously added split so that the WSS of the affected classes is minimized. While this improves the final solution, there is still room for improvement, as the search only considers the previous split, not all splits in the tree. Alternatively, one can either adjust all splits after a tree is built or build a tree from scratch using some form of global search algorithm. Such options are left as avenues for future research. Note also that the final partitioning of cases into classes by a MRT is insensitive to monotonic transformations of the predictor variables. For instance, cases will still be sent down the same branches of the tree regardless of the units in which variables are measured. Scaling of predictand variables (and, similarly, redundancy between predictands), however, will affect the final classification. Fovell and Fovell (1993) and Mimmack et al.
(2001) discuss biases related to scaling and redundancy in climatological cluster analyses. Redefinition of predictands using methods such as truncated principal component analysis may be warranted, depending on the problem. The MRT is somewhat unique in that variables can be scaled or transformed differently depending on whether they are acting as predictors or predictands. For example, variables expressed in their original units could be used as predictors and a set of standardized principal components of the same variables could be used as predictands in the classification. In this study, monthly temperature and precipitation predictand variables are standardized to zero mean and unit standard deviation, but no further attempt is made to reduce redundancy or differentially weight temperature and precipitation.
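The differential treatment of predictors and predictands can be sketched as follows: predictors stay in their original units (preserving interpretable rules), while the predictands are replaced by standardized principal components to remove redundancy. This illustrates the option discussed above, not the configuration actually used in the study, and all data here are synthetic:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 10))      # predictors kept in original units

# Deliberately redundant predictands: two noisy copies of the same signal
raw_Y = np.hstack([X[:, :3] + rng.normal(scale=0.1, size=(200, 3)),
                   X[:, :3] + rng.normal(scale=0.1, size=(200, 3))])

# Replace the predictands with standardized principal components
pcs = PCA(n_components=3).fit_transform(raw_Y)
pcs = (pcs - pcs.mean(axis=0)) / pcs.std(axis=0)

# Rules are still expressed in the predictors' original units, but WSS
# is now computed over non-redundant, equally weighted components
tree = DecisionTreeRegressor(max_leaf_nodes=5, random_state=0).fit(X, pcs)
print(tree.get_n_leaves())   # -> 5
```

The resulting splits remain readable (thresholds in physical units) even though the homogeneity criterion operates in the transformed predictand space.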
The goal and format of this paper prevents a full exploration of the MRT climate classification, especially at the 13 and 30 class levels. As a means of fostering further

work, Walter-Lieth climate diagrams for each class (Walter and Lieth, 1967; Guijarro, 2011) and a gridded global dataset of the system are provided as supplementary material. It is hoped that the results will be explored further using these resources. Finally, data and software packages mentioned in this study are freely available, which means that they can also be employed by instructors and students as a means of interactively exploring modern, computer-based climatological methods. Irrespective of whether or not the MRT climate classification outlined here gains any measure of "historical inertia", the underlying tools and datasets provide a rich environment for exploration and deserve a place alongside the Köppen-Geiger classification.

Table 1. Köppen-Geiger climate variables and classification rules (after Peel et al., 2007 and Kottek et al., 2006). Rules for climate class E must be applied first, then those for class B, and finally those for classes A, C, and D. For locations that satisfy both Cs and Cw (or Ds and Dw) rules, the w classification is applied if summer precipitation exceeds winter precipitation. In the following, temperature variables are measured in °C, precipitation variables are measured in mm, and summer/winter is defined as the warmer/cooler of the six month periods October-March and April-September. MAP = mean annual precipitation; MAT = mean annual temperature; T_hot = temperature of the hottest month; T_cold = temperature of the coldest month; T_mon10 = number of months with temperature above 10 °C; P_dry = precipitation of the driest month; P_sdry = precipitation of the driest month in summer; P_wdry = precipitation of the driest month in winter; P_swet = precipitation of the wettest month in summer; P_wwet = precipitation of the wettest month in winter; P_threshold = precipitation threshold, which varies according to the following rules: if 70% of MAP occurs in winter, then P_threshold = 2 × MAT; if 70% of MAP occurs in summer, then P_threshold = 2 × MAT + 28; otherwise P_threshold = 2 × MAT + 14.

Fig. 4. MRT climate classification predictor importance calculated from ΔWSS values for primary and surrogate splits (see Sect. 3). The measure of predictor importance has been scaled to range from 0 (the predictor does not appear in any primary or surrogate splits) to 100 (the predictor is responsible for the greatest decrease in WSS).

References

Thornthwaite, C. W.: An approach toward a rational classification of climate, Geogr. Rev., 38, 55-94, 1948.

Wilcock, A. A.: Köppen after fifty years, Annals of the Association of American Geographers, 58, 12-28, 1968.

Williams, W. T. and Lambert, J. M.: Multivariate methods in plant ecology I: association-analysis, 1959.

Fig. 1. Binary tree representation of the MRT climate classification rules. Climate variable names are given in Table 1. At each node, the decision rule is listed preceding the colon, while the order in which the binary split was made is denoted by the integer value following the colon. Class names are given in bold. The 5 class level is assigned a number from 1-5. The 13 class and 30 class levels are denoted by adding letters to the numbers. If the rule immediately preceding the terminal node involves a precipitation variable, a W, w, D, or d is added, where W or w indicates wetter conditions and D or d indicates drier conditions. If the preceding rule involves a temperature variable, an H, h, C, or c is added, where H or h indicates hotter conditions and C or c indicates cooler conditions. Upper case letters are reserved for mean annual variables, whereas lower case letters are used for annual extreme or seasonal extreme variables. A map of class locations is given in Fig. 2.

Table 2. Cross-tabulation of grid cells assigned to the Köppen-Geiger (A to E) and MRT climate classifications (1 to 5) at the 5 class level. Italicized (bold) entries indicate the largest value in a given row (column).

Fig. 8. As in Fig. 7, but for climate variables defined in Table 1.