Comparison of Agglomerative Hierarchy Methods in Grouping Cities in West Java Based on Gross Regional Domestic Product

How to cite this article: Sofhya, H. N. (2023). Comparison of Agglomerative Hierarchy Methods in Clustering Cities in West Java Based on Gross Regional Domestic Product. Eduma: Mathematics Education Learning and Teaching, 12(1), 101-111. doi: http://dx.doi.org/10.24235/eduma.v12i1.13216


INTRODUCTION
Gross Regional Domestic Product (GRDP) is one of the indicators that determine the economic development of a region (Romhadhoni et al., 2019). GRDP is the value of goods and services produced by a region in a given period, calculated by adding up the output of both residents of the region and foreigners working in the region (Hartono et al., 2018). Successful economic development requires good cooperation between sectors of the economy, so that each production sector has the power to attract (backward linkage) and the power to encourage (forward linkage) other sectors. Developing countries carry out economic development with the aim of creating growth that is felt by the community, increasing employment opportunities, reducing differences between regions, and achieving a balanced economic structure (Todaro & Smith, 2011). Data from the Badan Pusat Statistik (BPS) show that in 2022 West Java was the province with the third highest GRDP in Indonesia. In reality, however, achieving high and increasing economic growth does not eliminate inequality in development. The most obvious development inequality is in the income aspect, which creates rich and poor groups, and in the spatial aspect, which results in developed and underdeveloped areas. As a result, the ability of each region to drive the development process also differs. Development inequality can be seen vertically, as differences in income distribution, and horizontally, as differences between developed and underdeveloped regions (Sjafrizal, 2008). Inequality and economic disparity are two major problems often faced by developing countries: where regional income distribution is unequal, society divides into a high-income group and a low-income group. To measure income inequality, the BPS released the results of its assessment of the level of economic inequality in West Java through the Gini ratio (Tambunan, 2001). The Gini ratio of West Java from 2018 to 2022, based on BPS data, is presented in Figure 1. The closer the Gini ratio is to 1, the wider the gap in a region (Janah, 2022).
Based on the Gini ratio, the economic gap in West Java is still quite high and has continued to increase since 2019. This gap can occur because of uneven economic development among cities in West Java. It cannot continue to be ignored and needs to be reduced. Therefore, the West Java government needs to focus on improving the economy in areas with low economic conditions. One of the main indicators of the economic condition of a region is the amount of GRDP it obtains. This study groups the cities in West Java based on GRDP, so that the government can identify which cities have high, medium, and low GRDP, provide the right policy for each regional cluster, and pay more attention to the groups with low GRDP.
Figure 1. Gini Ratio West Java Chart
Cluster analysis offers several methods, generally divided into hierarchical and non-hierarchical clustering (Gustientiedina et al., 2019). Hierarchical clustering is a type of cluster analysis that categorizes objects by the similarity of their characteristics (Ramadhani and Purnamasari, 2018). A cluster analysis of labor force indicator data in West Java that compared hierarchical and non-hierarchical methods found that the hierarchical method gave better clusters as judged by the Dunn Index (Syafiyah et al., 2022). Hierarchical methods are divided into agglomerative and divisive. Agglomerative methods include single linkage, average linkage, and complete linkage, which differ in the criterion used for grouping (Rachmatin, 2014). A comparison of the single linkage and k-means methods for document clustering found that single linkage performed better than k-means (Handoyo, 2014). A study that clustered crop production with hierarchical clustering compared complete linkage and average linkage and found average linkage to be better, because its standard deviation value of 0.056 was smaller than that of complete linkage (Mujiono & Sumartono, 2022). Meanwhile, a study that grouped sub-districts by livestock type variables compared single linkage, complete linkage, and average linkage and found complete linkage to be the best, because its standard deviation ratio of 0.222 was smaller than those of the other methods (Fikri & Ulinnuha, 2019).
These studies show that the best clustering method can differ depending on the data. Therefore, this study clusters the cities in West Java based on GRDP using the single linkage, average linkage, and complete linkage agglomerative hierarchy methods, in order to obtain the best cluster results for West Java cities. The best cluster result is determined from the ratio between the standard deviation within clusters and the standard deviation between clusters. A clustering method is said to be good if it has a minimum standard deviation within clusters ($S_w$) and a maximum standard deviation between clusters ($S_b$) (Larasati et al., 2021). Thus the best clustering method is the one with the smallest ratio of the within-cluster standard deviation to the between-cluster standard deviation. The results of clustering with the best agglomerative hierarchy method in this study can be used by the government of West Java as a reference in determining economic policy for cities in West Java according to their GRDP cluster level.

Population and Sample
The data used in this study are GRDP data at current prices for cities in West Java in 2022, obtained from BPS West Java. The clustering methods used are the agglomerative hierarchy methods of single linkage, complete linkage, and average linkage, which are applied to cluster cities in West Java based on their GRDP.

The first step in clustering with an agglomerative method is to calculate the distance between data, because in hierarchical cluster analysis data are grouped according to the distance between them. The most commonly used distance is the Euclidean distance (Suyanto et al., 2021). The Euclidean distance between two data expresses the similarity between them. The distance between the $i$-th data and the $j$-th data can be calculated as follows:

$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{1}$$

where
$d_{ij}$: distance between the $i$-th data and the $j$-th data
$x_i$: the $i$-th $x$ data value
$x_j$: the $j$-th $x$ data value
$y_i$: the $i$-th $y$ data value
$y_j$: the $j$-th $y$ data value

In the hierarchical method, data are grouped step by step based on the distance between them. In the single linkage method, clustering is based on the smallest distance between two objects, which form the first cluster, and so on (Sofhya, 2023). In the initial stage, the method finds the smallest distance in $D = \{d_{ij}\}$ and combines the corresponding objects, e.g. $X$ and $Y$, into a cluster $(XY)$. The distance between cluster $(XY)$ and another cluster, say $Z$, is then calculated by the following formula:

$$d_{(XY)Z} = \min\{d_{XZ}, d_{YZ}\} \tag{2}$$

The values $d_{XZ}$ and $d_{YZ}$ are the shortest distances between clusters $X$ and $Z$ and between clusters $Y$ and $Z$.

In the second method, average linkage, the clustering criterion is the average distance between data (Prabowo et al., 2020). The distance between cluster $(XY)$ and another cluster, say $Z$, is calculated by the following formula:

$$d_{(XY)Z} = \frac{\sum_{i} \sum_{j} d_{ij}}{N_{(XY)} N_{Z}} \tag{3}$$

where $d_{ij}$ is the distance between the $i$-th object in cluster $(XY)$ and the $j$-th object in cluster $Z$, and $N_{(XY)}$ and $N_{Z}$ are the numbers of members in clusters $(XY)$ and $Z$.

In the third agglomerative hierarchy method, complete linkage, clustering is based on the farthest distance between one object and another. The distance between cluster $(XY)$ and another cluster, say $Z$, is calculated as follows:

$$d_{(XY)Z} = \max\{d_{XZ}, d_{YZ}\} \tag{4}$$

where the values $d_{XZ}$ and $d_{YZ}$ are the farthest distances between clusters $X$ and $Z$ and between clusters $Y$ and $Z$.

After clustering the data with the single linkage, average linkage, and complete linkage methods, the next step is to determine the best clustering method. A clustering method is said to be good if it has a minimum standard deviation within clusters ($S_w$) and a maximum standard deviation between clusters ($S_b$) (Larasati et al., 2021). Let $S$ be the ratio of the standard deviation within clusters to the standard deviation between clusters; then $S$ can be written as:

$$S = \frac{S_w}{S_b} \tag{5}$$

Thus the best clustering method is the one with the smallest value of $S$.
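The three linkage rules described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `cluster_distance` and `agglomerate` and the toy values are hypothetical (they are not the GRDP data), the objects are one-dimensional so the distance between two objects is an absolute difference, and cluster distances are computed directly from the members rather than by updating a stored distance matrix.

```python
from itertools import combinations


def cluster_distance(a, b, method):
    """Distance between clusters a and b (lists of 1-D values)."""
    pairwise = [abs(x - y) for x in a for y in b]
    if method == "single":      # smallest pairwise distance
        return min(pairwise)
    if method == "complete":    # largest pairwise distance
        return max(pairwise)
    if method == "average":     # mean of all pairwise distances
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown method: {method}")


def agglomerate(values, n_clusters, method):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    clusters = [[v] for v in values]  # every object starts as its own cluster
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]], method),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy values chosen so that the three rules disagree.
toy = [1.0, 2.0, 4.2, 7.0, 11.0, 16.0]
for method in ("single", "average", "complete"):
    print(method, agglomerate(toy, 3, method))
```

On these toy values single linkage chains the four smallest values into one cluster, while complete linkage produces three pairs, which illustrates why the methods can give different memberships for the same data.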

RESULTS AND DISCUSSION
The cluster analysis in this study used three agglomerative hierarchical methods, namely single linkage, complete linkage, and average linkage. To form the clusters, a distance matrix was built for the 27 cities in West Java. The distance between cities was calculated as the Euclidean distance using equation (1). The results obtained can be seen in Table 1. Calculating the Euclidean distance matrix gives the smallest value between objects. The smaller the distance between two objects, the more similar they are. The next step is to conduct a cluster analysis based on the Euclidean distance matrix obtained.
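As a minimal sketch of the distance-matrix step, the code below builds a symmetric Euclidean distance matrix and finds the closest pair of objects. The function name `distance_matrix` and the three values are hypothetical and are not taken from Table 1. With a single variable per object, as with GRDP alone, the Euclidean distance reduces to the absolute difference between the two values.

```python
import math


def distance_matrix(points):
    """Symmetric matrix of Euclidean distances between the rows of `points`."""
    n = len(points)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = math.dist(points[i], points[j])
    return d


# Hypothetical values: one variable per object, written as 1-tuples.
values = [(10.0,), (13.0,), (40.0,)]
d = distance_matrix(values)

# The pair with the smallest distance is the first candidate for merging.
n = len(values)
closest = min(
    ((i, j) for i in range(n) for j in range(i + 1, n)),
    key=lambda p: d[p[0]][p[1]],
)
print(d, closest)
```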

Single Linkage
In clustering with the single linkage method, the closest distance between two objects is taken from the Euclidean distance matrix that has been formed. In the hierarchical method, the number of clusters is not fixed at the beginning but can be determined after the dendrogram of the analysis results is obtained. Through dendrograms, researchers can group data into several clusters. Based on the dendrogram using single linkage (Figure 2), if a vertical line is drawn, the cities in West Java can be formed into 3 large clusters based on the amount of GRDP. The members of each cluster under the single linkage method are presented in Table 2. Judging from the cluster averages, cluster 1 contains the cities in West Java with the highest GRDP, with an average of 265130.8 billion Rupiah. Cluster 2 contains the cities with medium, or second highest, GRDP, with an average of 185562.1 billion Rupiah, while cluster 3 contains the cities with low GRDP, with an average of only 34153.33 billion Rupiah. There is a very large difference in average GRDP between clusters, and the clustering results also show the inequality of GRDP among cities in West Java. Based on the cluster results in Table 2, the standard deviation within clusters ($S_w$) is 14536.72 and the standard deviation between clusters ($S_b$) is 107062.203. Thus the ratio ($S$) of the standard deviation within clusters to the standard deviation between clusters for the single linkage method is 0.135778356.

Average Linkage
In clustering with the average linkage method, the average distance between the objects of two clusters is calculated from the Euclidean distance matrix that has been formed. The resulting dendrogram for average linkage can be seen in Figure 3.
Based on the dendrogram using average linkage, the cities in West Java can be formed into 4 large clusters based on the amount of GRDP. The members of each cluster under the average linkage method can be seen in Table 3.

Based on the clustering results using average linkage, 4 groups are obtained, with members as shown in Table 3. Judging from the cluster averages, cluster 1 contains the cities in West Java with the highest GRDP, with an average of 265130.8 billion Rupiah. Cluster 2 contains the cities with the second highest GRDP, with an average of 185562.1 billion Rupiah. Cluster 3 contains the cities with the third highest GRDP, with an average of 62534.22 billion Rupiah. Cluster 4 contains the cities with the lowest GRDP, with an average of only 24136.542 billion Rupiah. Clustering with average linkage gives different results from single linkage: in this method, the cities in West Java are divided into 4 clusters. Table 3 shows a very large difference in average GRDP between clusters, which again reflects the inequality of GRDP among cities in West Java. Cluster 4, which consists of 17 cities, has an average GRDP of only 24136.542 billion Rupiah, whereas the average GRDP of cluster 1 is more than 10 times larger, at 265130.8 billion Rupiah. Based on the cluster results in Table 3, the standard deviation within clusters ($S_w$) is 12247.22 and the standard deviation between clusters ($S_b$) is 111101.8. Thus the ratio ($S$) of the standard deviation within clusters to the standard deviation between clusters for the average linkage method is 0.110234.
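The surviving text does not spell out how the standard deviation between clusters is computed. As a hedged sketch, the code below assumes it is the sample standard deviation (denominator K - 1) of the cluster averages; that definition is an assumption, not something the text states. Under that assumption, the four average linkage cluster averages quoted above give a value that agrees with the reported 111101.8 to within rounding.

```python
import statistics

# Cluster averages reported for average linkage (billion Rupiah).
average_linkage_means = [265130.8, 185562.1, 62534.22, 24136.542]

# Assumption: S_b is the sample standard deviation of the cluster averages.
s_b = statistics.stdev(average_linkage_means)

# S_w is taken as reported; it cannot be recomputed without the city-level data.
s_w = 12247.22
ratio = s_w / s_b

print(round(s_b, 1), round(ratio, 6))
```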

Complete Linkage
In clustering with the complete linkage method, the farthest distance between two objects is calculated from the Euclidean distance matrix that has been formed. The resulting dendrogram for complete linkage can be seen in Figure 4.
Based on the dendrogram using complete linkage, the cities in West Java can be formed into 4 large clusters based on the amount of GRDP. The members of each cluster under the complete linkage method can be seen in Table 4. Judging from the cluster averages, cluster 1 contains the cities in West Java with the highest GRDP, with an average of 265130.8 billion Rupiah. Cluster 2 contains the cities with the second highest GRDP, with an average of 185562.1 billion Rupiah. Cluster 3 contains the cities with the third highest GRDP, with an average of 59602.61 billion Rupiah. Cluster 4 contains the cities with the lowest GRDP, with an average of only 23019.266 billion Rupiah. The number of clusters in this method is the same as with the average linkage method, but the members of cluster 3 and cluster 4 differ. In average linkage, cluster 3 has 6 members and cluster 4 has 17 members, while in complete linkage cluster 3 consists of 7 cities and cluster 4 consists of 16 cities, so the averages of cluster 3 and cluster 4 differ between the two methods. Table 4 shows a very large difference in average GRDP between clusters. Based on the cluster results in Table 4, the standard deviation within clusters ($S_w$) is 12221.48 and the standard deviation between clusters ($S_b$) is 112106.9. Thus the ratio ($S$) of the standard deviation within clusters to the standard deviation between clusters for the complete linkage method is 0.109016.
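The relationship between the three partitions can be checked with simple arithmetic on the reported cluster sizes and averages. The sketch below, with the hypothetical helper `pooled_mean`, computes the size-weighted average of clusters 3 and 4 for average linkage and for complete linkage. Both come out at about 34153.33 billion Rupiah, the average reported for cluster 3 under single linkage. This would be consistent with the single linkage low-GRDP cluster being the union of clusters 3 and 4 in the other two methods, although that is an inference from the numbers rather than something the text states.

```python
def pooled_mean(groups):
    """Size-weighted mean of several clusters given (size, mean) pairs."""
    total = sum(size * mean for size, mean in groups)
    count = sum(size for size, _ in groups)
    return total / count


# (number of members, average GRDP in billion Rupiah) for clusters 3 and 4.
average_linkage = [(6, 62534.22), (17, 24136.542)]
complete_linkage = [(7, 59602.61), (16, 23019.266)]

pooled_average = pooled_mean(average_linkage)
pooled_complete = pooled_mean(complete_linkage)
print(round(pooled_average, 2), round(pooled_complete, 2))
```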

Comparison of cluster methods
In this study, the best clustering method was selected based on the ratio between the standard deviation within clusters ($S_w$) and the standard deviation between clusters ($S_b$). Clustering the cities in West Java with the single linkage, average linkage, and complete linkage methods gives different clustering results. The single linkage method produced 3 clusters, while the average linkage and complete linkage methods produced 4 clusters. All three methods give the same Cluster 1, whose only member is Bekasi, because Bekasi's GRDP is very high and far above that of the other cities in West Java. The ratio values obtained from the three agglomerative hierarchical clustering methods are shown in Table 5. Table 5 shows that the standard deviation ratio ($S$) of the complete linkage method, 0.109016, is smaller than those of the single linkage and average linkage methods. This means that, of the three methods used, complete linkage performs best in clustering cities in West Java based on GRDP.
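The comparison can be reproduced from the within-cluster and between-cluster standard deviations reported in the previous sections. The sketch below simply divides the two reported values for each method and selects the smallest ratio; the dictionary name `reported` is hypothetical, and the inputs are the figures quoted in the text rather than values recomputed from the city-level data.

```python
# (S_w, S_b) as reported for each method.
reported = {
    "single linkage": (14536.72, 107062.203),
    "average linkage": (12247.22, 111101.8),
    "complete linkage": (12221.48, 112106.9),
}

# S = S_w / S_b for each method; the smallest ratio indicates the best method.
ratios = {method: s_w / s_b for method, (s_w, s_b) in reported.items()}
best = min(ratios, key=ratios.get)

for method, s in ratios.items():
    print(f"{method}: S = {s:.6f}")
print("best method:", best)
```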

Conclusion
Clustering cities in West Java with the single linkage, average linkage, and complete linkage methods gives different clustering results. The results of all three methods show large differences in average GRDP between the clusters obtained. Based on the comparison of standard deviation ratios, the complete linkage method has the smallest ratio, 0.109016, compared with the single linkage and average linkage methods. This means that, of the three methods used, complete linkage performs best for clustering cities in West Java based on GRDP. Clustering with complete linkage divides the cities in West Java into 4 clusters. Cluster 1 is the cluster with the highest GRDP and has one member, Bekasi. Cluster 2 is a cluster with fairly high GRDP and has three members: Bogor, Karawang, and Bandung City. Cluster 3 is a cluster with medium GRDP and has seven members: Sukabumi, Bandung, Garut, Indramayu, Purwakarta, Bekasi City, and Depok City. Lastly, cluster 4 is a cluster with low GRDP and has sixteen members: Cianjur, Tasikmalaya, Ciamis, Kuningan, Cirebon, Majalengka, Sumedang, Subang, Bandung Barat, Pangandaran, Bogor City, Sukabumi City, Cirebon City, Cimahi City, Tasikmalaya City, and Banjar City. The implication is that cities in West Java can be divided into four economic levels based on their GRDP: very high, high, medium, and low. To reduce inequality in West Java, the government can focus on cities that fall into the low economic category.

Figure 2. Dendrogram using single linkage

Figure 3. Dendrogram using average linkage

Figure 4. Dendrogram using complete linkage

Table 1. GRDP at Current Prices by City in West Java (Billion Rupiah)

Table 2. Clustering results of cities in West Java using single linkage method

Table 3. Clustering results of cities in West Java using average linkage method

Table 4. Clustering results of cities in West Java using complete linkage method

Table 5. Standard deviation values