Clustering
Definition
Clustering is “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.
Clustering is the grouping of records, observations, or cases into classes of objects that are similar to one another. Clustering algorithms include EM and Fuzzy C-Means.
Clustering Main Features
Clustering is a data mining technique.
Usage:
–Statistical Data Analysis
–Machine Learning
–Data Mining
–Pattern Recognition
–Image Analysis
–Bioinformatics
Notion of a Cluster can be Ambiguous
How many clusters?
[Figure panels: Two Clusters; Four Clusters; Six Clusters]
Distance-based method
In this case the 4 clusters into which the data can be divided are easy to identify; the similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance measure. This is called distance-based clustering.
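The "close according to a given distance" criterion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the distance is taken to be Euclidean, and the helper `threshold_clusters`, the cutoff `max_dist`, and the sample points are all hypothetical and not from the slides.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def threshold_clusters(points, max_dist):
    """Group points so that points within max_dist of each other share a cluster.

    Hypothetical helper: closeness is applied transitively, so a cluster
    is a chain of points that are each close to some other member.
    """
    clusters = []
    for p in points:
        merged = [p]   # the cluster that will contain p
        far = []       # clusters with no point close to p
        for c in clusters:
            if any(euclidean(p, q) <= max_dist for q in c):
                merged.extend(c)
            else:
                far.append(c)
        clusters = far + [merged]
    return clusters

# Made-up data: four well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1),
          (0, 10), (0, 11), (10, 10), (10, 11)]
groups = threshold_clusters(points, max_dist=1.5)
print(len(groups))  # number of groups found
```

With these made-up points, each pair is 1 unit apart while different pairs are at least 9 units apart, so the grouping depends only on distance, which is the idea behind distance-based clustering.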
Limitations of K-means: Non-globular Shapes
[Figure panels: Original Points; K-means (2 Clusters)]
Limitations of K-means: Differing Sizes
[Figure panels: Original Points; K-means (3 Clusters)]
Types of Clustering
–Hierarchical: finding new clusters using previously found ones
–Partitional: finding all clusters at once
Partitional Clustering
[Figure panels: Original Points; A Partitional Clustering]
Hierarchical Clustering
[Figure panels: Traditional Hierarchical Clustering; Traditional Dendrogram; Non-traditional Hierarchical Clustering; Non-traditional Dendrogram]
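To make "finding new clusters using previously found ones" concrete, here is a minimal sketch of one common hierarchical scheme, bottom-up (agglomerative) merging. The slides do not name a specific method, so the single-linkage rule, the Euclidean distance, the function names, and the sample points are all assumptions chosen for illustration. The recorded merge history is the information a dendrogram would draw.

```python
import math
from itertools import combinations

def single_link(a, b):
    """Smallest Euclidean distance between any point of a and any point of b."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history."""
    clusters = [[p] for p in points]  # start: every point is its own cluster
    history = []
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest linkage distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]))
        history.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]
        # combinations() always yields i < j, so delete j first.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return history

# Made-up data: two tight pairs and one distant point.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)]
merges = agglomerate(points)
for left, right in merges:
    print(left, "+", right)
```

Each step builds on the clusters produced by the previous step, whereas a partitional method such as K-means assigns every point to one of k clusters in a single partition.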
K-Means Clustering Algorithm
Steps of the K-Means algorithm:
1. Decide how many clusters to form, namely k clusters.
2. Arbitrarily select k of the existing records as the initial cluster centers.
3. For each record, determine its nearest cluster center.
4. Update the cluster centers.
5. Determine the nearest cluster center for each record again, and so on, until the ratio value (BCV/WCV) no longer increases.
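The steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes Euclidean distance and updates each center to the mean of its members. As a simplification, it stops when the centers no longer change rather than monitoring the ratio mentioned in step 5. The function names, the `seed` and `max_iter` parameters, and the sample records are hypothetical.

```python
import math
import random

def nearest(point, centers):
    """Index of the center closest to point (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def k_means(records, k, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 2: pick k existing records as the initial cluster centers.
    centers = rng.sample(records, k)
    for _ in range(max_iter):
        # Step 3: find the nearest center for every record.
        labels = [nearest(r, centers) for r in records]
        # Step 4: move each center to the mean of its members.
        new_centers = []
        for i in range(k):
            members = [r for r, lab in zip(records, labels) if lab == i]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(mean(members) if members else centers[i])
        # Step 5 (simplified): stop once the centers stop moving.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, [nearest(r, centers) for r in records]

# Made-up records forming two well-separated groups; step 1 sets k = 2.
records = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centers, labels = k_means(records, k=2)
print(centers, labels)
```

Because the initial centers are chosen at random, the cluster numbering can differ between seeds, and on less cleanly separated data the final grouping itself can depend on the starting centers.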
Formulas
Distance between two points: [formula not preserved]
Between Cluster Variation (BCV):
$BCV = d(m_1, m_2) + d(m_1, m_3) + d(m_2, m_3)$
Here, $d(m_i, m_j)$ denotes the distance from $m_i$ to $m_j$.
Within Cluster Variation (WCV):
$WCV = \sum (\text{minimum distance to the cluster centers})^2$
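The BCV and WCV definitions above can be checked numerically with a short sketch. Several assumptions are made here and should be read as such: the distance d is taken to be Euclidean, WCV is read as a sum over all records of the squared distance to the nearest cluster center, and the centers and records are made-up values. The function names `bcv` and `wcv` are hypothetical.

```python
import math
from itertools import combinations

def bcv(centers):
    """Sum of the distances between every pair of cluster centers.

    For three centers this is d(m1,m2) + d(m1,m3) + d(m2,m3).
    """
    return sum(math.dist(a, b) for a, b in combinations(centers, 2))

def wcv(records, centers):
    """Sum over records of the squared distance to the nearest center (assumed reading)."""
    return sum(min(math.dist(r, m) for m in centers) ** 2 for r in records)

# Made-up cluster centers m1, m2, m3 and a few records.
centers = [(0, 0), (3, 4), (6, 8)]
records = [(0, 1), (3, 3), (6, 9), (1, 0)]

ratio = bcv(centers) / wcv(records, centers)
print(bcv(centers), wcv(records, centers), ratio)
```

A larger ratio means the centers are far apart relative to how tightly the records sit around them, which is why the K-Means iteration can be stopped once the ratio no longer increases.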