1 K-Means Clustering

2 What is Clustering? Grouping data into several clusters based on their similarity. Also called unsupervised learning; sometimes called classification by statisticians, sorting by psychologists, and segmentation by people in marketing.

3 What is a natural grouping among these objects?

4 What is a natural grouping among these objects? Clustering is subjective. Groupings shown: Simpson's Family, School Employees, Females, Males.

5 Two Types of Clustering: Partitional and Hierarchical. Partitional algorithms: construct several partitions and group the objects according to some criterion. Hierarchical algorithms: create a hierarchical decomposition of the set of objects according to some criterion, for example old vs. young, and then, within each of those groups, smoker vs. non-smoker.

6 What is Similarity? "The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary) Similarity is hard to define, but "we know it when we see it".

7 Distance: a measure of how similar two objects are, computed with a specific formula; the smaller the distance, the more similar the objects. Example values: D(, ) = 8 and D(, ) = 1.

8 Partitional Clustering: Nonhierarchical; each object is placed in exactly one cluster. Clusters do not overlap. The number of clusters to form is fixed in advance.

9 Algorithm k-means
1. Decide on the number of clusters, k.
2. Initialize the centroid of each cluster (randomly, if necessary).
3. Assign every object to the cluster whose centroid is nearest (based on its distance to the centroid).
4. Once the clusters and their members are formed, compute the mean of each cluster and use it as the new centroid.
5. If the new centroids differ from the old ones, update the object memberships again (go back to step 3). If the new centroids equal the old ones, stop.
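The five steps above can be sketched in plain Python. This is only an illustrative sketch, not the MATLAB `kmeans` function discussed later in the slides: the names `k_means` and `squared_distance`, the sample points, the squared Euclidean distance, and the choice to keep the old centroid when a cluster becomes empty are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """Illustrative k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Step 4: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 5: stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, 2)
print(labels)
print(centroids)
```

Which group receives label 0 and which receives label 1 depends on the random initialization; only the grouping itself is meaningful.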

10 K-means Clustering: Step 1-2. Decide on the number of clusters, k. Initialize the centroid of each cluster (randomly, if necessary). (Figure: initial centroids k1, k2, k3.)

11 K-means Clustering: Step 3. Assign every object to the cluster whose centroid is nearest. (Figure: centroids k1, k2, k3.)

12 K-means Clustering: Step 4. Once the clusters and their members are formed, compute the mean of each cluster and use it as the new centroid. (Figure: centroids k1, k2, k3.)

13 K-means Clustering: Step 5. If the new centroids differ from the old ones, update the object memberships again. (Figure: centroids k1, k2, k3.)

14 K-means Clustering: Finish. Repeat steps 3-5 until the centroids no longer change and no object moves to a different cluster. (Figure: centroids k1, k2, k3.)

15 Comments on the K-Means Method
Strength
- Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
Comment
- Often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
Weakness
- Applicable only when the mean is defined; what about categorical data?
- Need to specify k, the number of clusters, in advance.
- Unable to handle noisy data and outliers.
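The sensitivity to outliers follows from the centroid being a mean. A small sketch with made-up one-dimensional values (the numbers are hypothetical and chosen only for illustration) shows how far a single outlier can drag a centroid:

```python
from statistics import mean

# Hypothetical cluster of four nearby values.
cluster = [1.0, 2.0, 3.0, 4.0]
# The same cluster with one outlier added.
with_outlier = cluster + [100.0]

centroid_clean = mean(cluster)          # 2.5
centroid_noisy = mean(with_outlier)     # 22.0
shift = centroid_noisy - centroid_clean

print(centroid_clean, centroid_noisy, shift)
```

The centroid ends up far from every one of the original four values, so points that should belong to this cluster may now be closer to another centroid.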

16 Distance measures: SqEuclidean, Cityblock, Cosine, Correlation, Hamming
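As a rough guide to what these five names usually mean, the sketch below writes each measure as a plain Python function. The function names are hypothetical, and the definitions follow the common textbook forms (cosine and correlation as one minus the similarity, Hamming as the fraction of differing coordinates); they are assumptions for illustration and are not taken from the slides.

```python
import math


def sq_euclidean(a, b):
    """Sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cityblock(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine(a, b):
    """One minus the cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def correlation(a, b):
    """One minus the sample correlation: cosine distance of the centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine([x - mean_a for x in a], [y - mean_b for y in b])


def hamming(a, b):
    """Fraction of coordinates that differ (intended for binary data)."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(sq_euclidean(a, b), cityblock(a, b))
print(cosine(a, b), correlation(a, b))
print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))
```

Because `b` is a scaled copy of `a`, the cosine and correlation distances come out at (approximately) zero even though the squared Euclidean and city-block distances do not, which is why the choice of measure can change the clustering.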

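17 MATLAB: [IDX,C] = kmeans(X,k) partitions the rows of the n-by-p data matrix X into k clusters, returns the cluster index of each row in the n-by-1 vector IDX, and returns the k cluster centroid locations in the k-by-p matrix C.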

18 [...] = kmeans(...,'param1',val1,'param2',val2,...) enables you to specify optional parameter name-value pairs to control the iterative algorithm used by kmeans. The parameters are: 'distance', 'start', 'replicates', 'maxiter', 'emptyaction', 'display'.

19 'distance' Distance measure, in p-dimensional space, that kmeans minimizes. kmeans computes the cluster centroids differently for each of the supported distance measures, which are the ones listed on slide 16.

20 'start' Method used to choose the initial cluster centroid positions, sometimes known as "seeds". Valid starting values are:

21 'replicates' Number of times to repeat the clustering, each with a new set of initial cluster centroid positions. kmeans returns the solution with the lowest value for sumd. You can supply 'replicates' implicitly by supplying a 3-dimensional array as the value for the 'start' parameter.
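The idea behind 'replicates' can be sketched in plain Python: run k-means several times from different random starts and keep the run with the lowest total within-cluster sum of squared distances. This is an illustrative sketch only, not MATLAB's implementation; `kmeans_1d`, `best_of_replicates`, the one-dimensional sample data, and the use of squared Euclidean distance are assumptions made for the example.

```python
import random


def kmeans_1d(values, k, rng, max_iter=100):
    """One k-means run on 1-D data; returns (labels, centroids, total)."""
    centroids = rng.sample(values, k)  # random distinct starting positions

    def assign(cs):
        # Label each value with the index of its nearest centroid.
        return [min(range(k), key=lambda j: (v - cs[j]) ** 2) for v in values]

    for _ in range(max_iter):
        labels = assign(centroids)
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(sum(members) / len(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = assign(centroids)
    # Total within-cluster sum of squared distances for this run.
    total = sum((v - centroids[lab]) ** 2 for v, lab in zip(values, labels))
    return labels, centroids, total


def best_of_replicates(values, k, replicates, seed=0):
    """Repeat the clustering and keep the run with the lowest total."""
    rng = random.Random(seed)
    runs = [kmeans_1d(values, k, rng) for _ in range(replicates)]
    best = min(runs, key=lambda run: run[2])
    return best, [run[2] for run in runs]


# Hypothetical data with three obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
(best_labels, best_centroids, best_total), all_totals = best_of_replicates(values, 3, 10)
print(all_totals)
print(best_total, sorted(best_centroids))
```

Printing `all_totals` makes it visible whether individual runs ended in different local optima; the returned solution is simply the one with the smallest total.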

22 'maxiter' Maximum number of iterations. Default is 100.

23 'emptyaction' Action to take if a cluster loses all its member observations. Can be one of:

24 'display' Controls display of output.
- 'off': Display no output.
- 'iter': Display information about each iteration during minimization, including the iteration number, the optimization phase, the number of points moved, and the total sum of distances.
- 'final': Display a summary of each replication.
- 'notify': Display only warning and error messages. (default)

25 Example dataku =[ ; ; ; ; ; ; ; ; ; ; ; ; ]

26 Using kmeans to build 3 clusters: hasilk = kmeans(dataku,3)

27 Result hasilk =

28 Meaning of the result: The data in rows 1, 2, and 4 are members of the first cluster (cluster number 1). The data in rows 3, 5, 6, 7, 9, 10, 12, and 13 are members of the second cluster (cluster number 2). The data in rows 8 and 11 are members of the third cluster (cluster number 3).
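Reading a label vector this way amounts to grouping row numbers by cluster index. The Python sketch below does that for a label list written out from the memberships stated on this slide; the list literal is derived from that description (it is not copied from MATLAB output), and the variable name `members` is hypothetical.

```python
from collections import defaultdict

# Cluster index of rows 1..13, written out from the memberships on slide 28.
hasilk = [1, 1, 2, 1, 2, 2, 2, 3, 2, 2, 3, 2, 2]

# Group the 1-based row numbers by their cluster index.
members = defaultdict(list)
for row, cluster in enumerate(hasilk, start=1):
    members[cluster].append(row)

for cluster in sorted(members):
    print(f"cluster {cluster}: rows {members[cluster]}")
```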

