K-Means Clustering.

What is Clustering?
Also called unsupervised learning. Statisticians sometimes call it classification, psychologists call it sorting, and people in marketing call it segmentation.
Clustering groups data into several clusters based on their similarity.

What is a natural grouping among these objects?

What is a natural grouping among these objects?
Clustering is subjective: the same objects can be grouped as Simpson's family versus school employees, or as females versus males.

Two Types of Clustering
Partitional algorithms: construct several partitions and group the objects according to some criterion.
Hierarchical algorithms: build a hierarchical decomposition of the objects according to some criterion. For example: old versus young, and then, within each of those groups, smokers versus non-smokers.

What is Similarity?
"The quality or state of being similar; likeness; resemblance; as, a similarity of features." (Webster's Dictionary)
Similarity is hard to define, but "we know it when we see it".

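Distance
A measure of similarity between objects, computed with a specific formula. A smaller distance means the objects are more similar.
[Figure: two example pairs of objects, one pair with D = 8 and the other with D = 1.]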

Partitional Clustering
Nonhierarchical: each object is placed in exactly one cluster.
The clusters do not overlap.
The number of clusters to form is fixed in advance.

Algorithm k-means
1. Decide how many clusters k to create.
2. Initialize the centroid of each cluster (randomly, if necessary).
3. Determine the membership of every object by assigning it to the nearest centroid (based on its distance to each centroid).
4. Once the clusters and their members are formed, compute the mean of each cluster and use it as the new centroid.
5. If the new centroids differ from the old ones, the memberships must be updated again (go back to step 3). If the new centroids equal the old ones, stop.
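The five steps above can be sketched in plain MATLAB without any toolbox. This is only an illustrative sketch: the data matrix X, the choice k = 2, the use of the first k rows as initial centroids (instead of a random choice), and the cap of 100 iterations are all assumptions made for the example, not part of the slides.

```matlab
% Minimal k-means sketch (squared Euclidean distance, no toolbox).
% X, k and the initial centroids are illustrative assumptions.
X = [1 1; 1.5 2; 3 4; 5 7; 3.5 5; 4.5 5; 3.5 4.5];   % n-by-p data
k = 2;                          % step 1: choose the number of clusters
C = X(1:k, :);                  % step 2: initial centroids (first k rows)
n = size(X, 1);
idx = zeros(n, 1);              % cluster index of each object

for iter = 1:100                % safety cap on the number of iterations
    % Step 3: assign each object to its nearest centroid
    for i = 1:n
        diffs = C - repmat(X(i, :), k, 1);
        d = sum(diffs .^ 2, 2); % squared Euclidean distance to each centroid
        [~, idx(i)] = min(d);
    end

    % Step 4: recompute each centroid as the mean of its members
    Cnew = C;
    for j = 1:k
        if any(idx == j)        % keep the old centroid if a cluster is empty
            Cnew(j, :) = mean(X(idx == j, :), 1);
        end
    end

    % Step 5: stop when the centroids no longer change
    if isequal(Cnew, C)
        break;
    end
    C = Cnew;
end

disp(idx');                     % cluster index per row of X
disp(C);                        % final centroids, one per row
```

For this toy data the loop stops after three passes, with the first two points in one cluster and the remaining five in the other.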

K-means Clustering: Steps 1-2
Decide how many clusters k to create. Initialize the centroid of each cluster (randomly, if necessary).
[Figure: scatter plot, axes ticked 1 to 5, showing the initial centroids k1, k2, k3.]

K-means Clustering: Step 3
Determine the membership of every object by assigning it to the nearest centroid.
[Figure: scatter plot, axes ticked 1 to 5, showing the objects assigned to centroids k1, k2, k3.]

K-means Clustering: Step 4
Once the clusters and their members are formed, compute the mean of each cluster and use it as the new centroid.
[Figure: scatter plot, axes ticked 1 to 5, showing centroids k1, k2, k3 moved to the cluster means.]

K-means Clustering: Step 5
If the new centroids differ from the old ones, the memberships of the objects must be updated again.
[Figure: scatter plot, axes ticked 1 to 5, showing the objects reassigned to centroids k1, k2, k3.]

K-means Clustering: Finish
Repeat steps 3-5 until the centroids no longer change and no object moves to a different cluster.
[Figure: final clusters with centroids k1, k2, k3.]

Comments on the K-Means Method
Strength: relatively efficient, O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.
Comment: often terminates at a local optimum. The global optimum may be found using techniques such as deterministic annealing and genetic algorithms.
Weaknesses:
Applicable only when the mean is defined, which raises the question of how to handle categorical data.
The number of clusters, k, must be specified in advance.
Unable to handle noisy data and outliers.
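The weakness with outliers comes from the centroid being a mean. The small sketch below uses made-up values (not from the slides) to show how a single outlier drags the mean away from the bulk of the points; the median is shown only as a contrast and is not part of k-means itself.

```matlab
% Illustrative values only: four nearby points and one outlier.
values = [1 2 3 4 100];

centroidMean   = mean(values);     % pulled toward the outlier
centroidMedian = median(values);   % stays among the bulk of the points

fprintf('mean = %g, median = %g\n', centroidMean, centroidMedian);
```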

Distance measures
sqEuclidean, cityblock, cosine, correlation, Hamming
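As a rough illustration, the five measures can be computed by hand for a pair of vectors. The vectors a, b, u, v are made-up examples, and the formulas follow the usual textbook definitions (cosine and correlation expressed as one minus the similarity, Hamming as the fraction of differing positions in binary data); treat them as an assumption about how the measures are defined rather than as the exact internals of any particular function.

```matlab
% Example vectors (illustrative only).
a = [1 2 3];
b = [2 4 7];

dSq   = sum((a - b) .^ 2);                        % squared Euclidean
dCity = sum(abs(a - b));                          % city block (L1)
dCos  = 1 - (a * b') / (norm(a) * norm(b));       % one minus cosine of angle

ac = a - mean(a);                                 % center each vector
bc = b - mean(b);
dCorr = 1 - (ac * bc') / (norm(ac) * norm(bc));   % one minus correlation

% Hamming distance is meant for binary data.
u = [1 0 1 1];
v = [1 1 0 1];
dHam = mean(u ~= v);                              % fraction of differing bits

fprintf('sqEuclidean = %g, cityblock = %g, Hamming = %g\n', dSq, dCity, dHam);
fprintf('cosine = %.4f, correlation = %.4f\n', dCos, dCorr);
```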

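MATLAB
[IDX,C] = kmeans(X,k) partitions the points in the n-by-p data matrix X into k clusters. It returns the cluster index of each point in the n-by-1 vector IDX and the k cluster centroid locations in the k-by-p matrix C.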
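A minimal sketch of this call, assuming the Statistics and Machine Learning Toolbox (which provides kmeans) is installed. The synthetic data and the fixed random seed are assumptions made so the example is self-contained.

```matlab
% Synthetic data (illustrative): two groups of 20 points in 2-D.
rng(1);                                  % fix the seed for repeatable data
X = [randn(20, 2); randn(20, 2) + 5];    % n = 40 points, p = 2 columns
k = 2;

[IDX, C] = kmeans(X, k);

disp(size(IDX));   % one cluster index per row of X
disp(size(C));     % one centroid per cluster, one column per variable
```

IDX has one entry per row of X, and each row of C is the centroid of the cluster with that number.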

[...] = kmeans(...,'param1',val1,'param2',val2,...) enables you to specify optional parameter name-value pairs to control the iterative algorithm used by kmeans. The parameters are: 'distance', 'start', 'replicates', 'maxiter', 'emptyaction', 'display'.

'distance' Distance measure, in p-dimensional space, that kmeans minimizes with respect to. kmeans computes centroid clusters differently for the different supported distance measures: 'sqEuclidean', 'cityblock', 'cosine', 'correlation', 'Hamming'.

'start' Method used to choose the initial cluster centroid positions, sometimes known as "seeds". Valid starting values are:

'replicates' Number of times to repeat the clustering, each with a new set of initial cluster centroid positions. kmeans returns the solution with the lowest value for sumd. You can supply 'replicates' implicitly by supplying a 3-dimensional array as the value for the 'start' parameter.
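A small sketch of how 'replicates' might be used, assuming the Statistics and Machine Learning Toolbox is available. The synthetic data and the choice of 5 replicates are illustrative assumptions. The third output, sumd, holds the within-cluster sums of point-to-centroid distances, so its total is the quantity kmeans compares across replicates.

```matlab
% Synthetic data (illustrative): three groups of 30 points in 2-D.
rng(0);
X = [randn(30, 2); randn(30, 2) + 4; randn(30, 2) - 4];

% Single run versus the best of 5 runs with different initial centroids.
[idx1, C1, sumd1] = kmeans(X, 3);
[idx5, C5, sumd5] = kmeans(X, 3, 'replicates', 5);

fprintf('total distance, 1 run:        %g\n', sum(sumd1));
fprintf('total distance, 5 replicates: %g\n', sum(sumd5));
```

Because each run starts from different random centroids, repeating the clustering reduces the chance of reporting a poor local optimum.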

'maxiter' Maximum number of iterations. Default is 100.

'emptyaction' Action to take if a cluster loses all its member observations. Can be one of:

'display' Controls display of output.
'off': Display no output.
'iter': Display information about each iteration during minimization, including the iteration number, the optimization phase, the number of points moved, and the total sum of distances.
'final': Display a summary of each replication.
'notify': Display only warning and error messages. (default)
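The parameters described above can be combined in one call. The sketch below assumes the Statistics and Machine Learning Toolbox is installed and uses synthetic data. The option values 'sample' (for 'start') and 'singleton' (for 'emptyaction') are not listed in the slides; they come from the MATLAB kmeans documentation and are used here only as examples.

```matlab
% Synthetic data (illustrative): two groups of 25 points in 3-D.
rng(2);
X = [randn(25, 3); randn(25, 3) + 6];

[IDX, C, sumd] = kmeans(X, 2, ...
    'distance',    'cityblock', ...   % minimize sum of absolute differences
    'start',       'sample', ...      % pick initial centroids from the rows of X
    'maxiter',     200, ...           % raise the iteration limit from 100
    'emptyaction', 'singleton', ...   % replace an empty cluster with a single point
    'display',     'final');          % print a summary of each replication
```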

Example
dataku = [ 7 26  6 60; ...
           1 29 15 52; ...
          11 56  8 20; ...
          11 31  8 47; ...
           7 52  6 33; ...
          11 55  9 22; ...
           3 71 17  6; ...
           1 31 22 44; ...
           2 54 18 22; ...
          21 47  4 26; ...
           1 40 23 34; ...
          11 66  9 12; ...
          10 68  8 12]

Using kmeans to build 3 clusters
hasilk = kmeans(dataku,3)

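Result
hasilk is a 13-by-1 vector holding one cluster index (1, 2, or 3) per row of dataku:
hasilk' = 1 1 2 1 2 2 2 3 2 2 3 2 2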

Meaning of the result
The data in rows 1, 2, and 4 are members of the first cluster (cluster number 1).
The data in rows 3, 5, 6, 7, 9, 10, 12, and 13 are members of the second cluster (cluster number 2).
The data in rows 8 and 11 are members of the third cluster (cluster number 3).
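The memberships listed above can be read off the index vector with find. In this sketch hasilk is typed in by hand to mirror the memberships stated on the slide (an assumption made so the example runs without the toolbox). An actual call to kmeans may number the clusters differently, or even converge to a different partition, because the initial centroids are chosen randomly.

```matlab
% Cluster indices as described on the slide (entered by hand).
hasilk = [1 1 2 1 2 2 2 3 2 2 3 2 2]';

rowsCluster1 = find(hasilk == 1)';   % rows of dataku in cluster 1
rowsCluster2 = find(hasilk == 2)';   % rows of dataku in cluster 2
rowsCluster3 = find(hasilk == 3)';   % rows of dataku in cluster 3

counts = accumarray(hasilk, 1)';     % number of members per cluster

disp(rowsCluster1);
disp(rowsCluster2);
disp(rowsCluster3);
disp(counts);
```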