Cluster Analysis, by Rahmad Wijaya


Topics
1. Basic Concepts
2. Statistics Associated with Cluster Analysis
3. Steps in Cluster Analysis
   - Formulate the problem
   - Select a distance or similarity measure
   - Select a clustering procedure
   - Decide on the number of clusters
   - Interpret and profile the clusters
   - Assess reliability and validity

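Basic Concepts
Cluster analysis is a technique for grouping objects or cases into relatively homogeneous groups called clusters.
Cluster analysis is also known as:
- Classification analysis
- Numerical taxonomy
In practice, the grouping is often less clear-cut than in the ideal situation.
Difference between discriminant analysis and cluster analysis: discriminant analysis requires the group membership of each case to be known in advance, whereas cluster analysis has no prior information about group membership and lets the data suggest the groups.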

[Figure: An ideal clustering situation. Axes: Variable 1 and Variable 2.]

[Figure: A practical clustering situation. Axes: Variable 1 and Variable 2.]

Uses of Cluster Analysis
Examples:
- Segmenting the market
- Understanding buyer behavior
- Identifying new product opportunities
- Selecting test markets
- Reducing data

Statistics Associated with Cluster Analysis
- Agglomeration schedule: gives information on the objects or cases being combined at each stage of a hierarchical clustering process.
- Cluster centroid: the mean values of the variables for all the cases or objects in a particular cluster.
- Cluster centers: the initial starting points in nonhierarchical clustering. Clusters are built around these centers, or seeds.
- Cluster membership
- Dendrogram
- Distances between cluster centers
- Icicle diagram
- Similarity/distance coefficient matrix

Steps in Cluster Analysis
1. Formulate the problem
2. Select a distance or similarity measure
3. Select a clustering procedure
4. Decide on the number of clusters
5. Interpret and profile the clusters
6. Assess reliability and validity

Formulate the Problem
Example: cluster consumers on the basis of their attitudes toward shopping.
Based on previous research, six attitudinal variables were identified. Consumers were asked to express their degree of agreement with the following statements on a seven-point scale:
V1 = Shopping is fun.
V2 = Shopping is bad for your budget.
V3 = I combine shopping with eating out.
V4 = I try to get the best buys while shopping.
V5 = I don’t care about shopping.
V6 = You can save a lot of money by comparing prices.
The data obtained from 20 respondents are as follows:

Raw Data
Case No.  V1  V2  V3  V4  V5  V6
 1         6   4   7   3   2   3
 2         2   3   1   4   5   4
 3         7   2   6   4   1   3
 4         4   6   4   5   3   6
 5         1   3   2   2   6   4
 6         6   4   6   3   3   4
 7         5   3   6   3   3   4
 8         7   3   7   4   1   4
 9         2   4   3   3   6   3
10         3   5   3   6   4   6
11         1   3   2   3   5   3
12         5   4   5   4   2   4
13         2   2   1   5   4   4
14         4   6   4   6   4   7
15         6   5   4   2   1   4
16         3   5   4   6   4   7
17         4   4   7   2   2   5
18         3   7   2   6   4   3
19         4   6   3   7   2   7
20         2   3   2   4   7   2

Select a Distance or Similarity Measure
Because the objective of clustering is to group similar objects together, some measure is needed to assess how different or how similar the objects are. The most commonly used measures are:
- Euclidean distance: the square root of the sum of the squared differences in values for each variable.
- City-block (Manhattan) distance: the sum of the absolute differences in values for each variable.
- Chebychev distance: the maximum absolute difference in values for any variable.
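The three distance measures can be illustrated with a minimal Python sketch. The function names are chosen here for illustration and are not part of the original slides; the two input vectors are cases 1 and 2 from the raw data.

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared differences per variable."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a, b):
    """Sum of absolute differences per variable (Manhattan distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def chebychev(a, b):
    """Largest absolute difference on any single variable."""
    return max(abs(x - y) for x, y in zip(a, b))

# Cases 1 and 2 from the raw data (V1..V6).
case_1 = [6, 4, 7, 3, 2, 3]
case_2 = [2, 3, 1, 4, 5, 4]

print(euclidean(case_1, case_2))   # 8.0
print(city_block(case_1, case_2))  # 16
print(chebychev(case_1, case_2))   # 6
```

The per-variable differences are 4, 1, 6, 1, 3, and 1, so the squared differences sum to 64, the absolute differences sum to 16, and the largest single difference is 6.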

Classification of Clustering Procedures
Clustering procedures
  Hierarchical
    Agglomerative
      Linkage methods: single, complete, average
      Variance methods: Ward's method
      Centroid methods
    Divisive
  Nonhierarchical
    Sequential threshold
    Parallel threshold
    Optimizing partitioning

Linkage Methods of Clustering
- Single linkage: the minimum distance between Cluster 1 and Cluster 2.
- Complete linkage: the maximum distance between Cluster 1 and Cluster 2.
- Average linkage: the average distance between Cluster 1 and Cluster 2.
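A small Python sketch may help make the three linkage rules concrete. The two clusters below are hypothetical two-dimensional points invented for illustration; they do not come from the shopping data.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    """Distance between the two closest points."""
    return min(pairwise_distances(cluster_a, cluster_b))

def complete_linkage(cluster_a, cluster_b):
    """Distance between the two farthest points."""
    return max(pairwise_distances(cluster_a, cluster_b))

def average_linkage(cluster_a, cluster_b):
    """Mean distance over all cross-cluster pairs."""
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)

# Hypothetical clusters, for illustration only.
cluster_1 = [(0.0, 0.0), (0.0, 1.0)]
cluster_2 = [(3.0, 0.0), (4.0, 0.0)]

print(single_linkage(cluster_1, cluster_2))    # 3.0
print(complete_linkage(cluster_1, cluster_2))  # sqrt(17), about 4.12
print(average_linkage(cluster_1, cluster_2))   # about 3.57
```

All three rules use the same set of pairwise distances and differ only in how that set is summarized, which is why the single-linkage value can never exceed the average, and the average can never exceed the complete-linkage value.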

Other Agglomerative Clustering Methods
- Ward's procedure
- Centroid method

Hierarchical Clustering Output

[Figure: Vertical icicle plot. Axes: case number and number of clusters.]

[Figure: Dendrogram using Ward's method. Axis: rescaled distance at which clusters combine (ticks at 5, 10, 15, 20, 25). Case order along the label axis: 3, 15, 1, 12, 7, 8, 17, 6, 11, 5, 13, 2, 20, 9, 19, 16, 4, 10, 18, 14.]
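The agglomeration behind a Ward's-method dendrogram can be sketched in plain Python. This is a naive illustration, not the SPSS procedure that produced the slide: at each stage it merges the pair of clusters whose union adds the least to the total within-cluster sum of squares. The names `merge_cost` and `ward_clusters` are hypothetical, and tie-breaking may differ from SPSS, so the merge order is not guaranteed to match the figure.

```python
# Raw data: case number -> [V1..V6].
DATA = {
    1: [6, 4, 7, 3, 2, 3],  2: [2, 3, 1, 4, 5, 4],  3: [7, 2, 6, 4, 1, 3],
    4: [4, 6, 4, 5, 3, 6],  5: [1, 3, 2, 2, 6, 4],  6: [6, 4, 6, 3, 3, 4],
    7: [5, 3, 6, 3, 3, 4],  8: [7, 3, 7, 4, 1, 4],  9: [2, 4, 3, 3, 6, 3],
    10: [3, 5, 3, 6, 4, 6], 11: [1, 3, 2, 3, 5, 3], 12: [5, 4, 5, 4, 2, 4],
    13: [2, 2, 1, 5, 4, 4], 14: [4, 6, 4, 6, 4, 7], 15: [6, 5, 4, 2, 1, 4],
    16: [3, 5, 4, 6, 4, 7], 17: [4, 4, 7, 2, 2, 5], 18: [3, 7, 2, 6, 4, 3],
    19: [4, 6, 3, 7, 2, 7], 20: [2, 3, 2, 4, 7, 2],
}

def centroid(cases):
    """Mean of each variable over the given case numbers."""
    rows = [DATA[c] for c in cases]
    return [sum(col) / len(rows) for col in zip(*rows)]

def merge_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared_gap

def ward_clusters(n_clusters):
    """Merge the cheapest pair repeatedly until n_clusters remain."""
    clusters = [[case] for case in sorted(DATA)]
    schedule = []  # one entry per stage: (cluster a, cluster b, cost)
    while len(clusters) > n_clusters:
        cost, i, j = min(
            (merge_cost(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        schedule.append((clusters[i], clusters[j], round(cost, 3)))
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters, schedule

clusters, schedule = ward_clusters(3)
for members in clusters:
    print(len(members), members)
```

The `schedule` list plays the role of an agglomeration schedule: each entry records which two clusters were combined at that stage and at what cost.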

Cluster Membership: number of cases per cluster
Cluster   4 clusters   3 clusters   2 clusters
1         8            8            8
2         6            6            12
3         5            6
4         1

Decide on the Number of Clusters
Guidelines for deciding on the number of clusters:
- Theoretical, conceptual, or practical considerations may suggest a certain number of clusters.
- In hierarchical clustering, the distances at which clusters are combined can be used as a criterion. This information can be obtained from the agglomeration schedule or from the dendrogram.
- In nonhierarchical clustering, the ratio of total within-group variance to between-group variance can be plotted against the number of clusters. The point at which an elbow or sharp bend occurs indicates an appropriate number of clusters.
- The relative sizes of the clusters should be meaningful. A simple frequency count of cluster membership in the Cluster Membership table shows that a three-cluster solution results in clusters with eight, six, and six elements. A four-cluster solution, however, gives clusters of eight, six, five, and one. A cluster with only one case is not meaningful.
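The frequency count in the last guideline is easy to reproduce. The sketch below assumes the three-cluster labels from the case listing of the nonhierarchical run later in this deck (cases 1 to 20 in order); the helper `has_singleton` is a hypothetical name introduced here to flag solutions that contain a one-case cluster.

```python
from collections import Counter

# Cluster label of cases 1..20, in order, from the case listing.
membership = [3, 2, 3, 1, 2, 3, 3, 3, 2, 1, 2, 3, 2, 1, 3, 1, 3, 1, 1, 2]

sizes = Counter(membership)
print(sorted(sizes.values(), reverse=True))  # [8, 6, 6]

def has_singleton(cluster_sizes):
    """True if any cluster in the solution holds only one case."""
    return min(cluster_sizes) <= 1

print(has_singleton([8, 6, 6]))     # False: three-cluster solution
print(has_singleton([8, 6, 5, 1]))  # True: four-cluster solution
```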

Cluster Centroids (mean of each variable)
Cluster   V1     V2     V3     V4     V5     V6
1         5.750  3.625  6.000  3.125  1.750  3.875
2         1.667  3.000  1.833  3.500  5.500  3.333
3         3.500  5.833  3.333  6.000  3.500  6.000
The cluster centroid values can also be obtained from the K-Means Cluster procedure (see Final Cluster Centers).

Computing Cluster Centroids with MS Excel
Respondents are sorted by cluster membership, and each variable is averaged over the rows of a cluster to give that cluster's centroid.
Cluster   Respondents                  V1    V2    V3    V4    V5    V6
1         1, 3, 6, 7, 8, 12, 15, 17    5.75  3.63  6.00  3.13  1.88  3.88
2         2, 5, 9, 11, 13, 20          1.67  3.00  1.83  3.50  5.50  3.33
3         4, 10, 14, 16, 18, 19        3.50  5.83  3.33  6.00  3.50  6.00
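The same averaging can be sketched in Python instead of a spreadsheet. The membership lists below are an assumption taken from the case listing of the nonhierarchical run, with clusters renumbered to match the centroid table (cluster 1 is the eight-case group). On the raw data as transcribed in this deck, the V5 mean for cluster 1 works out to 1.875 rather than the 1.750 shown in the centroid table; the other entries agree to three decimals.

```python
# Raw data: case number -> [V1..V6].
DATA = {
    1: [6, 4, 7, 3, 2, 3],  2: [2, 3, 1, 4, 5, 4],  3: [7, 2, 6, 4, 1, 3],
    4: [4, 6, 4, 5, 3, 6],  5: [1, 3, 2, 2, 6, 4],  6: [6, 4, 6, 3, 3, 4],
    7: [5, 3, 6, 3, 3, 4],  8: [7, 3, 7, 4, 1, 4],  9: [2, 4, 3, 3, 6, 3],
    10: [3, 5, 3, 6, 4, 6], 11: [1, 3, 2, 3, 5, 3], 12: [5, 4, 5, 4, 2, 4],
    13: [2, 2, 1, 5, 4, 4], 14: [4, 6, 4, 6, 4, 7], 15: [6, 5, 4, 2, 1, 4],
    16: [3, 5, 4, 6, 4, 7], 17: [4, 4, 7, 2, 2, 5], 18: [3, 7, 2, 6, 4, 3],
    19: [4, 6, 3, 7, 2, 7], 20: [2, 3, 2, 4, 7, 2],
}

# Assumed membership, numbered as in the centroid table.
MEMBERS = {
    1: [1, 3, 6, 7, 8, 12, 15, 17],
    2: [2, 5, 9, 11, 13, 20],
    3: [4, 10, 14, 16, 18, 19],
}

def centroid(cases):
    """Mean of each variable over the given cases, rounded to 3 decimals."""
    rows = [DATA[c] for c in cases]
    return [round(sum(col) / len(rows), 3) for col in zip(*rows)]

centroids = {label: centroid(cases) for label, cases in MEMBERS.items()}
for label, values in centroids.items():
    print(label, values)
```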

Interpret and Profile the Clusters
From the cluster centroid table:
- Cluster 1 has relatively high values on V1 (Shopping is fun) and V3 (I combine shopping with eating out), so it can be labeled "fun-loving and concerned shoppers".
- Cluster 2 has a relatively high value on V5 (I don’t care about shopping), so it can be labeled "apathetic shoppers".
- Cluster 3 has relatively high values on V2 (Shopping is bad for your budget), V4 (I try to get the best buys while shopping), and V6 (You can save a lot of money by comparing prices), so it can be labeled "economical shoppers".

Assess Reliability and Validity
Formal procedures for assessing the reliability and validity of clustering results are complex. The following procedures provide adequate checks on the quality of the results:
1. Perform cluster analysis on the same data using different distance measures. Compare the results across measures to determine the stability of the solutions.
2. Use different methods of clustering and compare the results.
3. Split the data randomly into halves. Perform clustering separately on each half. Compare cluster centroids across the two subsamples.
4. Delete variables randomly. Perform clustering based on the reduced set of variables. Compare the results with those obtained by clustering based on the entire set of variables.

Results of Nonhierarchical Clustering

Initial Cluster Centers
Cluster   V1      V2      V3      V4      V5      V6
1         4.0000  6.0000  3.0000  7.0000  2.0000  7.0000
2         2.0000  3.0000  2.0000  4.0000  7.0000  2.0000
3         7.0000  2.0000  6.0000  4.0000  1.0000  3.0000

Classification Cluster Centers
Cluster   V1      V2      V3      V4      V5      V6
1         3.8135  5.8992  3.2522  6.4891  2.5149  6.6957
2         1.8507  3.0234  1.8327  3.7864  6.4436  2.5056
3         6.3558  2.8356  6.1576  3.6736  1.3047  3.2010

Case Listing of Cluster Membership
Case ID  Cluster  Distance     Case ID  Cluster  Distance
 1       3        1.780         2       2        2.254
 3       3        1.174         4       1        1.882
 5       2        2.525         6       3        2.340
 7       3        1.862         8       3        1.410
 9       2        1.843        10       1        2.112
11       2        1.923        12       3        2.400
13       2        3.382        14       1        1.772
15       3        3.605        16       1        2.137
17       3        3.760        18       1        4.421
19       1        0.853        20       2        0.813
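A nonhierarchical run of this kind can be approximated with a plain k-means loop. The sketch below is an assumption-laden illustration, not the SPSS procedure: it starts from the initial cluster centers listed above, assigns every case to its nearest center by Euclidean distance, recomputes each center as the mean of its cases, and stops when the assignment no longer changes. It does not reproduce the intermediate "classification" centers, which SPSS updates in its own way. The function name `k_means` is hypothetical.

```python
import math

# Raw data, cases 1..20 in order (V1..V6).
POINTS = [
    [6, 4, 7, 3, 2, 3], [2, 3, 1, 4, 5, 4], [7, 2, 6, 4, 1, 3],
    [4, 6, 4, 5, 3, 6], [1, 3, 2, 2, 6, 4], [6, 4, 6, 3, 3, 4],
    [5, 3, 6, 3, 3, 4], [7, 3, 7, 4, 1, 4], [2, 4, 3, 3, 6, 3],
    [3, 5, 3, 6, 4, 6], [1, 3, 2, 3, 5, 3], [5, 4, 5, 4, 2, 4],
    [2, 2, 1, 5, 4, 4], [4, 6, 4, 6, 4, 7], [6, 5, 4, 2, 1, 4],
    [3, 5, 4, 6, 4, 7], [4, 4, 7, 2, 2, 5], [3, 7, 2, 6, 4, 3],
    [4, 6, 3, 7, 2, 7], [2, 3, 2, 4, 7, 2],
]

# Initial cluster centers 1, 2, 3 as listed in the output.
INITIAL_CENTERS = [
    [4, 6, 3, 7, 2, 7],
    [2, 3, 2, 4, 7, 2],
    [7, 2, 6, 4, 1, 3],
]

def k_means(points, initial_centers, max_iter=100):
    """Alternate nearest-center assignment and mean update until stable."""
    centers = [list(c) for c in initial_centers]  # do not mutate the input
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return [lab + 1 for lab in labels], centers  # 1-based cluster labels

labels, centers = k_means(POINTS, INITIAL_CENTERS)
print(labels)
```

With these starting centers the loop settles after the first update, and the resulting labels agree with the Cluster column of the case listing.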

Final Cluster Centers
Cluster   V1      V2      V3      V4      V5      V6
1         3.5000  5.8333  3.3333  6.0000  3.5000  6.0000
2         1.6667  3.0000  1.8333  3.5000  5.5000  3.3333
3         5.7500  3.6250  6.0000  3.1250  1.7500  3.8750

Distances between Final Cluster Centers
Cluster   1       2       3
1         0.0000
2         5.5678  0.0000
3         5.7353  6.9944  0.0000

Analysis of Variance
Variable  Cluster MS  df  Error MS  df  F        p
V1        29.1083     2   0.6078    17  47.8879  .000
V2        13.5458     2   0.6299    17  21.5047  .000
V3        31.3917     2   0.8333    17  37.6700  .000
V4        15.7125     2   0.7279    17  21.5848  .000
V5        24.1500     2   0.7353    17  32.8440  .000
V6        12.1708     2   1.0711    17  11.3632  .001

Number of Cases in Each Cluster
Cluster   Unweighted Cases   Weighted Cases
1         6                  6
2         6                  6
3         8                  8
Missing   0
Total     20                 20
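The distance table and the F column can be checked with a few lines of Python. This sketch assumes the distances are plain Euclidean distances between the final cluster centers, and that F is the ratio of the cluster mean square to the error mean square. Because it starts from the rounded values printed above, results match the table only to about two or three decimals.

```python
import math

# Final cluster centers as printed in the output (V1..V6).
FINAL_CENTERS = {
    1: [3.5000, 5.8333, 3.3333, 6.0000, 3.5000, 6.0000],
    2: [1.6667, 3.0000, 1.8333, 3.5000, 5.5000, 3.3333],
    3: [5.7500, 3.6250, 6.0000, 3.1250, 1.7500, 3.8750],
}

def center_distance(i, j):
    """Euclidean distance between final cluster centers i and j."""
    return math.dist(FINAL_CENTERS[i], FINAL_CENTERS[j])

def f_ratio(cluster_ms, error_ms):
    """F statistic as cluster mean square over error mean square."""
    return cluster_ms / error_ms

print(round(center_distance(1, 2), 3))  # close to 5.568
print(round(center_distance(1, 3), 3))  # close to 5.735
print(round(center_distance(2, 3), 3))  # close to 6.994
print(round(f_ratio(29.1083, 0.6078), 2))  # V1, close to the tabled 47.89
```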