Applied Multivariate Analysis: Cluster Analysis
Main Objective: Take a set of observations and group the units so that units within the same group share similar characteristics, while units in different groups have different characteristics.
Think About It: Homogeneous subgroups are not the same as naturally occurring clusters. (Figure: subgroups labeled 1 to 6.) Homogeneous, but not natural clusters …
Points to Consider: (1) Measures of similarity (closeness) and dissimilarity between units: Euclidean, Mahalanobis. (2) Determining the clusters (number of clusters): hierarchical, non-hierarchical.
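The two distance measures named above can be sketched in plain Python. This is a minimal illustration, not the slides' own code: the function names (`euclidean`, `covariance_matrix`, `solve`, `mahalanobis`) and the sample numbers are assumptions made for the example, and the covariance uses the sample (n - 1) divisor.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 divisor) of a list of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def mahalanobis(x, y, cov):
    """Mahalanobis distance sqrt(d' S^-1 d), with d = x - y and S = cov."""
    d = [a - b for a, b in zip(x, y)]
    z = solve(cov, d)  # z = S^-1 d, without forming the inverse explicitly
    return math.sqrt(sum(di * zi for di, zi in zip(d, z)))

# Hypothetical numbers, for illustration only.
print(euclidean((0, 0), (3, 4)))                                   # 5.0
print(mahalanobis((0, 0), (2, 0), [[4.0, 0.0], [0.0, 1.0]]))       # 1.0
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; with unequal variances, differences along a high-variance variable count for less.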
Hierarchical Concept: The grouping that results is a natural one. Each grouping is the result of a merge: for example, five clusters are obtained by merging 2 of the clusters in a six-cluster solution. Agglomerative method: every observation starts as its own cluster, and the procedure proceeds by merging. Divisive method: the opposite of agglomerative.
Partitioning Concept: Partition the observations into clusters so that each cluster is homogeneous. This is not a merging concept. The final clusters may still not be cleanly separated.
Nearest Neighbors Method (single linkage method): Start with N clusters, one per point. Repeat until all points are placed in a single cluster: form a cluster from the two closest points, then think of this new cluster as a "point" and define the distance from any point to it as the minimum distance to any point in it.
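The loop described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `single_linkage`, the 0-based point indices, and the sample points are hypothetical and do not come from the slides.

```python
import math

def single_linkage(points):
    """Agglomerative clustering with the nearest-neighbor (single linkage) rule.

    Returns the list of clusterings, from N singleton clusters down to one cluster.
    Clusters hold 0-based indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    history = [[c[:] for c in clusters]]

    def dist(a, b):
        # Distance between clusters = minimum distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        pairs = [(dist(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        merged = sorted(clusters[i] + clusters[j])
        # Replace the two merged clusters by the new one.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        clusters.sort()
        history.append([c[:] for c in clusters])
    return history

# Hypothetical one-dimensional points, for illustration only.
for step in single_linkage([(0,), (1,), (3,), (7,)]):
    print(step)
# [[0], [1], [2], [3]]
# [[0, 1], [2], [3]]
# [[0, 1, 2], [3]]
# [[0, 1, 2, 3]]
```

Recomputing every pairwise cluster distance at each step keeps the sketch short; it is not meant to be efficient for large N.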
Nearest Neighbors Example: Starting from the pairwise distances between six points, the pair at the smallest distance is merged at each step, and the pairwise distances are then taken between the remaining "points".
C0={[1],[2],[3],[4],[5],[6]} (six points)
C1={[1],[2],[3,5],[4],[6]} (five "points")
C2={[1],[2],[3,5,6],[4]} (four "points")
C3={[1],[2,4],[3,5,6]} (three "points")
C4={[2,4],[1,3,5,6]} (two "points")
The final merge yields a single cluster.
K-Means Clustering Procedure
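As a rough illustration of a K-means procedure, the sketch below follows the standard iteration (assign each observation to its nearest centroid, then recompute the centroids) and stops when the centroids no longer change. This is an assumption about the intended procedure rather than a transcription of it; the function name `k_means`, the user-supplied starting centroids, and the sample points are hypothetical.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Standard K-means iteration from user-supplied starting centroids.

    Returns (labels, centroids); labels[i] is the cluster index of points[i].
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# Hypothetical points and starting centroids, for illustration only.
labels, centers = k_means([(0, 0), (0, 1), (10, 10), (10, 11)],
                          [(0, 0), (10, 10)])
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```

Unlike the hierarchical methods, the number of clusters K is fixed in advance here (through the number of starting centroids), and the result can depend on where those centroids start.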
Applications (SPSS): Two Step cluster, hierarchical, and non-hierarchical (K-means). Two Step cluster: exploratory; the number of clusters is chosen based on the AIC/BIC values. Hierarchical: single linkage, complete linkage. Non-hierarchical: K-means.
Two Step Cluster: Data: car_sales.sav. Categorical variable: Vehicle type. Continuous variables: Price in thousands through Fuel efficiency. Plots: Rank of variable importance, by variable, with a confidence level. Output: AIC/BIC.
SPSS Two Step Cluster Output
Cluster Distribution: Pivot (double-click the centroids): Pivoting Trays (build the structure that follows)
Description of Each Cluster
Hierarchical Clustering: Data: car_sales. Select cases: if condition is satisfied, (type = 0) & (sales > 100). Analyze > Classify > Hierarchical Cluster. Use Price in thousands through Fuel efficiency as the analysis variables. Select Model as the case labeling variable. Plots > Dendrogram. Method > Nearest neighbor, Z scores.
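The Z scores option standardizes each analysis variable before distances are computed, so that variables measured on large scales do not dominate. A minimal sketch of that transformation, assuming the sample standard deviation (n - 1 divisor); the function name `z_scores` and the sample values are hypothetical:

```python
import statistics

def z_scores(column):
    """Standardize one variable to mean 0 and standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1 divisor)
    return [(v - mean) / sd for v in column]

# Hypothetical values, for illustration only.
print(z_scores([1, 2, 3]))  # [-1.0, 0.0, 1.0]
```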
Hierarchical Cluster Output
Minitab Application: cereal.mtw. A survey of cereal brands and their nutritional content. The brands and the nutritional content will be clustered.