Data Mining
CRISP-DM: The Standard Process for Data Mining
Topics
Introduction to Data Mining: what data mining is; kinds of data that can be mined; kinds of patterns that can be mined; techniques used for mining; etc.
Understanding Data: data objects and attribute types; descriptive statistics of data; data visualization; measuring data similarity and dissimilarity.
Data Preprocessing: introduction to data preprocessing; data cleaning; data reduction; data transformation and data discretization.
Association Rules: the Apriori algorithm.
Classification: basic concepts; decision trees; Naive Bayes; Bayesian networks; backpropagation; EM; evaluating classification models.
Cluster Analysis: basic concepts; partitioning methods; hierarchical methods.
Outlier Detection: statistical approaches.
References
Tools
Introduction
Why data mining? What is data mining? A multi-dimensional view of data mining. What kinds of data can be mined? What kinds of patterns can be mined? What kinds of technologies are used? What kinds of applications are targeted? Major issues in data mining. A brief history of data mining and the data mining society. Summary.
Why Data Mining?
Data is growing at an enormous rate. Business: Web, e-commerce, transactions, stocks, … Science: remote sensing, … Society and everyone: social media. We have plenty of data but little knowledge. Data mining: the automated analysis of very large data sets.
What Is Data Mining?
Data mining (obtaining knowledge from data): the extraction of patterns or knowledge from large amounts of data. Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, business intelligence, etc.
The Knowledge Discovery (KDD) Process
Data mining plays a central role in the knowledge discovery process.
[Figure: the KDD process. Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation.]
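
The stages of the KDD process can be illustrated with a toy pipeline. This is only a minimal sketch under stated assumptions: the function names, the two sample "stores", and the `min_count` threshold are all hypothetical, and the "mining" step is just a frequency count standing in for a real algorithm.

```python
from collections import Counter


def integrate(*sources):
    # Data integration: combine several sources into one collection.
    return [record for source in sources for record in source]


def clean(records):
    # Data cleaning: drop records that contain missing values.
    return [r for r in records if all(v is not None for v in r.values())]


def select(records, fields):
    # Data selection: keep only the task-relevant attributes.
    return [{f: r[f] for f in fields} for r in records]


def mine(records, field):
    # Data mining (toy stand-in): count how often each value occurs.
    return Counter(r[field] for r in records)


def evaluate(patterns, min_count):
    # Pattern evaluation: keep only patterns that occur often enough.
    return {value: n for value, n in patterns.items() if n >= min_count}


# Hypothetical source data, invented for this illustration.
store_a = [{"customer": "c1", "item": "milk"}, {"customer": "c2", "item": None}]
store_b = [{"customer": "c3", "item": "milk"}, {"customer": "c4", "item": "bread"}]

cleaned = clean(integrate(store_a, store_b))
relevant = select(cleaned, ["item"])
frequent_items = evaluate(mine(relevant, "item"), min_count=2)
print(frequent_items)  # {'milk': 2}
```

Each function corresponds to one stage in the figure, so a stage can be replaced (for example, a real mining algorithm in place of `mine`) without touching the others.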
Data Mining in Business Intelligence
Data mining has great potential to support business decisions.
[Figure: layers from top to bottom: Decision Making; Data Presentation (visualization techniques); Data Mining (information discovery); Data Exploration (statistical summary, querying, and reporting); Data Preprocessing/Integration, Data Warehouses; Data Sources (paper, files, Web documents, scientific experiments, database systems). Roles shown alongside, from top to bottom: End User, Business Analyst, Data Analyst, DBA.]
KDD Process: A View from Machine Learning and Statistics
[Figure: Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing -> Pattern, Information, Knowledge.]
Data pre-processing: data integration, normalization, feature selection, dimension reduction.
Data mining: pattern discovery, association and correlation, classification, clustering, outlier analysis, …
Post-processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization.
This is the view typically held in the machine learning and statistics communities.
Multiple Views of Data Mining
Data to be mined: database data, transactional data, time series, text and Web, multimedia, graphs and social networks.
Knowledge to be mined (data mining functions): association, classification, clustering, outlier analysis, etc.; predictive data mining, etc.
Techniques used: machine learning, statistics, pattern recognition, visualization, etc.
Applications: retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
Data Mining Function: (2) Association and Correlation Analysis
Frequent patterns (or frequent itemsets): items that are often bought together.
Association, correlation, and causality.
How can patterns or rules be mined efficiently from a large database?
How can such patterns be used for classification, clustering, and other applications?
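
To make "items that are often bought together" concrete, the following sketch counts item pairs by brute force and computes the support and confidence of a rule. It is only an illustration: the transactions are invented, the function names are hypothetical, and it enumerates every pair rather than pruning candidates the way an efficient algorithm such as Apriori would.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market-basket data, invented for this illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def frequent_pairs(transactions, min_support):
    # Support of a pair = fraction of transactions containing both items.
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def confidence(transactions, antecedent, consequent):
    # Confidence of "antecedent -> consequent" = fraction of baskets with the
    # antecedent that also contain the consequent.
    # Assumes the antecedent occurs in at least one transaction.
    with_antecedent = [t for t in transactions if antecedent in t]
    with_both = [t for t in with_antecedent if consequent in t]
    return len(with_both) / len(with_antecedent)


pairs = frequent_pairs(transactions, min_support=0.6)
rule_confidence = confidence(transactions, "bread", "milk")
print(pairs)            # {('bread', 'milk'): 0.6}
print(rule_confidence)  # 0.75
```

Brute-force counting is fine for five baskets, but the number of candidate itemsets grows quickly with the number of items, which is why the efficiency question on this slide matters for large databases.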
Data Mining Function: (3) Classification
Classification and label prediction: build a model (function) from training data, then predict class labels that are not yet known.
Typical methods: decision trees, naïve Bayesian classification, support vector machines, neural networks, logistic regression, …
Applications: credit card fraud detection, diseases, web pages, …
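
The "train on labeled data, then predict an unknown label" idea can be sketched with a 1-nearest-neighbor classifier. Note that nearest-neighbor is not one of the methods listed on this slide; it is used here only because it fits in a few lines. The training points, labels, and the `predict` function name are all hypothetical.

```python
import math

# Hypothetical training data: (features, class label) pairs.
training_data = [
    ((1.0, 1.0), "normal"),
    ((1.2, 0.8), "normal"),
    ((5.0, 5.5), "fraud"),
    ((5.5, 5.0), "fraud"),
]


def predict(training_data, point):
    # 1-nearest-neighbor: return the label of the closest training example
    # (Euclidean distance). The "model" is simply the stored training data.
    _, nearest_label = min(
        training_data, key=lambda example: math.dist(example[0], point)
    )
    return nearest_label


label_a = predict(training_data, (1.1, 0.9))
label_b = predict(training_data, (5.2, 5.1))
print(label_a)  # normal
print(label_b)  # fraud
```

The methods named on the slide differ in how they build the model, but they share this interface: labeled examples go in, and a label for an unseen object comes out.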
Data Mining Function: (4) Cluster Analysis
Unsupervised learning (i.e., the class label is unknown).
Groups data into clusters.
Principle: maximize intra-class similarity and minimize interclass similarity.
Many methods are available.
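
As one example of a partitioning method, the sketch below implements a basic k-means loop. It rests on several assumptions made for illustration: the points are invented, the initial centroids are fixed by hand so the result is repeatable, and the number of iterations is fixed rather than tested for convergence.

```python
import math


def k_means(points, centroids, iterations=10):
    # Alternate between assigning each point to its nearest centroid and
    # moving each centroid to the mean of the points assigned to it.
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster
            else centroid  # keep an empty cluster's centroid where it was
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D points forming two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])
print(clusters)
```

No class labels are supplied anywhere: the grouping comes only from distances between points, which is what makes this unsupervised. Points within a cluster end up close to each other (high intra-class similarity) and far from the other cluster.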
Data Mining Function: (5) Outlier Analysis
Outlier: a data object that does not follow the general behavior of the data.
Methods: obtained as a by-product of clustering or regression analysis, …
Uses: fraud detection, rare events analysis.
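
A simple statistical way to flag objects that deviate from the general behavior of the data is the z-score: how many standard deviations a value lies from the mean. The sketch below is only an illustration of that idea, not one of the clustering-based or regression-based methods named on the slide; the sample values and the threshold of 2.0 are assumptions chosen for the example.

```python
import statistics


def z_score_outliers(values, threshold=2.0):
    # Flag values whose distance from the mean exceeds `threshold`
    # population standard deviations. Assumes the values are not all equal.
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts with one unusually large value.
amounts = [10, 11, 9, 10, 12, 10, 11, 50]
outliers = z_score_outliers(amounts)
print(outliers)  # [50]
```

One caveat of this approach is that an extreme value inflates the mean and standard deviation it is measured against, so a stricter threshold can fail to flag it in a small sample.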
Data Mining: Confluence of Multiple Disciplines
[Figure: data mining at the center, drawing on machine learning, pattern recognition, statistics, visualization, applications, algorithms, database technology, and high-performance computing.]
Applications of Data Mining
Web page analysis: from web page classification and clustering to the PageRank and HITS algorithms.
Collaborative analysis and recommender systems.
Basket data analysis to targeted marketing.
Data mining systems and tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools) for putting data mining into practice.
Summary
Data mining: obtaining patterns and knowledge from large amounts of data.
The KDD process: data cleaning, data integration, data selection, transformation, data mining, pattern evaluation, and knowledge presentation.
Data mining can be applied to many different data sources.
Data mining functions: association, classification, clustering, trend and outlier analysis, etc.