INTRODUCTION TO DATA MINING
Faculty of Informatics – Telkom University
10/11/2017

Topics
- Background of data mining
- What data mining is and why it is needed
- Relationship between data mining systems and database systems, data warehouse systems, and business intelligence
- Tasks in data mining
- Data mining functionalities
- Issues in data mining

Our learning approach: Student Centered Learning

Background of Data Mining (1)
Abundance of data: automated tools and database technology generate data continuously, so the volume recorded in databases and other storage media keeps growing.

Background of Data Mining (2)
Although data is extremely abundant, only a small fraction of it is processed into knowledge.
The solution? Data warehouses and data mining:
- Data warehouses and OLAP (on-line analytical processing)
- Extraction of interesting knowledge, in the form of rules, regularities, patterns, constraints, and so on, from large volumes of data stored in databases

Top 10 Largest Databases, 2012

No | Organisation | Data volume
1 | World Data Centre for Climate | 20 terabytes of web data; 6 petabytes of additional data
2 | National Energy Research Scientific Computing Center | 2.8 petabytes of data; operated by 2,000 computational scientists
3 | AT&T | 23 terabytes of information; 1.9 trillion phone call records
4 | Google | 1 million searches per day

Notes on the ranking, from 10 down to 2:
10. Library of Congress. Not even the digital age can prevent the world's largest library from ending up on this list. The Library of Congress (LC) boasts more than 130 million items ranging from cook books to colonial newspapers to U.S. government proceedings. It is estimated that the text portion of the Library of Congress would comprise 20 terabytes of data. The LC expands at a rate of 10,000 items per day and takes up close to 530 miles of shelf space. Talk about a lengthy search for a book.
9. CIA. Portions of the CIA database available to the public include the Freedom of Information Act (FOIA) Electronic Reading Room, The World Fact Book, and various other intelligence related publications.
8. Amazon
7. YouTube
6. LexisNexis
5. Sprint. Sprint is one of the world's largest telecommunication companies: it offers mobile services to more than 53 million subscribers and, prior to being sold in May of 2006, offered local and long distance land line packages. Large telecommunication companies like Sprint are notorious for having immense databases to keep track of all of the calls taking place on their network. Sprint's database processes more than 365 million call detail records and operational measurements per day. The Sprint database is spread across 2.85 trillion database rows, making it the database with the largest number of rows (data insertions, if you will) in the world. At its peak, the database is subjected to more than 70,000 call detail record insertions per second.
4. Google
3. AT&T, the United States' oldest telecommunications company (similar to Sprint)
2. NERSC. The second largest database in the world belongs to the National Energy Research Scientific Computing Center (NERSC) in Oakland, California. NERSC is owned and operated by the Lawrence Berkeley National Laboratory and the U.S. Department of Energy.

Source: http://www.siliconindia.com/news/enterpriseit/Top-10-Largest-Databases-in-the-World-nid-118891-cid-7.html

Growth of Data Worldwide (1)
Source: Tan, 2004

Growth of Data Worldwide (2)
The amount of data stored in various media doubled in three years, from 1999 to 2002. The amount of data put into storage in 2002, five exabytes (one exabyte is one quintillion bytes), was equal to the contents of half a million new libraries, each containing a digitised version of the print collection of the entire US Library of Congress (Lyman and Varian, UC Berkeley, 2003).

Growth of Data Worldwide (3)
"It is projected that just four years from now, the world's information base will be doubling in size every 11 hours. So rapid is the growth in the global stock of digital data that the very vocabulary used to indicate quantities has had to expand to keep pace. A decade or two ago, professional computer users and managers worked in kilobytes and megabytes. Now school children have access to laptops with tens of gigabytes of storage, and network managers have to think in terms of the terabyte (1,000 gigabytes) and the petabyte (1,000 terabytes). Beyond those lie the exabyte, zettabyte and yottabyte, each a thousand times bigger than the last."
(IBM Global Technology Services white paper published in July 2006, titled "The toxic terabyte: How data-dumping threatens business efficiency.")

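The unit ladder in the quote (each unit a thousand times bigger than the last) can be sketched in a few lines of Python. This is only an illustration; the function names `bytes_in` and `humanize` are made up for this example.

```python
# Decimal storage units, each 1,000 times bigger than the previous one.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

def bytes_in(unit):
    """Number of bytes in one `unit` (decimal convention, 1 kilobyte = 1,000 bytes)."""
    return 1000 ** (UNITS.index(unit) + 1)

def humanize(n_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or above."""
    value = float(n_bytes)
    unit = "byte"
    for name in UNITS:
        if value < 1000:
            break
        value /= 1000
        unit = name
    return f"{value:.1f} {unit}s"

# The five exabytes stored in 2002, as cited from Lyman and Varian.
stored_2002 = 5 * bytes_in("exabyte")
print(stored_2002)
print(humanize(stored_2002))
```

The first print shows the raw byte count (a 5 followed by 18 zeros), which is why the larger unit names are needed at all.
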
Data Mining?
Just kidding...

Definitions of Data Mining
- Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. [Kantardzic, 2003]
- Data mining (DM) is the extraction of hidden predictive information from large databases (DBs). With the automatic discovery of knowledge implicit within DBs, DM uses sophisticated statistical analysis and modeling techniques to uncover patterns and relationships hidden in organizational DBs. [Wang, 2003]
- Data mining refers to extracting or "mining" knowledge from large amounts of data. [Han, 2005]
- Non-trivial extraction of implicit, previously unknown and potentially useful information from data. [Tan, 2003]

Origins of Data Mining
Data mining grew out of several disciplines. It aims to improve traditional techniques so that they can handle:
- Very large volumes of data
- High-dimensional data
- Heterogeneous data of differing nature

So, what is data mining?
Key points:
- It is non-trivial and iterative
- It discovers knowledge or information from large amounts of data
Data mining is the core of the Knowledge Discovery in Databases (KDD) process.

Data Mining and the KDD Process
Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation -> Knowledge
Source: Han, 2004

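The stages of the KDD process can be sketched as a chain of small functions. This is a toy illustration, not a real mining system: the two "stores", their records, and the function names are all hypothetical, and the "mining" step is only a frequency count.

```python
from collections import Counter

def clean(records):
    """Data cleaning: drop records that have a missing (None) field."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]

def select(records, fields):
    """Selection: keep only the task-relevant attributes."""
    return [{f: r[f] for f in fields} for r in records]

def mine(records, field):
    """Data mining (toy version): count how often each value of `field` occurs."""
    return Counter(r[field] for r in records)

def evaluate(patterns, min_count):
    """Pattern evaluation: keep only patterns that occur often enough."""
    return {value: n for value, n in patterns.items() if n >= min_count}

# Hypothetical source databases.
store_a = [{"customer": "ani", "item": "milk", "qty": 2},
           {"customer": "budi", "item": None, "qty": 1}]
store_b = [{"customer": "citra", "item": "milk", "qty": 1},
           {"customer": "dewi", "item": "bread", "qty": 3}]

warehouse = integrate(clean(store_a), clean(store_b))
task_data = select(warehouse, ["item"])
knowledge = evaluate(mine(task_data, "item"), min_count=2)
print(knowledge)
```

In practice the process is iterative: an unsatisfying result at the evaluation stage usually sends the analyst back to an earlier stage.
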
Types of Data in Data Mining
- Databases, data warehouses, transactional databases
- Data streams and sensor data
- Time-series data, temporal data, sequence data
- Structure data, graphs, social networks, and linked databases
- Object-relational databases
- Spatial data and spatiotemporal data
- Multimedia databases
- Text databases
- The World-Wide Web

Architecture of a Data Mining System
Layers, from top to bottom:
- Graphical User Interface
- Pattern Evaluation
- Data Mining Engine
- Database or Data Warehouse Server
- Data cleaning, integration, and selection
- Data repositories: Database, Data Warehouse, World-Wide Web, Other Info Repositories
Alongside: Knowledge Base

Relationship between DM, DB, and DW
To be used to best effect, a data mining system should be coupled with the database and data warehouse systems.
- No coupling: not recommended, for example flat file processing
- Loose coupling: for example, fetching data from the DB/DW
- Semi-tight coupling: improves DM performance by implementing data mining primitives inside the DB/DW system, for example sorting, indexing, aggregation, histogram analysis, multiway join, and so on
- Tight coupling: a single processing environment in which DM is integrated with the DB/DW system; mining queries are optimized based on mining query analysis, indexing, query processing methods, and so on

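The difference between loose and semi-tight coupling can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, assuming an in-memory SQLite database and a hypothetical `sales` table; neither comes from the slides. In the loose version the database only hands over rows, and in the semi-tight version the aggregation primitive runs inside the database.

```python
import sqlite3
from collections import defaultdict

# Hypothetical database with a small sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("milk", 2), ("bread", 1), ("milk", 3), ("eggs", 4)],
)

# Loose coupling: fetch the raw rows, then aggregate in the mining program.
totals_loose = defaultdict(int)
for item, qty in conn.execute("SELECT item, qty FROM sales"):
    totals_loose[item] += qty

# Semi-tight coupling: let the database compute the aggregation primitive.
totals_semi_tight = dict(
    conn.execute("SELECT item, SUM(qty) FROM sales GROUP BY item")
)

print(dict(totals_loose))
print(totals_semi_tight)
conn.close()
```

Both approaches give the same totals here. The practical difference shows up with large tables, where the second version avoids moving every row out of the database.
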
Data Mining and Business Intelligence
Layers, from top to bottom (the potential to support business decisions increases toward the top):
- Making Decisions
- Data Presentation: Visualization Techniques
- Data Mining: Information Discovery
- Data Exploration: Statistical Analysis, Querying and Reporting
- Data Warehouses / Data Marts: OLAP, MDA
- Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA

Tasks in Data Mining
Predictive methods: use some variables to predict unknown or future values of other variables. Examples:
- Classification
- Regression
- Deviation detection
Descriptive methods: find human-interpretable patterns that describe the data. Examples:
- Clustering
- Association rule discovery
- Sequential pattern discovery

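As a small illustration of a predictive method, the sketch below classifies a new record by copying the label of its nearest training record (a 1-nearest-neighbour classifier). The technique is chosen here only because it is short; the customer data and the `predict` function are hypothetical.

```python
import math

# Hypothetical training data: (age, income in thousands) -> bought the product?
training_data = [
    ((25, 20), "no"),
    ((30, 35), "no"),
    ((45, 80), "yes"),
    ((50, 95), "yes"),
]

def predict(query):
    """Classify `query` with the label of its nearest training point (1-NN)."""
    _, label = min(training_data, key=lambda row: math.dist(row[0], query))
    return label

# Known variables (age, income) are used to predict the unknown one (the label).
print(predict((48, 85)))
print(predict((27, 25)))
```

A descriptive method such as clustering would instead work on the same records without any labels and report the groups it finds.
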
Data Mining Functionalities (1)
- Classification and prediction
- Frequent patterns, association, correlation, and causality
- Cluster analysis
- Outlier analysis
- Trend and evolution analysis
- Statistical analysis

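Frequent patterns and associations are usually scored with support and confidence. The sketch below computes both by brute force on a hypothetical set of market baskets; it is meant to show the definitions, not an efficient algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also contain `consequent`."""
    return count(antecedent | consequent) / count(antecedent)

# Frequent pairs: item pairs found in at least 60% of the transactions.
items = sorted(set().union(*transactions))
frequent_pairs = [
    pair for pair in combinations(items, 2) if support(set(pair)) >= 0.6
]

print(frequent_pairs)
print(confidence({"diapers"}, {"beer"}))
print(confidence({"beer"}, {"diapers"}))
```

The two confidence values differ, which shows that a rule and its reverse are not equally strong even though they share the same support.
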
Applications of Data Mining (1)
- Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross-selling, market segmentation
- Risk analysis and management: forecasting, customer retention, quality control, competitive analysis
- Fraud detection and management
- Text mining (newsgroups, email, documents) and web analysis
- Intelligent query answering

Applications of Data Mining (2)
- Marketing and sales promotion
- Supermarket shelf management
- Inventory management
- Medical diagnosis
- Collaborative filtering
- Business intelligence
- Network intrusion detection
- Spam detection, and so on

Major Issues
How should the mining methodology be chosen? The choice is hard because:
- Data types differ
- The expected performance, in terms of effectiveness, efficiency, and scalability, may differ for each methodology
- Pattern evaluation, that is, the measurement of "interestingness", differs
- Missing values, noise, and so on must be handled
What form should interaction with the user take? For example:
- Data mining query languages and ad-hoc mining
- Data mining results presented as expressions and visualizations
Applications and social impact:
- Protection of data security, integrity, and privacy

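One of the methodology issues above, handling missing values and noise, can be illustrated with a simple imputation sketch. The data and the `impute_missing` function are hypothetical, and filling with the mean or median is only one of several possible strategies.

```python
from statistics import mean, median

def impute_missing(values, strategy=mean):
    """Replace None entries with a statistic computed from the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]

# Hypothetical ages: two missing values and one noisy entry (250).
ages = [23, None, 31, 27, None, 250]

print(impute_missing(ages))          # fills with the mean of the observed values
print(impute_missing(ages, median))  # fills with the median instead
```

The noisy value 250 pulls the mean far above any plausible age, while the median stays close to the typical values. The two issues therefore interact: noise should be considered before choosing how to fill the gaps.
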