INTRODUCTION TO DATA MINING

Presentation titled "INTRODUCTION TO DATA MINING". Presentation transcript:

1 INTRODUCTION TO DATA MINING
PTIK Week 10

2 Outline
Background of data mining; what data mining is and why it matters; tasks in data mining; data mining functionality; relationship between data mining systems and database systems, data warehouse systems, and business intelligence; issues in data mining. 5/22/2019

3 Background of Data Mining (1)
An abundance of data: automated tools and database technology generate data continuously, so the volume recorded in databases and other storage media keeps growing.

4 Background of Data Mining (2)
Although data is extremely abundant, only a small fraction of it is turned into knowledge.
The solution: data warehousing and data mining.
- Data warehouses and OLAP (online analytical processing)
- Extraction of interesting knowledge (rules, regularities, patterns, constraints, and so on) from data stored in a large number of databases

5 Top 10 Largest Databases, 2012
No. | Organization | Data volume
1 | World Data Centre for Climate | 20 terabytes of web data; 6 petabytes of additional data
2 | National Energy Research Scientific Computing Center | 2.8 petabytes of data; operated by 2,000 computational scientists
3 | AT&T | 23 terabytes of information; 1.9 trillion phone call records
4 | Google | 1 million searches per day

Notes on the ranking:
10. Library of Congress. Not even the digital age can prevent the world's largest library from ending up on this list. The Library of Congress (LC) boasts more than 130 million items ranging from cook books to colonial newspapers to U.S. government proceedings. It is estimated that the text portion of the Library of Congress would comprise 20 terabytes of data. The LC expands at a rate of 10,000 items per day and takes up close to 530 miles of shelf space -- talk about a lengthy search for a book.
9. CIA. Portions of the CIA database available to the public include the Freedom of Information Act (FOIA) Electronic Reading Room, The World Fact Book, and various other intelligence-related publications.
8. Amazon
7. YouTube
6. LexisNexis
5. Sprint. Sprint is one of the world's largest telecommunication companies as it offers mobile services to more than 53 million subscribers, and prior to being sold in May of 2006, offered local and long distance land line packages. Large telecommunication companies like Sprint are notorious for having immense databases to keep track of all of the calls taking place on their network. Sprint's database processes more than 365 million call detail records and operational measurements per day. The Sprint database is spread across 2.85 trillion database rows, making it the database with the largest number of rows (data insertions, if you will) in the world. At its peak, the database is subjected to more than 70,000 call detail record insertions per second.
4. Google
3. AT&T. Similar to Sprint, the United States' oldest telecommunications company AT&T
2. National Energy Research Scientific Computing Center. The second largest database in the world belongs to the National Energy Research Scientific Computing Center (NERSC) in Oakland, California. NERSC is owned and operated by the Lawrence Berkeley National Laboratory and the U.S. Department of Energy.

6 Growth of Data Worldwide (1)
Source: Tan, 2004

7 Growth of Data Worldwide (2)
The amount of data stored in various media doubled in three years, from 1999 to 2002. The amount of data put into storage in 2002, five exabytes (an exabyte is one quintillion bytes), was equal to the contents of half a million new libraries, each containing a digitised version of the print collection of the entire US Library of Congress (Lyman and Varian, UC Berkeley, 2003).

8 Growth of Data Worldwide (3)
"It is projected that just four years from now, the world's information base will be doubling in size every 11 hours. So rapid is the growth in the global stock of digital data that the very vocabulary used to indicate quantities has had to expand to keep pace. A decade or two ago, professional computer users and managers worked in kilobytes and megabytes. Now school children have access to laptops with tens of gigabytes of storage, and network managers have to think in terms of the terabyte (1,000 gigabytes) and the petabyte (1,000 terabytes). Beyond those lie the exabyte, zettabyte and yottabyte, each a thousand times bigger than the last." (IBM Global Technology Services white paper, "The toxic terabyte: How data-dumping threatens business efficiency", July 2006)

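The unit ladder in the quote above (each unit a thousand times bigger than the last) can be checked with a few lines of Python. This is only an illustrative sketch: the names UNITS, bytes_in, and convert are made up for this example and do not come from the slides.

```python
# Decimal storage units, each 1,000 times the previous one (as in the quote).
# UNITS, bytes_in and convert are hypothetical names used only for illustration.
UNITS = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one `unit`, using powers of 1,000."""
    return 1000 ** UNITS.index(unit)


def convert(value: float, src: str, dst: str) -> float:
    """Convert `value` expressed in `src` units into `dst` units."""
    return value * bytes_in(src) / bytes_in(dst)


# The five exabytes stored in 2002, expressed in terabytes.
five_exabytes_in_tb = convert(5, "exabyte", "terabyte")
print(five_exabytes_in_tb)
```

With these definitions, one exabyte comes out as 10**18 bytes, which is one quintillion.
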
9 Outline
Background of data mining; what data mining is and why it matters; relationship between data mining systems and database systems, data warehouse systems, and business intelligence; tasks in data mining; data mining functionality; issues in data mining.

11 Just a joke.

12 Definitions of Data Mining
- Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. [Kantardzic, 2003]
- Data mining (DM) is the extraction of hidden predictive information from large databases (DBs). With the automatic discovery of knowledge implicit within DBs, DM uses sophisticated statistical analysis and modeling techniques to uncover patterns and relationships hidden in organizational DBs. [Wang, 2003]
- Data mining refers to extracting or "mining" knowledge from large amounts of data. [Han, 2005]
- Non-trivial extraction of implicit, previously unknown and potentially useful information from data. [Tan, 2003]

13 Origins of Data Mining
Data mining grew out of several disciplines, with the aim of improving traditional techniques so that they can handle:
- very large volumes of data
- high-dimensional data
- heterogeneous data of differing nature

14 Types of Data in Data Mining
- Databases, data warehouses, transactional databases
- Data streams and sensor data
- Time-series data, temporal data, sequence data
- Structured data, graphs, social networks, and linked databases
- Object-relational databases
- Spatial data and spatiotemporal data
- Multimedia databases
- Text databases
- The World Wide Web

15 Outline
Background of data mining; what data mining is and why it matters; relationship between data mining systems and database systems, data warehouse systems, and business intelligence; data mining functionality; tasks in data mining; issues in data mining.

16 Relationship between DM, DB, and DW
To be used to best effect, a data mining (DM) system should be coupled with a database (DB) system and a data warehouse (DW) system.
- No coupling: not recommended, for example flat-file processing.
- Loose coupling: for example, fetching data from the DB/DW.
- Semi-tight coupling: improves DM performance by implementing data mining primitives inside the DB/DW system, for example sorting, indexing, aggregation, histogram analysis, and multiway join.
- Tight coupling: a uniform processing environment in which DM is integrated into the DB/DW system; mining queries are optimized based on mining queries, indexing, query processing methods, and so on.
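The difference between loose coupling and semi-tight coupling can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not how any particular data mining product works: the sales table, its rows, and the function names are hypothetical.

```python
import sqlite3
from collections import Counter


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory sales table (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, item TEXT)")
    rows = [("ann", "bread"), ("ann", "milk"), ("bob", "bread"),
            ("cho", "milk"), ("cho", "bread"), ("cho", "eggs")]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def item_counts_loose(conn: sqlite3.Connection) -> dict:
    """Loose coupling: fetch the raw rows, then count inside the mining program."""
    rows = conn.execute("SELECT item FROM sales").fetchall()
    return dict(Counter(item for (item,) in rows))


def item_counts_semi_tight(conn: sqlite3.Connection) -> dict:
    """Semi-tight coupling: the database performs the aggregation primitive."""
    query = "SELECT item, COUNT(*) FROM sales GROUP BY item"
    return dict(conn.execute(query).fetchall())


conn = make_demo_db()
print(item_counts_loose(conn))
print(item_counts_semi_tight(conn))
```

Both functions return the same counts. The semi-tight version moves less data out of the database, which is the performance argument made on the slide.
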

17 Data Mining & Business Intelligence
Layers from top to bottom; the potential to support business decisions increases toward the top:
- Making decisions
- Data presentation: visualization techniques
- Data mining: information discovery
- Data exploration: OLAP, MDA, statistical analysis, querying and reporting
- Data warehouses / data marts
- Data sources: paper, files, information providers, database systems, OLTP
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA.

18 Outline
Background of data mining; what data mining is and why it matters; integration of data mining systems with database systems, data warehouse systems, and business intelligence; tasks in data mining; data mining functionality; issues in data mining.

19 Tasks in Data Mining
Prediction methods: use some variables to predict unknown or future values of other variables. Examples:
- Classification
- Regression
- Deviation detection
Description methods: find human-interpretable patterns that describe the data. Examples:
- Clustering
- Association rule discovery
- Sequential pattern discovery
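As one concrete instance of a prediction method, regression can be sketched as fitting a straight line by ordinary least squares and using it to predict an unseen value. The slide names regression without prescribing an algorithm, so the choice of a single-variable least-squares line, the function names, and the sample data below are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Predict the unknown value of the target variable for input x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data in which y happens to equal 2 * x + 1.
model = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(predict(model, 5))
```

Here the known variable x is used to predict the future value of y, which matches the slide's description of prediction.
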

20 Outline
Background of data mining; what data mining is and why it matters; integration of data mining systems with database systems, data warehouse systems, and business intelligence; tasks in data mining; data mining functionality; issues in data mining.

21 Data Mining Functionality (1)
- Classification and prediction
- Frequent patterns, association, correlation, and causality
- Cluster analysis
- Outlier analysis
- Trend and evolution analysis
- Statistical analysis
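Frequent patterns and associations are usually scored with two standard measures, support and confidence. The sketch below computes both for a handful of shopping baskets; the basket contents and function names are hypothetical and only illustrate the definitions.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(both) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"milk"}))  # how often bread buyers also buy milk
```

A rule such as "bread implies milk" is typically reported only when both measures exceed thresholds chosen by the analyst.
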

22 Applications of Data Mining (1)
- Market analysis and management: target marketing, customer relationship management (CRM), market basket analysis, cross-selling, market segmentation
- Risk analysis and management: forecasting, customer retention, quality control, competitive analysis
- Fraud detection and management
- Text mining (newsgroups, documents) and web analysis
- Intelligent query answering

23 Applications of Data Mining (2)
- Marketing and sales promotion
- Supermarket shelf management
- Inventory management
- Medical diagnosis
- Collaborative filtering
- Business intelligence
- Network intrusion detection
- Spam detection, and so on

24 Outline
Background of data mining; what data mining is and why it matters; integration of data mining systems with database systems, data warehouse systems, and business intelligence; tasks in data mining; data mining functionality; issues in data mining.

25 Main Issues
How should the mining methodology be chosen? The choice is hard because:
- Data types differ
- Expected performance in terms of effectiveness, efficiency, and scalability can differ for each methodology
- Pattern evaluation, that is, the measurement of "interestingness", differs
- Missing values, noise, and so on must be handled
What form should user interaction take?
- Using data mining query languages and ad hoc mining
- Presenting data mining results as expressions and visualizations
Applications and social impact:
- Protection of data security, integrity, and privacy
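Handling missing values is one of the issues listed above. One simple and common option is mean imputation, sketched below. The slide does not recommend any particular technique, so this choice, the function name, and the sample readings are assumptions for illustration only.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical sensor readings with one missing value.
readings = [1.0, None, 3.0]
print(impute_mean(readings))
```

Mean imputation keeps the dataset complete but shrinks its variance, so other strategies (dropping records, median or model-based imputation) may suit a given methodology better.
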

