Data Mining: 7. Estimation and Forecasting Algorithms
Romi Satria Wahono, WA/SMS: +6281586220090

Romi Satria Wahono
Elementary school: SD Sompok Semarang (1987)
Junior high school: SMPN 8 Semarang (1990)
Senior high school: SMA Taruna Nusantara Magelang (1993)
B.Eng, M.Eng and Ph.D in Software Engineering from Saitama University Japan ( ) Universiti Teknikal Malaysia Melaka (2014)
Research Interests: Software Engineering, Machine Learning
Founder and Coordinator of IlmuKomputer.Com
Researcher at LIPI ( )
Founder and CEO of PT Brainmatics Cipta Informatika

Course Outline
1. Introduction to Data Mining
2. Data Mining Process
3. Data Preparation
4. Classification Algorithms
5. Clustering Algorithms
6. Association Algorithms
7. Estimation and Forecasting Algorithms
8. Text Mining

7. Estimation and Forecasting Algorithms
7.1 Linear Regression
7.2 Neural Network
7.3 Support Vector Machine
7.4 Time Series Forecasting

7.1 Linear Regression

Linear Regression Algorithm Steps
1. Prepare the data
2. Identify the attributes and the label
3. Compute X², Y², XY and the total of each
4. Compute a and b using the given equations
5. Build the simple linear regression model

1. Data Preparation
Date | Average Room Temperature (X) | Number of Defects (Y)

2. Identify the Attributes and Label
Y = a + bX
where:
Y = dependent variable (the label)
X = independent variable (the attribute)
a = constant (intercept)
b = regression coefficient (slope): the magnitude of the response in Y produced by the variable X

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n\sum x^2 - (\sum x)^2}$$

$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2}$$
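The two summation formulas translate directly into code. A minimal sketch in Python, assuming paired lists of X and Y values; the function name `fit_simple_linear_regression` is hypothetical, and the four data points in the example are made up (they lie exactly on Y = 1 + 2X) so the result is easy to check by hand.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (a, b) for Y = a + bX using the summation formulas.

    Hypothetical helper for illustration; xs and ys are equal-length sequences.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sum_x2 - sum_x ** 2  # shared denominator of a and b
    if denom == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


# Made-up points lying exactly on Y = 1 + 2X
a, b = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```

Note that the Y² total from step 3 does not enter either formula; it is typically used elsewhere, for example when computing the correlation coefficient.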

3. Compute X², Y², XY and the Total of Each
Date | Average Room Temperature (X) | Number of Defects (Y) | X² | Y² | XY

4. Compute a and b Using the Given Equations
Computing the constant (a):

$$a = \frac{(72)(4876) - (220)(1640)}{10(4876) - (220)^2} = \frac{-9728}{360} \approx -27.02$$

Computing the regression coefficient (b):

$$b = \frac{10(1640) - (220)(72)}{10(4876) - (220)^2} = \frac{560}{360} \approx 1.56$$
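The arithmetic on this slide can be checked with a few lines of Python. This sketch uses only the column totals that appear in the calculation above (n = 10, Σx = 220, Σy = 72, Σx² = 4876, Σxy = 1640).

```python
# Column totals taken from the worked calculation
n, sum_x, sum_y, sum_x2, sum_xy = 10, 220, 72, 4876, 1640

denom = n * sum_x2 - sum_x ** 2                 # 48760 - 48400 = 360
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom   # -9728 / 360
b = (n * sum_xy - sum_x * sum_y) / denom        # 560 / 360

print(round(a, 2), round(b, 2))  # -27.02 1.56
```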

5. Build the Simple Linear Regression Model
Y = a + bX
Y = -27.02 + 1.56X

Testing
1. Predict the number of production defects when the temperature is high (variable X), for example 30 °C:
Y = -27.02 + 1.56X
Y = -27.02 + 1.56(30) = 19.78
2. If the production defects (variable Y) may be at most 5 units, what room temperature is needed to reach that target?
5 = -27.02 + 1.56X
1.56X = 5 + 27.02
X = 32.02 / 1.56
X ≈ 20.53
So the predicted room temperature best suited to meet the production defect target is about 20.53 °C.
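Both test questions use the fitted line in opposite directions: the first evaluates Y for a given X, the second solves Y = a + bX for X. A minimal Python sketch using the rounded coefficients of the model; the function names are hypothetical.

```python
A, B = -27.02, 1.56  # rounded coefficients of Y = -27.02 + 1.56X


def predict_defects(temperature_c):
    """Forward use of the model: expected defects at a given room temperature."""
    return A + B * temperature_c


def temperature_for_target(defects):
    """Inverse use of the model: solve defects = A + B*X for X."""
    return (defects - A) / B


print(round(predict_defects(30), 2))        # 19.78
print(round(temperature_for_target(5), 2))  # 20.53
```

Rounding matters in the inverse step: with the unrounded coefficients (a = -9728/360, b = 560/360) the same 5-unit target gives X ≈ 20.59 °C.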

7.1.2 CRISP-DM Case Study: Heating Oil Consumption – Estimation (Matthew North, Data Mining for the Masses, 2012, Chapter 8: Estimation, pp. …)
Datasets: HeatingOil-Training.csv and HeatingOil-Scoring.csv

Exercise
Carry out the Heating Oil Consumption experiment following Matthew North, Data Mining for the Masses, 2012, Chapter 8: Estimation (pp. …).
Datasets: HeatingOil-Training.csv and HeatingOil-Scoring.csv

CRISP-DM

Context and Perspective
Sarah, the regional sales manager, is back for more help. Business is booming: her sales team is signing up thousands of new clients, and she wants to be sure the company will be able to meet this new level of demand. She is now hoping we can help her make some predictions as well.
She knows that there is some correlation between the attributes in her data set (things like temperature, insulation, and occupant ages), and she is wondering whether she can use the previous data set to predict heating oil usage for new customers.
These new customers haven’t begun consuming heating oil yet, there are a lot of them (42,650 to be exact), and she wants to know how much oil she should expect to keep in stock to meet their demand.
Can she use data mining to examine household attributes and known past consumption quantities to anticipate and meet her new customers’ needs?

1. Business Understanding
Sarah’s new data mining objective is clear: she wants to anticipate demand for a consumable product. We will use a linear regression model to help her with her desired predictions.
She has data: 1,218 observations, each giving an attribute profile for a home along with that home’s annual heating oil consumption. She wants to use this data set as training data to predict the usage that 42,650 new clients will bring to her company.
She knows that these new clients’ homes are similar in nature to her existing client base, so the existing customers’ usage behavior should serve as a solid gauge for predicting future usage by new customers.

2. Data Understanding
We create a data set composed of the following attributes:
Insulation: a density rating, ranging from one to ten, indicating the thickness of each home’s insulation. A home with a density rating of one is poorly insulated, while a home with a density of ten has excellent insulation.
Temperature: the average outdoor ambient temperature at each home for the most recent year, measured in degrees Fahrenheit.
Heating_Oil: the total number of units of heating oil purchased by the owner of each home in the most recent year.
Num_Occupants: the total number of occupants living in each home.
Avg_Age: the average age of those occupants.
Home_Size: a rating, on a scale of one to eight, of the home’s overall size. The higher the number, the larger the home.

3. Data Preparation
A CSV data set for this chapter’s example is available for download at the book’s companion web site.

4. Modeling

5. Evaluation

6. Deployment
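For readers without RapidMiner, the Modeling and Deployment steps of this case study can be approximated with an ordinary least-squares fit. This is a sketch, not the book's RapidMiner process: it assumes the CSV files use the attribute names from the Data Understanding slide as column headers, and it fits all five attributes without any attribute selection. The helper names `design_matrix`, `fit`, `predict` and `load` are hypothetical.

```python
import csv

import numpy as np

# Column names as listed on the Data Understanding slide (assumed to be the CSV headers)
FEATURES = ["Insulation", "Temperature", "Num_Occupants", "Avg_Age", "Home_Size"]
LABEL = "Heating_Oil"


def design_matrix(rows):
    """Intercept column followed by the five attributes, in FEATURES order."""
    return np.array([[1.0] + [float(r[f]) for f in FEATURES] for r in rows])


def fit(rows):
    """Least-squares linear regression of Heating_Oil on the attributes."""
    y = np.array([float(r[LABEL]) for r in rows])
    coef, *_ = np.linalg.lstsq(design_matrix(rows), y, rcond=None)
    return coef  # coef[0] = intercept, coef[1:] follow FEATURES order


def predict(coef, rows):
    """Estimate Heating_Oil for unlabeled rows, such as the scoring set."""
    return design_matrix(rows) @ coef


def load(path):
    """Read a CSV file into a list of dicts keyed by column header."""
    with open(path, newline="") as fh:
        return list(csv.DictReader(fh))
```

Typical use would be `coef = fit(load("HeatingOil-Training.csv"))` followed by `predict(coef, load("HeatingOil-Scoring.csv")).sum()`, which gives one rough estimate of the total oil to keep in stock for the new customers.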

7.2 Neural Network

7.3 Support Vector Machine

7.4 Time Series Forecasting

Time Series Forecasting
Time series forecasting is one of the oldest known predictive analytics techniques. It existed, and was in widespread use, even before the term “predictive analytics” was coined.
Independent or predictor variables are not strictly necessary for univariate time series forecasting, but they are strongly recommended for multivariate time series.
Time series forecasting methods:
1. Data-driven methods: there is no difference between a predictor and a target. Techniques such as time series averaging or smoothing are considered data-driven approaches to time series forecasting.
2. Model-driven methods: similar to “conventional” predictive models, which have independent and dependent variables, but with a twist: the independent variable is now time.

Data-Driven Methods
There is no difference between a predictor and a target: the predictor is also the target variable.
Data-driven methods include:
1. Naïve forecast
2. Simple average
3. Moving average
4. Weighted moving average
5. Exponential smoothing
6. Holt’s two-parameter exponential smoothing
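Each data-driven method listed above produces a one-step-ahead forecast computed from the series itself. A minimal Python sketch, assuming a plain list of numbers; the function names are hypothetical, and the initialisation choices for exponential smoothing (level starts at the first observation, Holt's trend starts at the first difference) are common conventions rather than the only options.

```python
def naive_forecast(series):
    """Naive forecast: the next value equals the last observed value."""
    return series[-1]


def simple_average_forecast(series):
    """Simple average: the mean of all observations so far."""
    return sum(series) / len(series)


def moving_average_forecast(series, k):
    """Moving average: the mean of the last k observations."""
    window = series[-k:]
    return sum(window) / k


def weighted_moving_average_forecast(series, weights):
    """Weighted moving average; weights[-1] applies to the most recent value."""
    window = series[-len(weights):]
    return sum(w * y for w, y in zip(weights, window)) / sum(weights)


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; the level starts at the first observation."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def holt_forecast(series, alpha, beta, h=1):
    """Holt's two-parameter (level + trend) smoothing, forecasting h steps ahead."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + h * trend


sales = [10, 12, 14, 16, 18]  # made-up series with a steady upward trend
print(naive_forecast(sales), moving_average_forecast(sales, 3), holt_forecast(sales, 0.5, 0.5))
# 18 16.0 20.0
```

On a trending series the averaging methods lag behind (16.0), while Holt's method, which tracks the trend explicitly, extrapolates it (20.0).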

Model-Driven Methods
In model-driven methods, time is the predictor (independent variable) and the time series value is the dependent variable.
Model-based methods are generally preferable when the time series appears to have a “global” pattern. The idea is that the model parameters will be able to capture these patterns, which enables us to make predictions for any number of steps ahead, under the assumption that the pattern is going to repeat.
For a time series with local patterns instead of a global pattern, the model-driven approach requires specifying how and when the patterns change, which is difficult.

Model-driven methods include:
1. Linear regression
2. Polynomial regression
3. Linear regression with seasonality
4. Autoregression models and ARIMA
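The first two model-driven methods treat the time index t = 0, 1, 2, … as the only predictor. A minimal sketch with NumPy's `polyfit` and `polyval` (degree 1 for linear regression, higher degrees for polynomial regression); it does not include seasonality terms or ARIMA, and `trend_forecast` is a hypothetical name.

```python
import numpy as np


def trend_forecast(series, degree=1, steps_ahead=1):
    """Fit y = f(t) by least squares on t = 0..n-1 and extrapolate steps_ahead points."""
    t = np.arange(len(series))
    coeffs = np.polyfit(t, series, degree)  # highest power first
    future_t = np.arange(len(series), len(series) + steps_ahead)
    return np.polyval(coeffs, future_t)


print(trend_forecast([3, 5, 7, 9]))                # linear trend, next value about 11
print(trend_forecast([0, 1, 4, 9, 16], degree=2))  # quadratic trend, next value about 25
```

Because the fitted curve is defined for every t, the same coefficients give forecasts for any step ahead, which is exactly the "global pattern" assumption described above.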

How to Implement
RapidMiner’s approach to time series is based on two main data transformation processes. The first is windowing, which transforms the time series data into a generic data set: this step converts the last row of a window within the time series into a label or target variable. The second is applying any of the “learners” (algorithms) to predict the target variable, and thus predict the next time step in the series.

Windowing Concept
The parameters of the Windowing operator allow changing the size of the windows, the overlap between consecutive windows (also known as the step size), and the prediction horizon, which is used for forecasting. The prediction horizon controls which row in the raw data series ends up as the label variable in the transformed series.

RapidMiner Windowing Operator

Windowing Operator Parameters
Window size: determines how many “attributes” are created for the cross-sectional data. Each row of the original time series within the window width becomes a new attribute. We choose w = 6.
Step size: determines how to advance the window. Let us use s = 1.
Horizon: determines how far out to make the forecast. If the window size is 6 and the horizon is 1, then the seventh row of the original time series becomes the first sample for the “label” variable. Let us use h = 1.
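The three parameters can be illustrated with a small Python function that performs the same transformation on a list. This is a sketch of the idea as described on this slide, not RapidMiner's implementation; `make_windows` is a hypothetical name, and the label is taken `horizon` positions after the last value of each window.

```python
def make_windows(series, window=6, step=1, horizon=1):
    """Convert a univariate series into (attributes, label) examples.

    Each example holds `window` consecutive values as attributes; its label is the
    value `horizon` positions after the last of them. Windows start `step` apart.
    """
    rows, labels = [], []
    start = 0
    while start + window - 1 + horizon < len(series):
        rows.append(list(series[start:start + window]))
        labels.append(series[start + window - 1 + horizon])
        start += step
    return rows, labels


X, y = make_windows([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])  # w = 6, s = 1, h = 1
print(X[0], y[0])  # [1, 2, 3, 4, 5, 6] 7
print(y)           # [7, 8, 9, 10]
```

With w = 6 and h = 1 the seventh value (7) becomes the first label, matching the parameter description; a larger step size yields fewer, less overlapping examples.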


Exercise
1. Train a model using linear regression on the dataset hargasaham-training.xls.
2. Use Split Data to divide the dataset: 90% for training and 10% for testing.
3. The Windowing process must be applied to the dataset.
4. Plot the label against the prediction results using a chart.
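Outside RapidMiner, the exercise pipeline (windowing, a 90/10 split, linear regression, predictions to plot against the label) can be sketched in Python with NumPy. Assumptions: the price series has already been read from hargasaham-training.xls into a list of numbers (which column to use is not stated), the split is chronological rather than random, and the helper names `windowed` and `split_fit_predict` are hypothetical.

```python
import numpy as np


def windowed(series, window=6, horizon=1):
    """Windowing with step size 1: `window` lagged values -> value `horizon` steps later."""
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        y.append(series[start + window - 1 + horizon])
    return np.array(X, dtype=float), np.array(y, dtype=float)


def split_fit_predict(series, window=6, horizon=1, train_ratio=0.9):
    """Fit linear regression on the first 90% of windows; predict the remaining 10%."""
    X, y = windowed(series, window, horizon)
    n_train = int(len(y) * train_ratio)             # chronological split, no shuffling
    design = np.column_stack([np.ones(len(y)), X])  # intercept + lagged values
    coef, *_ = np.linalg.lstsq(design[:n_train], y[:n_train], rcond=None)
    return y[n_train:], design[n_train:] @ coef     # (labels, predictions) for the chart
```

To draw the chart, the two returned arrays can be plotted on the same axes, for example with matplotlib's `plt.plot(labels)` and `plt.plot(predictions)`.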

Exercise
Find any time series data on the internet. Apply the data mining process to it and examine the patterns that emerge.

Post-Test
1. Explain the difference between data, information, and knowledge.
2. Explain what you know about data mining.
3. Name the main roles of data mining.
4. Name applications of data mining in various fields.
5. What knowledge or patterns can we obtain from the data below?

NIM (student ID) | Gender | National Exam (UN) Score | School of Origin | IPS1 | IPS2 | IPS3 | IPS4 | ... | Graduated on Time
10001 | M | 28 | SMAN | | | | | | Yes
10002 | F | 27 | SMAN | | | | | | No
10003 | F | 24 | SMAN | | | | | | No
10004 | M | 26.4 | SMAN | | | | | | Yes
 | M | 23.4 | SMAN | | | | | | Yes

M = male, F = female; IPS1 to IPS4 = semester grade point averages.
