2-Statistical Learning, 25 August 2015. Training and test data. Bias and variance. Error rate & confidence interval. Linear regression. Practicum: Data.

2-Statistical Learning
25 August 2015
Training and test data
Bias and variance
Error rate & confidence interval
Linear regression
Practicum: data summarization, visualization, linear regression

Classification: Definition
Given a collection of records (training set):
– Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
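The train/test idea above can be sketched in a few lines of Python. This is a minimal illustration, not the course's R code: the helper names (`holdout_split`, `majority_class`, `accuracy`), the toy records, and the 2/3 training fraction are all assumptions, and the "model" is only a majority-class baseline that ignores the attributes.

```python
import random

def holdout_split(records, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def majority_class(train):
    """Placeholder model: always predict the most common class in train."""
    labels = [label for _, label in train]
    # sorted() makes tie-breaking deterministic
    return max(sorted(set(labels)), key=labels.count)

def accuracy(predicted_label, test):
    """Fraction of test records whose class equals the predicted label."""
    hits = sum(1 for _, label in test if label == predicted_label)
    return hits / len(test)

# Toy records of the form (attributes, class); the values are made up.
records = [((i,), "benign" if i % 3 else "malignant") for i in range(30)]

train, test = holdout_split(records)
model = majority_class(train)      # built from the training set only
print(len(train), len(test), model, accuracy(model, test))
```

Any real classifier would replace `majority_class`; the point is only that the model is built from `train` and scored on records it has not seen.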

Illustrating Classification Task

Examples of Classification Task
– Predicting tumor cells as benign or malignant
– Classifying credit card transactions as legitimate or fraudulent
– Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
– Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques
– Logistic regression
– Decision tree based methods
– Rule-based methods
– Memory based reasoning
– Neural networks
– Naïve Bayes and Bayesian belief networks
– Support vector machines

Methods of Estimation
Holdout (proportion)
– Reserve 2/3 for training and 1/3 for testing (the split depends on the analyst)
Random subsampling
– Repeated holdout
Cross validation
– Partition data into k disjoint subsets
– k-fold: train on k-1 partitions, test on the remaining one
– Leave-one-out: k = n
Stratified sampling
– Oversampling vs. undersampling
Bootstrap (averaging / ensemble)
– Sampling with replacement
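The partitioning schemes listed above can be sketched as index bookkeeping. The Python below is a minimal sketch under assumptions: the function names and the fixed seed are illustrative, and it only produces index lists, leaving model fitting aside.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Partition indices 0..n-1 into k disjoint folds of near-equal size."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def train_test_pairs(n, k, seed=0):
    """Yield (train, test) index lists: train on k-1 folds, test on the rest."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def bootstrap_sample(n, seed=0):
    """Draw n indices with replacement (some repeat, some are left out)."""
    rng = random.Random(seed)
    return [rng.randrange(n) for _ in range(n)]

five_fold = list(train_test_pairs(n=10, k=5))       # 5 pairs, 2 test cases each
leave_one_out = list(train_test_pairs(n=10, k=10))  # k = n: 1 test case each
print(five_fold[0])
print(bootstrap_sample(10))
```

Leave-one-out falls out as the special case k = n, so the same loop serves both schemes.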

Linear Regression
A response y is a continuous measurement variable, such as sales or profit.
The function f(·) is linear in the k regressor (predictor) variables.
The parameters are usually estimated by least squares (which, for independent normal errors, is identical to maximum likelihood estimation).
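For the simplest case of a single regressor (k = 1), the least squares estimates have a closed form. The Python sketch below is an illustration only: the function name and the toy data are assumptions, and the general k-regressor case would need a matrix solve that is not shown here.

```python
def fit_simple_ols(x, y):
    """Least squares fit of y = intercept + slope * x for one regressor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Toy data lying exactly on the line y = 1 + 2x (made up for illustration)
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_simple_ols(x, y)
print(intercept, slope)
```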

Linear Regression Concepts

Linear Regression Evaluation
Given a set of predictions for m new cases, we can evaluate them with the following measures:
– Mean error: should be close to zero; a mean error different from zero indicates a bias in the forecasts.
– Root mean square error: expresses the magnitude of the forecast error in the units of the response variable.
– Mean absolute percent error: expresses the forecast error in percentage terms.
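The three measures above can be computed as follows. This is a minimal Python sketch with assumptions stated: the error is taken as actual minus predicted (the slide does not fix the sign), the function name and toy values are illustrative, and MAPE requires every actual value to be nonzero.

```python
import math

def forecast_errors(actual, predicted):
    """Return (ME, RMSE, MAPE) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]  # assumed sign convention
    m = len(errors)
    me = sum(errors) / m                                 # mean error (bias)
    rmse = math.sqrt(sum(e * e for e in errors) / m)     # in units of the response
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / m  # percent
    return me, rmse, mape

# Toy values, made up for illustration
actual = [4.0, 5.0, 8.0]
predicted = [5.0, 5.0, 6.0]
me, rmse, mape = forecast_errors(actual, predicted)
print(me, rmse, mape)
```

Positive and negative errors cancel in ME but not in RMSE or MAPE, so a small ME alone does not mean the forecasts are accurate.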

ESTIMATION IN R

EXAMPLE 1 (3.1): FUEL EFFICIENCY OF AUTOMOBILES
We try to model the fuel efficiency, measured in GPM (gallons per 100 miles, GPM = 100/MPG), as a function of:
– WT = weight of the car (in 1000 lb)
– DIS = cubic displacement (in cubic inches)
– NC = number of cylinders
– HP = horsepower
– ACC = acceleration (in seconds from 0 to 60 mph)
– ET = engine type (V-type and straight (coded as 1))

QUIZ 1 (10 minutes)
1. What is data mining, in your own view?
2. Name 4 tasks in data mining.
3. What is meant by predictive and descriptive tasks in data mining?
4. Give two examples for each of the following data types:
   1. Nominal
   2. Ordinal
   3. Interval
   4. Ratio

ASSIGNMENT 1: Data Analysis and Linear Regression
1. Use the FuelEff data
   1. Explain the meaning of each line of code in the cross-validation (leave-one-out) of the linear regression
   2. Modify the cross-validation code to perform 10-fold cross-validation of the linear regression
   3. Briefly analyze whether the ME, RMSE, and MAPE of 10-fold are better or worse than those of leave-one-out.
2. Use the Orange Juice data
   1. Follow and try to understand the steps in the book DMBAR 2.3 (not to be submitted)
   2. Give a summary of the Orange Juice data: which attributes are nominal and which are numeric
   3. Perform linear regression (numeric data only) with 'logmove' as the variable to be predicted
      1. Use all numeric variables
      2. Choose the one numeric variable most strongly correlated with 'logmove'
      3. Perform cross-validation: leave-one-out and 10-fold, for all variables and for the variable selected in item 2.
      4. Which combination do you choose: leave-one-out or 10-fold, all variables or the selected one? Give clear reasons.
Due: 8 September 2015
Submit to your shared folder
Late penalty: -25% per day