2 - Statistical Learning, 25 August 2015


Topics: training and test data; bias and variance; error rate and confidence interval; linear regression.
Lab: data summarization, visualization, linear regression.

Classification: Definition
Given a collection of records (the training set):
– Each record contains a set of attributes; one of the attributes is the class.
– Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
– A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
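The train/test idea in the definition can be sketched in Python. This is a minimal illustration, not the course's code: the nearest-centroid classifier is a hypothetical stand-in chosen only because it is short, the toy records are made up, and all function names are illustrative. The 1/3 test fraction follows the holdout rule of thumb given under the estimation methods.

```python
import random

def train_test_split(records, test_fraction=1/3, seed=0):
    """Shuffle the records and hold out a fraction of them as the test set."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def fit_nearest_centroid(train):
    """Toy model: the mean attribute vector (centroid) of each class."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))

def accuracy(centroids, test):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(predict(centroids, x) == label for x, label in test)
    return correct / len(test)

# Hypothetical toy data: two well-separated classes, 6 records each.
records = [((i, i), "A") for i in range(6)] + [((10 + i, 10 + i), "B") for i in range(6)]
train, test = train_test_split(records)
model = fit_nearest_centroid(train)      # built on the training set only
print(accuracy(model, test))  # prints 1.0
```

The model never sees the test records while it is built, so the test accuracy estimates how it will do on previously unseen records.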

Illustrating Classification Task

Examples of Classification Tasks
– Predicting tumor cells as benign or malignant
– Classifying credit card transactions as legitimate or fraudulent
– Classifying secondary structures of proteins as alpha-helix, beta-sheet, or random coil
– Categorizing news stories as finance, weather, entertainment, sports, etc.

Classification Techniques
– Logistic regression
– Decision tree based methods
– Rule-based methods
– Memory-based reasoning
– Neural networks
– Naïve Bayes and Bayesian belief networks
– Support vector machines

Methods of Estimation
– Holdout (proportion): reserve 2/3 of the data for training and 1/3 for testing (the proportion is the analyst's choice)
– Random subsampling: repeated holdout
– Cross-validation: partition the data into k disjoint subsets
   – k-fold: train on k-1 partitions, test on the remaining one
   – Leave-one-out: k = n
– Stratified sampling: oversampling vs. undersampling
– Bootstrap (averaging / ensemble): sampling with replacement
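The index bookkeeping behind k-fold cross-validation, leave-one-out and the bootstrap can be sketched as below. This is a Python illustration with hypothetical function names, not the course's R code; stratified sampling is not shown. Leave-one-out needs no separate code because it is simply k-fold with k = n.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds of nearly equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validation_splits(n, k, seed=0):
    """Yield (train, test) index lists: each fold is the test set once, the other k-1 folds train."""
    folds = k_fold_indices(n, k, seed)
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

def bootstrap_indices(n, seed=0):
    """Draw n indices with replacement; indices never drawn form the out-of-bag set."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(sample))
    return sample, out_of_bag

# 3-fold CV on 10 cases: sizes of the test folds.
print([len(test) for _, test in cross_validation_splits(10, 3)])  # [4, 3, 3]
# Leave-one-out is the special case k = n: every test set holds exactly one case.
print(all(len(test) == 1 for _, test in cross_validation_splits(5, 5)))  # True
```

Because the folds are disjoint and together cover all cases, every case is used for testing exactly once, whereas a bootstrap sample typically repeats some cases and leaves others out of bag.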

Linear Regression
– The response y is a continuous measurement variable such as sales or profit.
– In y = f(x1, …, xk) + ε, the function f(·) is linear in the k regressor (predictor) variables: y = β0 + β1x1 + … + βkxk + ε.
– The parameters are usually estimated by least squares, which, for independent normal errors, is identical to maximum likelihood estimation.
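Least squares chooses β0, …, βk to minimize the sum of squared residuals, which leads to the normal equations (AᵀA)b = Aᵀy, where A is the predictor matrix with a leading column of ones. The sketch below solves them in plain Python purely for illustration; the function name is hypothetical, and in practice R's `lm()` is used, which fits by QR decomposition rather than forming the normal equations (forming AᵀA can lose accuracy when predictors are nearly collinear).

```python
def least_squares(X, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X is a list of rows, each holding k predictor values; returns [b0, b1, ..., bk].
    Solves the normal equations (A^T A) b = A^T y, where A is X with a leading column of ones.
    """
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented system [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]

# Points lying exactly on y = 1 + 2x.
print(least_squares([[1], [2], [3], [4]], [3, 5, 7, 9]))  # prints [1.0, 2.0]
```

With k predictors each row of `X` simply carries k values; the intercept column is added inside the function.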

Linear Regression Concepts

Linear Regression Evaluation
Given a set of predictions for m new cases, we can evaluate the predictions according to their:
– Mean error (ME): should be close to zero; a mean error different from zero indicates bias in the forecasts.
– Root mean square error (RMSE): expresses the magnitude of the forecast error in the units of the response variable.
– Mean absolute percent error (MAPE): expresses the forecast error in percentage terms.
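The three measures can be computed as in this sketch. It assumes the error is defined as actual minus predicted (some texts use the opposite sign, which only flips the sign of ME), and the function name and numbers are hypothetical.

```python
import math

def forecast_errors(actual, predicted):
    """Return (ME, RMSE, MAPE) for m predictions, with error = actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    m = len(errors)
    me = sum(errors) / m                                   # bias: should be near 0
    rmse = math.sqrt(sum(e * e for e in errors) / m)       # in units of the response
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / m  # percent; needs a != 0
    return me, rmse, mape

me, rmse, mape = forecast_errors([10, 20, 40], [12, 18, 40])
print(round(me, 3), round(rmse, 3), round(mape, 3))  # 0.0 1.633 10.0
```

The example shows why ME alone is not enough: the errors -2 and +2 cancel to an ME of 0, while RMSE and MAPE still report their size.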

ESTIMATION IN R

EXAMPLE 1 (3.1): FUEL EFFICIENCY OF AUTOMOBILES
We model fuel efficiency, measured as GPM (gallons per 100 miles, GPM = 100/MPG), as a function of:
– WT = weight of the car (in 1000 lb)
– DIS = cubic displacement (in cubic inches)
– NC = number of cylinders
– HP = horsepower
– ACC = acceleration (in seconds from 0 to 60 mph)
– ET = engine type: V-type or straight (straight coded as 1)
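The conversion GPM = 100/MPG is simple arithmetic; a short sketch shows one property of it. GPM is the fuel used over a fixed distance, so equal MPG improvements do not save equal amounts of fuel. The function name is hypothetical.

```python
def mpg_to_gpm(mpg):
    """Gallons per 100 miles from miles per gallon: GPM = 100 / MPG."""
    return 100.0 / mpg

# Going from 10 to 20 MPG saves 5 gallons per 100 miles,
# but going from 20 to 40 MPG saves only 2.5.
print(mpg_to_gpm(10) - mpg_to_gpm(20))  # 5.0
print(mpg_to_gpm(20) - mpg_to_gpm(40))  # 2.5
```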

QUIZ 1 (10 minutes)
1. In your own view, what is data mining?
2. Name four tasks in data mining.
3. What is meant by predictive and descriptive tasks in data mining?
4. Give two examples for each of the following data types:
   1. Nominal
   2. Ordinal
   3. Interval
   4. Ratio

ASSIGNMENT 1: Data Analysis and Linear Regression
1. Use the FuelEff data.
   1. Explain the meaning of every line of code in the leave-one-out cross-validation of the linear regression.
   2. Modify the cross-validation code to perform 10-fold cross-validation of the linear regression.
   3. Briefly analyze whether the 10-fold ME, RMSE and MAPE are better or worse than those of leave-one-out.
2. Use the Orange Juice data.
   1. Follow and try to understand the steps in DMBAR 2.3 (not to be submitted).
   2. Give a summary of the Orange Juice data: which attributes are nominal and which are numeric.
   3. Perform linear regression (numeric data only) with 'logmove' as the variable to be predicted.
      1. Use all numeric variables.
      2. Choose the one numeric variable most strongly correlated with 'logmove'.
      3. Perform cross-validation, both leave-one-out and 10-fold, for all variables and for the variable selected in item 2.
   4. Which combination do you choose: leave-one-out or 10-fold, all variables or the selected variable? Give clear reasons.
Due: 8 September 2015. Submit to your shared folder. Late penalty: -25% per day.
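To illustrate the mechanics of comparing leave-one-out with 10-fold cross-validation, here is a Python sketch. It is not the course's R code, and the FuelEff file is not part of these slides, so the data below are synthetic (hypothetical: y = 10 + 0.5x plus Gaussian noise) and only one predictor is used; with several predictors, a multiple-regression fit would replace the hypothetical `fit_simple`. Each case is predicted by a model fitted without it, and ME, RMSE and MAPE are computed on those out-of-fold predictions.

```python
import math
import random

def fit_simple(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def cv_predictions(xs, ys, k, seed=0):
    """Out-of-fold prediction for every case using k-fold CV (k = n gives leave-one-out)."""
    n = len(xs)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    preds = [None] * n
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        b0, b1 = fit_simple([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:                 # predict only the cases left out of this fit
            preds[i] = b0 + b1 * xs[i]
    return preds

def me_rmse_mape(actual, predicted):
    """ME, RMSE and MAPE (percent), with error = actual - predicted."""
    errors = [a - p for a, p in zip(actual, predicted)]
    m = len(errors)
    return (sum(errors) / m,
            math.sqrt(sum(e * e for e in errors) / m),
            100 * sum(abs(e / a) for e, a in zip(errors, actual)) / m)

# Hypothetical synthetic data, NOT the FuelEff data.
rng = random.Random(1)
xs = [float(x) for x in range(1, 31)]
ys = [10 + 0.5 * x + rng.gauss(0, 1) for x in xs]

for name, k in [("leave-one-out", len(xs)), ("10-fold", 10)]:
    me, rmse, mape = me_rmse_mape(ys, cv_predictions(xs, ys, k))
    print(f"{name:>13}: ME={me:.3f} RMSE={rmse:.3f} MAPE={mape:.2f}%")
```

One design point worth noting: the leave-one-out partition is the same whatever the seed, while the 10-fold results depend on how the cases are shuffled into folds, so changing `seed` changes the 10-fold numbers slightly.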