Association Rules and Frequent Item Analysis

Association Rules #1 Association rules are a data mining technique for finding associative rules among combinations of items. The importance of an association rule can be judged by two parameters: support, the percentage of transactions in the database that contain the item combination, and confidence, the strength of the relationship between the items in the rule.

Association Rules #2 The most popular association rule algorithm is the Apriori algorithm, which follows a "generate and test" paradigm: candidate item combinations are generated according to certain rules and then tested against the minimum support requirement. Combinations that meet this requirement are called frequent itemsets or large itemsets, and they are then used to build the rules that meet the minimum confidence requirement.

Association Rules #3 The Apriori algorithm was the first, and is a frequently used, algorithm for finding association rules in data mining applications that rely on association rule techniques (Witten and Frank, 2005; Kantardzic, 2003). Its goal is to find rules that satisfy a predefined minimum support and the required confidence value. The task can be divided into two phases: (1) finding itemsets; (2) forming rules from the itemsets found. Other association rule algorithms include Predictive Apriori and Tertius.

Association Rules #4 Pseudocode fragment of the Apriori algorithm
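
The pseudocode figure itself is not reproduced in this transcript. As a stand-in, the generate-and-test loop described in the preceding slides can be sketched in Python roughly as follows. The function name `apriori`, the toy transactions, and the use of absolute support counts are assumptions made for illustration; this is not the pseudocode from the original slide.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {itemset: support count} for every itemset with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Test step: count each candidate and keep those that reach minsup.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= minsup}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})   # frequent 1-itemsets
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Generate step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy data (not taken from the slides' figures).
transactions = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
                {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
                {"A", "B", "C"}]
frequent = apriori(transactions, minsup=2)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The second phase, forming rules from the frequent itemsets, would then only need the support counts stored in `frequent`.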

Association Rules Mining #1 Association rule mining finds associations between variables, correlations, or structure among items or objects in transaction databases, relational databases, and other information stores (Agrawal and Srikant, 1994; Tan et al., 2006). An association rule is an implication of the form X⇒Y, where X⊂I, Y⊂I, and X∩Y = ∅. The rule X⇒Y holds in the transaction set D with confidence c if c% of the transactions in D that contain X also contain Y. The rule has support s in D if s% of the transactions in D contain X∪Y.

Association Rules Mining #2 The support of an itemset is the number of transactions T that contain it. Confidence measures how strongly one item depends on another. A frequent itemset is an itemset whose support is greater than or equal to the specified minimum support. For a rule R : X⇒Y:
support(R) = support(X∪Y) (1)
confidence(R) = support(X∪Y) / support(X) (2)

Association Rules Mining #3 A rule usually consists of two parts, a condition and a result, and is usually stated in the form: If condition then result.

Transactions Example

Transaction database: Example, 1 Instances = Transactions. Items: A = milk, B = bread, C = cereal, D = sugar, E = eggs.

Transaction database: Example, 2 Attributes converted to binary flags

Definitions Item: an attribute=value pair, or simply a value. Attributes are usually converted to binary flags for each value, e.g. product=“A” is written as “A”. Itemset I: a subset of the possible items, e.g. I = {A,B,E} (order unimportant). Transaction: a pair (TID, itemset), where TID is the transaction ID.

Support and Frequent Itemsets The support of an itemset, sup(I), is the number of transactions t that support (i.e. contain) I. In the example database: sup({A,B,E}) = 2, sup({B,C}) = 4. A frequent itemset I is one with at least the minimum support count: sup(I) >= minsup.
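
A minimal sketch of this support count in Python. The example transaction table is not reproduced in this transcript, so the nine transactions below are an assumption, chosen so that they agree with the counts quoted on the slides; the names `TRANSACTIONS`, `sup`, and `is_frequent` are likewise illustrative.

```python
# Assumed stand-in for the example database (TID -> itemset); the original
# table is missing, and these rows were picked to match the quoted counts.
TRANSACTIONS = {
    1: {"A", "B", "E"},
    2: {"B", "D"},
    3: {"B", "C"},
    4: {"A", "B", "D"},
    5: {"A", "C"},
    6: {"B", "C"},
    7: {"A", "C"},
    8: {"A", "B", "C", "E"},
    9: {"A", "B", "C"},
}

def sup(itemset, transactions=TRANSACTIONS):
    """Support count: number of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)

def is_frequent(itemset, minsup, transactions=TRANSACTIONS):
    """An itemset is frequent if its support count reaches minsup."""
    return sup(itemset, transactions) >= minsup

print(sup({"A", "B", "E"}))   # 2
print(sup({"B", "C"}))        # 4
print(is_frequent({"A", "B", "E"}, minsup=2))   # True
```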

SUBSET PROPERTY Every subset of a frequent set is frequent! Q: Why is it so? A: Example: suppose {A,B} is frequent. Since each occurrence of {A,B} includes both A and B, both A and B must also be frequent. A similar argument holds for larger itemsets. Almost all association rule algorithms are based on this subset property.

Association Rules An association rule R has the form Itemset1 => Itemset2, where Itemset1 and Itemset2 are disjoint and Itemset2 is non-empty. Meaning: if a transaction includes Itemset1, then it also has Itemset2. Examples: A,B => E,C and A => B,C.

From Frequent Itemsets to Association Rules Q: Given frequent set {A,B,E}, what are possible association rules?
A => B, E
A, B => E
A, E => B
B => A, E
B, E => A
E => A, B
__ => A,B,E (empty rule), or true => A,B,E
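
The seven rules can also be enumerated mechanically: every way of splitting the itemset into an antecedent and a non-empty consequent gives one rule. A short Python sketch, with `candidate_rules` as a hypothetical helper name:

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield (antecedent, consequent) splits of itemset; the consequent is never empty."""
    items = sorted(itemset)
    for size in range(len(items)):                 # antecedent sizes 0 .. n-1
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent

rules = list(candidate_rules({"A", "B", "E"}))
for lhs, rhs in rules:
    # An empty antecedent is printed as "true", matching the empty rule.
    print(", ".join(lhs) or "true", "=>", ", ".join(rhs))
```

An itemset with n items yields 2^n - 1 such rules, which is 7 for {A,B,E}.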

Classification vs Association Rules
Classification rules: focus on one target field; specify class in all cases; measure: accuracy.
Association rules: many target fields; applicable in some cases; measures: support, confidence, lift.

Rule Support and Confidence Suppose R : I => J is an association rule. sup(R) = sup(I ∪ J) is the support count of R: the support of the itemset I ∪ J, i.e. the number of transactions that contain both I and J. conf(R) = sup(R) / sup(I) is the confidence of R: the fraction of transactions containing I that also contain J. Association rules that meet the minimum support and minimum confidence are sometimes called “strong” rules.

Association Rules Example, 1 Q: Given frequent set {A,B,E}, what association rules have minsup = 2 and minconf= 50% ? A, B => E : conf=2/4 = 50%

Association Rules Example, 2 Q: Given frequent set {A,B,E}, what association rules have minsup = 2 and minconf = 50%?
Qualify:
A, B => E : conf = 2/4 = 50%
A, E => B : conf = 2/2 = 100%
B, E => A : conf = 2/2 = 100%
E => A, B : conf = 2/2 = 100%
Don’t qualify:
A => B, E : conf = 2/6 = 33% < 50%
B => A, E : conf = 2/7 = 29% < 50%
__ => A,B,E : conf = 2/9 = 22% < 50%
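
These confidences can be checked with a few lines of Python. The transaction table is not reproduced in this transcript, so the nine transactions below are an assumption, picked to be consistent with the support counts used in the example; `sup` and `strong_rules` are illustrative names.

```python
from itertools import combinations

# Assumed transactions, chosen to match the counts quoted in the example.
TRANSACTIONS = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
                {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
                {"A", "B", "C"}]

def sup(itemset):
    """Support count; the empty itemset is contained in every transaction."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)

def strong_rules(itemset, minconf):
    """Return {(antecedent, consequent): confidence} for rules reaching minconf."""
    items = sorted(itemset)
    rule_support = sup(items)                      # sup(R) = sup(I ∪ J)
    kept = {}
    for size in range(len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            conf = rule_support / sup(lhs)         # conf(R) = sup(R) / sup(I)
            if conf >= minconf:
                kept[(lhs, rhs)] = conf
    return kept

rules = strong_rules({"A", "B", "E"}, minconf=0.5)
for (lhs, rhs), conf in rules.items():
    print(", ".join(lhs), "=>", ", ".join(rhs), f"conf = {conf:.0%}")
```

Every rule built from {A,B,E} has the same support count, 2, so only the confidence threshold separates the rules that qualify from those that do not.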

Find Strong Association Rules A strong rule must meet both thresholds, minsup and minconf: sup(R) >= minsup and conf(R) >= minconf. Problem: find all association rules with the given minsup and minconf. First, find all frequent itemsets.

Finding Frequent Itemsets Start by finding one-item sets (easy). Q: How? A: Simply count the frequencies of all items.
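
Counting item frequencies is a single pass over the data. A small Python sketch using `collections.Counter`; the transactions are an assumed stand-in for the example database, and `frequent_items` is a hypothetical helper name.

```python
from collections import Counter

# Assumed transactions (the original example table is not reproduced here).
TRANSACTIONS = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
                {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
                {"A", "B", "C"}]

def frequent_items(transactions, minsup):
    """Count each item once per transaction and keep items with count >= minsup."""
    counts = Counter(item for t in transactions for item in t)
    return {item: n for item, n in counts.items() if n >= minsup}

one_item_sets = frequent_items(TRANSACTIONS, minsup=2)
print(sorted(one_item_sets.items()))
```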

Finding itemsets: next level Apriori algorithm (Agrawal & Srikant). Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, … If (A B) is a frequent item set, then (A) and (B) have to be frequent item sets as well! In general: if X is a frequent k-item set, then all (k-1)-item subsets of X are also frequent. Compute k-item sets by merging (k-1)-item sets.

Another example Given: five three-item sets (A B C), (A B D), (A C D), (A C E), (B C D). Lexicographic order improves efficiency. Candidate four-item sets: (A B C D) Q: OK? A: Yes, because all 3-item subsets are frequent. (A C D E) Q: OK? A: No, because (C D E) is not frequent.
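
The merge-and-check step in this example can be sketched in Python as below. Keeping each itemset as a sorted tuple lets two sets be merged only when they share their first k-2 items, which is where the lexicographic order helps. The function name `generate_candidates` is an assumption for illustration.

```python
from itertools import combinations

def generate_candidates(frequent_prev):
    """Merge sorted (k-1)-item tuples that share a prefix, then apply the subset check.

    Returns (joined, kept): all merged candidates, and those whose
    (k-1)-item subsets are all frequent.
    """
    prev = sorted(tuple(sorted(s)) for s in frequent_prev)
    if not prev:
        return [], []
    prev_set = set(prev)
    k = len(prev[0]) + 1
    joined, kept = [], []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:                       # same first k-2 items
            candidate = a + (b[-1],)
            joined.append(candidate)
            # Subset property: drop the candidate if any (k-1)-subset is infrequent.
            if all(sub in prev_set for sub in combinations(candidate, k - 1)):
                kept.append(candidate)
    return joined, kept

three_item_sets = [("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
                   ("A", "C", "E"), ("B", "C", "D")]
joined, kept = generate_candidates(three_item_sets)
print(joined)   # [('A', 'B', 'C', 'D'), ('A', 'C', 'D', 'E')]
print(kept)     # [('A', 'B', 'C', 'D')]
```

In this run (A C D E) is dropped because at least one of its three-item subsets, such as (C D E), is not among the frequent three-item sets.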

Generating Association Rules Two-stage process: (1) Determine the frequent itemsets, e.g. with the Apriori algorithm. (2) For each frequent itemset I and each subset J of I, determine all association rules of the form I-J => J. Main idea used in both stages: the subset property.

Example: Generating Rules from an Itemset Frequent itemset from golf data: Humidity = Normal, Windy = False, Play = Yes (4)
Seven potential rules:
If Humidity = Normal and Windy = False then Play = Yes
If Humidity = Normal and Play = Yes then Windy = False
If Windy = False and Play = Yes then Humidity = Normal
If Humidity = Normal then Windy = False and Play = Yes
If Windy = False then Humidity = Normal and Play = Yes
If Play = Yes then Humidity = Normal and Windy = False
If True then Humidity = Normal and Windy = False and Play = Yes
4/4 4/6 4/7 4/8 4/9 4/12

Rules for the weather data Rules with support > 1 and confidence = 100%. In total: 3 rules with support four, 5 with support three, and 50 with support two.
No.  Association rule  Sup.  Conf.
1  Humidity=Normal Windy=False Play=Yes  4  100%
2  Temperature=Cool Humidity=Normal
3  Outlook=Overcast Temperature=Cold Play=Yes
...
58  Outlook=Sunny Temperature=Hot Humidity=High

Weka associations File: weather.nominal.arff MinSupport: 0.2

Weka associations: output

Filtering Association Rules Problem: any large dataset can lead to a very large number of association rules, even with reasonable minimum confidence and support. Confidence by itself is not sufficient: e.g. if all transactions include Z, then any rule I => Z will have confidence 100%. Other measures are needed to filter rules.

Association Rule LIFT The lift of an association rule I => J is defined as lift = P(J|I) / P(J), the ratio of confidence to expected confidence. Note: P(I) = (support of I) / (no. of transactions). Interpretation: if lift > 1, then I and J are positively correlated; if lift < 1, then I and J are negatively correlated; if lift = 1, then I and J are independent.
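
A small Python sketch of the lift calculation. The transactions are an assumption, standing in for the example database whose table is not reproduced in this transcript; `sup` and `lift` are illustrative names.

```python
# Assumed transactions (stand-in for the example database).
TRANSACTIONS = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
                {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
                {"A", "B", "C"}]

def sup(itemset):
    """Support count of an itemset."""
    return sum(1 for t in TRANSACTIONS if set(itemset) <= t)

def lift(lhs, rhs):
    """lift(I => J) = P(J|I) / P(J), i.e. confidence divided by expected confidence."""
    confidence = sup(set(lhs) | set(rhs)) / sup(lhs)   # P(J|I)
    expected = sup(rhs) / len(TRANSACTIONS)            # P(J)
    return confidence / expected

print(round(lift({"A", "B"}, {"E"}), 2))   # above 1: positively correlated
print(round(lift({"A"}, {"B"}), 2))        # below 1: negatively correlated
```

On this assumed data, A, B => E has a lift of about 2.25, while A => B falls below 1 even though its confidence is 4/6, because B already appears in 7 of the 9 transactions.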

Other issues The ARFF format is very inefficient for typical market basket data: attributes represent items in a basket, and most items are usually missing. Interestingness of associations: find unusual associations, e.g. milk usually goes with bread, but soy milk does not.

Beyond Binary Data Hierarchies: drink → milk → low-fat milk → Stop&Shop low-fat milk …; find associations on any level. Sequences over time …

Applications Market basket analysis: store layout, client offers … Finding unusual events: WSARE (What is Strange About Recent Events) …

Application Difficulties Wal-Mart knows that customers who buy Barbie dolls have a 60% likelihood of buying one of three types of candy bars. What does Wal-Mart do with information like that? 'I don't have a clue,' says Wal-Mart's chief of merchandising, Lee Scott. See KDnuggets 98:01 (www.kdnuggets.com/news/98/n01.html) for many ideas. Diapers and beer urban legend.