
Market basket analysis is a knowledge discovery tool for finding co-occurrence among nominal or categorical items. Market Basket Transaction, or Market Basket Analysis, is a data mining technique for deriving associations between two sets of items. The input to the analysis is categorical data in the form of transaction records, and the output is a set of association rules: new knowledge derived directly from the data.

Let us start with a simple example. Suppose you have transaction data from a small fruit store, and the number of transactions in one day is limited, as shown below.

Input: Transaction Records
| Transaction ID | Items (from customers who bought more than one item) |
|---|---|
| 1 | Apple, Banana, Cherry, Durian |
| 2 | Apple, Durian |
| 3 | Banana, Durian |
| 4 | Durian, Banana, Cherry |
| 5 | Banana, Durian |
| 6 | Apple, Banana |
| 7 | Apple, Cherry, Durian |

Based on the data above, we can derive the following output of association rules using Market Basket Analysis:

Output: Association Rules
| People who bought this item | Also bought the following item | Support | Confidence |
|---|---|---|---|
| Banana | Durian | 57% | 80% |
| Cherry | Durian | 43% | 100% |

The association rule has the following form:

$$X \Rightarrow Y$$

This form means "people who bought the items in set X often also bought the items in set Y". For example, if X = {Apple, Banana} and Y = {Cherry, Durian}, the association rule $\{\text{Apple}, \text{Banana}\} \Rightarrow \{\text{Cherry}, \text{Durian}\}$ indicates that people who bought Apple and Banana also bought Cherry and Durian.

Support and confidence are two measures of association rules.

Support is the fraction of transactions in which all the items in both sets X and Y are bought together. For example, a support of 5% means that in 5% of all transactions (that we consider for the analysis) the items in sets X and Y are purchased together. As a formula, support is the probability of the union of set X and set Y, that is, the probability that a transaction contains every item in $X \cup Y$:

$$\text{Support}(X \Rightarrow Y) = P(X \cup Y) = \frac{n(X \cup Y)}{N}$$

The support count $n(X \cup Y)$ is the number of transactions that contain the set union, and $N$ is the total number of transactions in the analysis. A rule that has very low support may occur simply by chance. We can also view the support count as the number of instances that the association rule will predict correctly.
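As a minimal sketch of the support definition above, the following Python snippet counts how many of the seven fruit store transactions contain all items of both sets. The function names `support_count` and `support` are illustrative choices, not part of the tutorial.

```python
# The seven transactions from the fruit store example.
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def support(x, y, transactions):
    """Fraction of all transactions that contain all items of X and of Y."""
    return support_count(set(x) | set(y), transactions) / len(transactions)


print(round(support({"Banana"}, {"Durian"}, transactions), 2))  # 0.57
print(round(support({"Cherry"}, {"Durian"}, transactions), 2))  # 0.43
```

The two printed values correspond to the 57% and 43% support figures in the output table.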

A confidence of 80% means that 80% of the customers who bought the items in set X also bought the items in set Y. As a formula, confidence is the conditional probability of set Y given set X, which can also be computed as a ratio of supports:

$$\text{Confidence}(X \Rightarrow Y) = P(Y \mid X) = \frac{P(X \cup Y)}{P(X)} = \frac{n(X \cup Y)}{n(X)}$$

Here $n(X)$ is the number of transactions that contain set X, and $n(X \cup Y)$ is the number of transactions that contain all items of both X and Y. Confidence is a measure of the accuracy or reliability of the inference made by the rule: the fraction of instances that the rule predicts correctly among all instances it applies to.
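The confidence calculation can be sketched the same way, as the ratio of two counts over the fruit store transactions. The helper names below are illustrative assumptions; the error raised for an antecedent that never occurs is a design choice of this sketch, since the ratio is undefined in that case.

```python
# The seven transactions from the fruit store example.
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(x, y, transactions):
    """Among transactions containing X, the fraction that also contain Y."""
    n_x = support_count(x, transactions)
    if n_x == 0:
        raise ValueError("X never occurs, so confidence is undefined")
    return support_count(set(x) | set(y), transactions) / n_x


print(confidence({"Banana"}, {"Durian"}, transactions))  # 0.8
print(confidence({"Cherry"}, {"Durian"}, transactions))  # 1.0
```

Banana appears in five transactions, four of which also contain Durian, which gives the 80% in the output table. Every transaction with Cherry also contains Durian, which gives 100%.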

To obtain the association rules, we usually keep only the rules that meet two user-specified thresholds:

1. Minimum support
2. Minimum confidence
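To illustrate how the two criteria work together, the sketch below enumerates every rule with a single item on each side and keeps those that pass both thresholds. The threshold values of 40% minimum support and 80% minimum confidence are assumptions chosen for this illustration (the tutorial does not state which values it used), and restricting the search to single-item rules is a simplification.

```python
from itertools import permutations

# The seven transactions from the fruit store example.
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def single_item_rules(transactions, min_support, min_confidence):
    """Return (x, y, support, confidence) for each rule x => y passing both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):  # every ordered pair of distinct items
        n_xy = support_count({x, y}, transactions)
        n_x = support_count({x}, transactions)  # > 0, since x occurs somewhere
        sup = n_xy / n
        conf = n_xy / n_x
        if sup >= min_support and conf >= min_confidence:
            rules.append((x, y, sup, conf))
    return rules


# Assumed thresholds: 40% minimum support, 80% minimum confidence.
for x, y, sup, conf in single_item_rules(transactions, 0.4, 0.8):
    print(f"{x} => {y}: support {sup:.0%}, confidence {conf:.0%}")
# Banana => Durian: support 57%, confidence 80%
# Cherry => Durian: support 43%, confidence 100%
```

With these assumed thresholds, the two surviving rules match the output table. A rule such as Apple => Durian has the same 43% support as Cherry => Durian, but it is dropped because its confidence is only 75%.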

In the next section, I will explain how the association rules above and the two measurements of support and confidence can be computed using MS Excel.
