Market Basket Analysis Support and Confidence
To compute support and confidence, we first set our transaction data into binary data.
For simplicity, we call the items by its first letter (A for Apple, B for Banana, C for Cherry and D for Durian). Let us give name BinRecord to the binarized transaction record table. Each column in BinRecord are named as A_, B_, C_ and D_. We also set an array of 1 on the left of BinRecord to help computing the support count, and name it as BinOne.
The support count denoted by can be computed using the following MS Excel functions
(sum_range,criteria_range1,criteria1,criteria_range2,criteria2...) where we can adds the cells in a range that meet multiple criteria (this function only available from MS Excel2007). Since we have three possible items on independent variable and three possible items on dependent variable, we have total six criteria. Each one of the six criteria is a nested
function to return column A_ if we found letter "A" in the list of possible item, return column B_ if we found letter "B" in the list of possible item and so on. The following picture show the first cell of support count. Once we type this cell, we can copy and paste to the other 49 cells below.
Number of transaction, N is always 7 in our demonstration example because we have only 7 data. Support (in percent) for each association rule is simply a ratio between support count and the number of transaction.
Support for the independent variable, , is even easier to compute than the support count for the union because it is only depend on the sum of the binarized transaction record. We use MS excel function HLOOKUP that will searches for a value in the top row of BinRecord table, and then returns a value in the same column from last row in the table using exact matched criteria. The format of the function is
HLOOKUP ( lookup_value,table_array,row_index_num,range_lookup )
Using HLOOKUP alone, however, will produce error when the lookup value is empty. Therefore we add function IFERROR (this function only available from MS Excel2007) to put the value to zero if it found such error. Since we have three columns for the independent variable, we add the function for the three columns. Thus, this spreadsheet only works for four numbers of items.
Confidence is computed easily by taking ratio of support counts of the union of independent and dependent variables to the support count of independent variable.