## Using Decision Tree

A decision tree can be used to predict a pattern or to classify a data record. Suppose we have new, unseen records of people from the same location where the data sample was taken. These records are called test data (in contrast to training data) because we would like to determine their classes.

| Person name | Gender | Car ownership | Travel Cost (\$)/km | Income Level | Transportation Mode |
|---|---|---|---|---|---|
| Alex | Male | 1 | Standard | High | ? |
| Buddy | Male | 0 | Cheap | Medium | ? |
| Cherry | Female | 1 | Cheap | High | ? |

The question is: what transportation mode would Alex, Buddy and Cherry use? Using the decision tree that we generated in the previous section, we will use a deductive approach to classify whether a person will use car, train or bus as his or her mode along a major route in that city, based on the given attributes.

We start from the root node, which contains the attribute Travel cost per km. If the travel cost per km is expensive, the person uses car. If the travel cost per km is standard, the person uses train. If the travel cost per km is cheap, the decision tree needs to ask the next question, about the gender of the person. If the person is male, then he uses bus. If the person is female, the decision tree needs to ask again, this time about how many cars she owns in her household. If she has no car, she uses bus; otherwise she uses train.

The rules generated from the decision tree above are mutually exclusive and exhaustive for each class label on the leaf node of the tree:

Rule 1 : If Travel cost/km is expensive then mode = car

Rule 2 : If Travel cost/km is standard then mode = train

Rule 3 : If Travel cost/km is cheap and gender is male then mode = bus

Rule 4 : If Travel cost/km is cheap and gender is female and she owns no car then mode = bus

Rule 5 : If Travel cost/km is cheap and gender is female and she owns 1 car then mode = train
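The claim that the rules are mutually exclusive and exhaustive can be checked mechanically. The following is a minimal sketch, not part of the original tutorial: the names `RULES`, `matching_rules` and `rules_partition_inputs` are hypothetical, the attribute values are written in lowercase for convenience, and Rule 5 is read as "owns at least one car", following the "otherwise she uses train" wording of the walk-through. That reading is an assumption.

```python
from itertools import product

# Each rule is (name, predicate, mode).
# Assumption: Rule 5 is interpreted as "owns at least one car".
RULES = [
    ("Rule 1", lambda cost, gender, cars: cost == "expensive", "car"),
    ("Rule 2", lambda cost, gender, cars: cost == "standard", "train"),
    ("Rule 3", lambda cost, gender, cars: cost == "cheap" and gender == "male", "bus"),
    ("Rule 4", lambda cost, gender, cars: cost == "cheap" and gender == "female" and cars == 0, "bus"),
    ("Rule 5", lambda cost, gender, cars: cost == "cheap" and gender == "female" and cars >= 1, "train"),
]

# Attribute values considered in the check (car counts 0 to 2 are an assumption).
COSTS = ["cheap", "standard", "expensive"]
GENDERS = ["male", "female"]
CAR_COUNTS = [0, 1, 2]


def matching_rules(cost, gender, cars):
    """Return the names of all rules whose condition holds for this record."""
    return [name for name, applies, _mode in RULES if applies(cost, gender, cars)]


def rules_partition_inputs():
    """True if every attribute combination triggers exactly one rule."""
    return all(
        len(matching_rules(cost, gender, cars)) == 1
        for cost, gender, cars in product(COSTS, GENDERS, CAR_COUNTS)
    )


print(rules_partition_inputs())
```

If exactly one rule fires for every combination, no record is left unclassified (exhaustive) and no record is claimed by two rules (mutually exclusive).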

Based on the rules or the decision tree above, the classification is very straightforward. Alex is willing to pay the standard travel cost per km; thus, regardless of his other attributes, his transportation mode must be train. Buddy is only willing to pay a cheap travel cost per km, and his gender is male, so his transportation mode should be bus. Cherry is also only willing to pay a cheap travel cost per km, her gender is female and she owns a car, so her transportation mode to work is train (she probably uses the car only on weekends to shop). The Income level variable is never used to classify the transportation mode in this case.

| Person name | Travel Cost (\$)/km | Gender | Car ownership | Transportation Mode |
|---|---|---|---|---|
| Alex | Standard | Male | 1 | Train |
| Buddy | Cheap | Male | 0 | Bus |
| Cherry | Cheap | Female | 1 | Train |
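The classification above can be reproduced with a short sketch that walks the tree from the root node down to a leaf. The function name `predict_mode` and the dictionary keys are hypothetical choices for illustration; the branching follows the tree as described in this section. Note that the `income` field is stored with each record but never read.

```python
def predict_mode(travel_cost, gender, cars_owned):
    """Walk the decision tree from the root (travel cost) to a leaf."""
    if travel_cost == "Expensive":
        return "Car"
    if travel_cost == "Standard":
        return "Train"
    if travel_cost != "Cheap":
        raise ValueError(f"unknown travel cost: {travel_cost!r}")
    # Cheap travel cost: the next question is gender.
    if gender == "Male":
        return "Bus"
    # Female: the last question is car ownership.
    return "Bus" if cars_owned == 0 else "Train"


# The three test records from this section.
test_data = [
    {"name": "Alex", "gender": "Male", "cars": 1, "cost": "Standard", "income": "High"},
    {"name": "Buddy", "gender": "Male", "cars": 0, "cost": "Cheap", "income": "Medium"},
    {"name": "Cherry", "gender": "Female", "cars": 1, "cost": "Cheap", "income": "High"},
]

predictions = {
    person["name"]: predict_mode(person["cost"], person["gender"], person["cars"])
    for person in test_data
}
print(predictions)
```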

Although the decision tree is a very powerful method, at this point I shall give readers several notes on its use. First, it must be noted that with the limited number of training records (only 10) used to induce the decision tree, we cannot generalize the rules of the decision tree above to other cases, such as your own city. The decision tree above is valid only for the given data, that is, only for the particular major route in the city where the data were gathered.

The sequence of rules generated by the decision tree is based on the priority of the attributes. For example, there is no rule for people who own more than one car because, based on the data, that case is already covered by the travel cost/km attribute. For those who own 2 cars, the travel cost/km is always expensive, thus the mode is car.

Because of a limitation of decision tree algorithms (most of them employ a greedy strategy with no backtracking, so the search is not exhaustive), this sequence of priorities is in general not optimal. We cannot say that the rules generated by a decision tree are the best rules.

In the next section, you will learn in more detail how to generate a decision tree.

The preferred reference for this tutorial is

Teknomo, Kardi. (2009) Tutorial on Decision Tree. http://people.revoledu.com/kardi/tutorial/DecisionTree/