A decision tree is a multi-class classification tool: it assigns a data point to one of two or more available classes. It does so by dividing the sample space into rectilinear regions, which is easiest to see with an example. Suppose we have the auto-insurance claim data shown in the table below, and we want to predict which customer profiles are more likely to lead to a claim payout. The decision tree model may first divide the sample space based on Age, giving two regions. One of those regions may then be subdivided based on Marital_status, and that newly created sub-region may in turn be divided based on Num_of_vehicle_owned.
A decision tree is made up of a root node followed by intermediate nodes and leaf nodes. Each leaf node represents one of the classes into which data points are classified. An intermediate node represents the decision rule by which the parent node's data points are divided among its child nodes. This rule is chosen by computing a measure of impurity, either the Gini index or entropy, for each candidate predictor variable and selecting the predictor that yields child nodes purer than the parent node.
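To make the impurity measure concrete, here is a minimal R sketch of a Gini-index calculation for a candidate split; the helper names gini and split_impurity are hypothetical, introduced only for this illustration, not part of any package.

# Gini index of a set of class labels: 1 - sum over classes of p_k^2
gini <- function(labels) {
  p <- table(labels) / length(labels)  # class proportions
  1 - sum(p^2)
}

# Weighted impurity of the two child nodes produced by a candidate split;
# 'left' is a logical vector marking which rows fall into the left child
split_impurity <- function(labels, left) {
  n <- length(labels)
  (sum(left) / n) * gini(labels[left]) +
    (sum(!left) / n) * gini(labels[!left])
}

# Toy data from the table below: split on Marital_status == "Single"
claim   <- c(0, 0, 1)
marital <- c("Single", "Married", "Single")
gini(claim)                                  # parent impurity: 4/9 ~ 0.444
split_impurity(claim, marital == "Single")   # child impurity:  1/3 ~ 0.333

Since the weighted impurity of the children (about 0.333) is lower than that of the parent (about 0.444), this split produces purer child nodes.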
| Claim | Age | Num_of_vehicle_owned | Marital_status |
|-------|-----|----------------------|----------------|
| 0     | 25  | 1                    | Single         |
| 0     | 30  | 2                    | Married        |
| 1     | 25  | 1                    | Single         |
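For the R code that follows to be self-contained, this table can be loaded as a data frame named train.data (the name is an assumption chosen to match the code below; a real training set would have far more rows, since the split thresholds in the code require at least 100 observations per node):

# Toy version of the training data from the table above
train.data <- data.frame(
  Claim                = c(0, 0, 1),
  Age                  = c(25, 30, 25),
  Num_of_vehicle_owned = c(1, 2, 1),
  Marital_status       = factor(c("Single", "Married", "Single"))
)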
The advantages of a decision tree are:
- Multi-class classifier
- Handles heterogeneous attributes (categorical and continuous)
- No data normalization needed
- Fast training speed
- Model interpretability
- Fast testing speed

The disadvantages are:
- Weak classifier on its own
- Large training data set needed
- Overfitting may happen unless the tree is pruned (see the pruning sketch after the code below)
R code to build the decision tree, using the rpart package:

library(rpart)

# Grow a classification tree predicting whether a claim is paid out
model <- rpart(Claim > 0 ~ Age + Num_of_vehicle_owned + Marital_status,
               data = train.data,
               method = "class",
               control = rpart.control(cp = 0.001,       # complexity parameter
                                       minsplit = 100,   # min observations needed to attempt a split
                                       minbucket = 100,  # min observations allowed in a leaf
                                       maxdepth = 5))    # maximum depth of the tree
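As noted in the disadvantages above, an unpruned tree may overfit. A minimal pruning sketch, assuming the model above has been fitted: rpart stores a table of cross-validated errors in model$cptable, and prune() cuts the tree back to a chosen complexity parameter. The names best_cp, pruned, and new.data are hypothetical.

# Pick the cp value with the lowest cross-validated error
best_cp <- model$cptable[which.min(model$cptable[, "xerror"]), "CP"]

# Prune the tree back to that complexity parameter
pruned <- prune(model, cp = best_cp)

# Classify new observations (new.data is a hypothetical data frame
# with the same columns as train.data)
predict(pruned, newdata = new.data, type = "class")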