HYPER HEURISTIC EVOLUTIONARY APPROACH FOR CONSTRUCTING DECISION TREE CLASSIFIERS
Decision tree models have earned a special status in predictive modeling since these are considered comprehensible for human analysis and insight. Classification and Regression Tree (CART) algorithm is one of the renowned decision tree induction algorithms to address the classification as well as regression problems. Finding optimal values for the hyper parameters of a decision tree construction algorithm is a challenging issue. While making an effective decision tree classifier with high accuracy and comprehensibility, we need to address the question of setting optimal values for its hyper parameters like the maximum size of the tree, the minimum number of instances required in a node for inducing a split, node splitting criterion and the amount of pruning. The hyper parameter setting influences the performance of the decision tree model. As researchers, we know that no single setting of hyper parameters works equally well for different datasets. A particular setting that gives an optimal decision tree for one dataset may produce a sub-optimal decision tree model for another dataset. In this paper, we present a hyper heuristic approach for tuning the hyper parameters of Recursive and Partition Trees (rpart), which is a typical implementation of CART in statistical and data analytics package R. We employ an evolutionary algorithm as hyper heuristic for tuning the hyper parameters of the decision tree classifier. The approach is named as Hyper heuristic Evolutionary Approach with Recursive and Partition Trees (HEARpart). The proposed approach is validated on 30 datasets. It is statistically proved that HEARpart performs significantly better than WEKA’s J48 algorithm in terms of error rate, F-measure, and tree size. Further, the suggested hyper heuristic algorithm constructs significantly comprehensible models as compared to WEKA’s J48, CART and other similar decision tree construction strategies. The results show that the accuracy achieved by the hyper heuristic approach is slightly less as compared to the other comparative approaches.