A novel feature and class-based globalization technique for text classification
Abstract
Text classification is a very important task today because of the high volume of textual data that must be handled. Feature selection is one of the most important steps in text classification and significantly affects classification performance. Filter-based global feature selection methods are widely proposed in the literature. When these methods combine class-specific feature scores into a single global score (globalization), they generally consider only class information and ignore information about the feature itself. However, the score of each feature should be computed by taking the feature's own information into account alongside the class information. To address this problem, a new globalization technique called Feature and Class-based Weighted Sum (FCWS), which takes both feature and class information into account, is proposed. FCWS is compared with traditional globalization techniques on four datasets, namely Reuters-21578, 20Newsgroup, Enron1 and Polarity, using Support Vector Machines (SVM), Decision Tree (DT) and Multinomial Naive Bayes (MNB) classifiers. Feature dimensions of 50, 100, 300, 500, 1000 and 3000 were evaluated. Experimental studies on these benchmark datasets show that, in most cases, the proposed method outperforms the three other globalization methods, maximum (MAX), sum (SUM) and weighted sum (AVG), in terms of Micro-F1 and Macro-F1 scores.
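The abstract does not give the FCWS formula, so the sketch below only illustrates the three baseline globalization schemes it is compared against, as they are commonly defined in the filter feature selection literature: for a feature t with class-specific scores f(t, C_i), MAX takes max_i f(t, C_i), SUM takes the sum over classes, and AVG takes the prior-weighted sum of P(C_i) f(t, C_i). This is not the authors' implementation. The function names `globalize` and `select_top_k` are hypothetical, and the per-class scores are assumed to come from some class-specific filter metric (for example chi-square) computed beforehand.

```python
from typing import List, Sequence


def globalize(local_scores: Sequence[Sequence[float]],
              class_priors: Sequence[float],
              method: str) -> List[float]:
    """Combine per-class feature scores into one global score per feature.

    Hypothetical helper. local_scores[t][c] is the score of feature t for
    class c, assumed to come from a class-specific filter metric.
    class_priors[c] is P(C_c), the fraction of training documents in class c.
    """
    if abs(sum(class_priors) - 1.0) > 1e-9:
        raise ValueError("class priors must sum to 1")
    global_scores = []
    for row in local_scores:
        if len(row) != len(class_priors):
            raise ValueError("one score per class is required")
        if method == "MAX":
            # keep only the best class-specific score
            global_scores.append(max(row))
        elif method == "SUM":
            # add the class-specific scores with equal weight
            global_scores.append(sum(row))
        elif method == "AVG":
            # weighted sum: each class score is weighted by its prior P(C_c)
            global_scores.append(sum(p * s for p, s in zip(class_priors, row)))
        else:
            raise ValueError(f"unknown method: {method}")
    return global_scores


def select_top_k(global_scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring features, best first."""
    order = sorted(range(len(global_scores)),
                   key=lambda t: global_scores[t], reverse=True)
    return order[:k]


if __name__ == "__main__":
    # Toy data: 3 features, 2 classes; class 0 is the majority class.
    # Feature 2 is strongly tied to the minority class 1.
    scores = [[0.9, 0.1], [0.4, 0.5], [0.1, 0.95]]
    priors = [0.8, 0.2]
    for m in ("MAX", "SUM", "AVG"):
        print(m, select_top_k(globalize(scores, priors, m), k=2))
    # MAX [2, 0]
    # SUM [2, 0]
    # AVG [0, 1]
```

The toy run shows why the choice of globalization matters. MAX and SUM keep feature 2 because of its high score for the minority class. AVG down-weights that score by the small class prior and keeps feature 1 instead. Per the abstract, FCWS additionally brings feature information into this combination step, which none of these three baselines does.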