Ensemble feature selection for single-label text classification: a comprehensive analytical study
Abstract
Owing to the large and growing volume of textual data, text classification is an important problem. Feature selection is one of its most critical steps because it strongly affects classification accuracy, and many feature selection techniques have been proposed in the literature. Each method assigns every feature a score according to its own criterion, ranks the features by that score, and passes the top-N features to the classifier. Because the criteria differ, the resulting rankings differ as well: a feature that one method scores highly may be scored low, and therefore discarded, by another. Each method thus selects a different set of discriminative features, and combining these sets can yield higher classification performance. In this study, classification is therefore performed on combinations of the features ranked by different methods, which reveals which methods work well together and which do not. The number of features that each method selects differently from the others is also examined. Accordingly, feature sets of different sizes are combined using two local and two global feature selection methods. Extensive experiments on three benchmark datasets show that combinations of feature selection methods generally outperform a single method used alone, although some combinations perform worse than the individual methods. The study thus provides a comprehensive analysis of ensemble feature selection in the text classification domain.
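The selection-and-combination idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact combination rule, so the union of the top-N lists used here is an assumption, and the function names (`top_n`, `combine`, `n_different`) and the score values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Return the n highest-scoring features, best first (ties broken by name)."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:n]


def combine(selections: List[List[str]]) -> List[str]:
    """Union of several ranked lists, keeping first-seen order, no duplicates.

    Assumption: the ensemble is a plain union; the study may use another rule.
    """
    seen = set()
    combined = []
    for selection in selections:
        for feature in selection:
            if feature not in seen:
                seen.add(feature)
                combined.append(feature)
    return combined


def n_different(a: List[str], b: List[str]) -> int:
    """Count the features selected in a that are not selected in b."""
    return len(set(a) - set(b))


# Made-up scores from two hypothetical feature selection methods.
scores_a = {"goal": 0.9, "match": 0.8, "vote": 0.3, "team": 0.7, "tax": 0.1}
scores_b = {"goal": 0.6, "match": 0.2, "vote": 0.9, "team": 0.1, "tax": 0.8}

sel_a = top_n(scores_a, 3)          # top-3 features according to method A
sel_b = top_n(scores_b, 3)          # top-3 features according to method B
merged = combine([sel_a, sel_b])    # feature set passed to the classifier
diff = n_different(sel_a, sel_b)    # features chosen by A but not by B
```

Each method may be given a different `n`, which corresponds to combining feature sets of different sizes; the merged list would then be used to build the document vectors for whichever classifier is under evaluation.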