The Effects of Preprocessing on Turkish and English News Data
Abstract
In a standard text classification (TC) study, preprocessing is one of the key components for improving performance. This study examines how preprocessing affects TC with respect to the news domain, text language, and feature selection. To this end, all possible combinations of commonly used preprocessing techniques were compared on a single domain, news data, using two different news datasets. This made it possible to evaluate the contribution of each preprocessing technique to classification performance at multiple feature sizes, possible interactions among the techniques, and their dependence on the language. The effect of two important preprocessing techniques on two common news datasets was examined. The highest performance is an F1 score of 0.781 for the Turkish dataset and 0.980 for the English dataset.
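As a rough illustration of what comparing all combinations of preprocessing techniques involves, the following minimal sketch enumerates every on/off combination of a set of steps and applies one of them to a sentence. The abstract does not name the techniques used, so the three steps here (`lowercase`, `remove_stop_words`, `strip_suffix`), the toy stop-word list, and the helper names are all assumptions made for illustration, not the study's actual pipeline.

```python
from itertools import product

# Toy stop-word list, for illustration only (an assumption, not from the study).
STOP_WORDS = {"the", "a", "of", "and", "in"}

def lowercase(tokens):
    return [t.lower() for t in tokens]

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]

def strip_suffix(tokens):
    # Crude stand-in for stemming: drop a trailing "s" from longer tokens.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

# Hypothetical step set; the abstract does not list the techniques compared.
STEPS = [
    ("lowercase", lowercase),
    ("stop_words", remove_stop_words),
    ("stem", strip_suffix),
]

def all_combinations(steps):
    """Yield every on/off combination of steps as a tuple of (name, fn) pairs."""
    for mask in product([False, True], repeat=len(steps)):
        yield tuple(step for step, on in zip(steps, mask) if on)

def apply_pipeline(text, pipeline):
    """Split on whitespace, then apply each enabled step in order."""
    tokens = text.split()
    for _, fn in pipeline:
        tokens = fn(tokens)
    return tokens

combos = list(all_combinations(STEPS))  # 2**3 = 8 pipelines, including the empty one
```

With n steps this yields 2^n pipelines. In a full experiment, each pipeline would be followed by feature selection at several feature sizes and a classifier scored with F1, which this sketch leaves out.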
Volume
6
Issue
1
Link
https://doi.org/10.35377/saucis...1207742
https://search.trdizin.gov.tr/yayin/detay/1167854
https://hdl.handle.net/20.500.12450/3314