The Effects of Preprocessing on Turkish and English News Data
Rights
info:eu-repo/semantics/openAccess
Date
2023
Abstract
In a standard text classification (TC) study, preprocessing is one of the key components for improving performance. This study examines how preprocessing affects TC with respect to news text, text language, and feature selection. To this end, all possible combinations of commonly used preprocessing techniques were compared within one domain, news data, on two different news datasets. In this way, the contribution of each preprocessing technique to classification performance at multiple feature sizes, possible interactions among the techniques, and their dependency on the corresponding language were all evaluated. The effect of two important preprocessing techniques on two common news datasets was examined. The highest performance is an F1 score of 0.781 for the Turkish dataset and 0.980 for the English dataset.
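The idea of comparing every combination of preprocessing techniques can be illustrated with a minimal Python sketch. The abstract does not name the techniques used in the study, so the three steps below (lowercasing, stop-word removal, punctuation stripping), the stop-word list, and all function names are assumptions chosen for illustration only. The sketch enumerates each on/off combination and applies it to a sample sentence. It does not reproduce the study's classifier or feature selection.

```python
from itertools import product

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "in"}


def lowercase(tokens):
    """Convert every token to lower case."""
    return [t.lower() for t in tokens]


def remove_stop_words(tokens):
    """Drop tokens that appear in the (assumed) stop-word list."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def strip_punctuation(tokens):
    """Trim leading/trailing punctuation and drop tokens left empty."""
    cleaned = [t.strip(".,;:!?") for t in tokens]
    return [t for t in cleaned if t]


# Assumed techniques; the abstract does not list the ones actually studied.
TECHNIQUES = [
    ("lowercase", lowercase),
    ("stop_words", remove_stop_words),
    ("punctuation", strip_punctuation),
]


def all_combinations(techniques):
    """Yield every on/off combination as a tuple of enabled (name, fn) pairs."""
    for mask in product([False, True], repeat=len(techniques)):
        yield tuple(t for t, enabled in zip(techniques, mask) if enabled)


def preprocess(text, combo):
    """Split on whitespace, then apply the enabled steps in list order."""
    tokens = text.split()
    for _, fn in combo:
        tokens = fn(tokens)
    return tokens


if __name__ == "__main__":
    sample = "The Effects of Preprocessing, in News Data."
    for combo in all_combinations(TECHNIQUES):
        names = "+".join(name for name, _ in combo) or "none"
        print(names, preprocess(sample, combo))
```

With n techniques this produces 2^n configurations (8 here), including the baseline with no preprocessing. In an experiment of the kind the abstract describes, each configuration would then be scored, for example with the F1 score, at each feature size.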
Volume
6
Issue
1
URI
https://doi.org/10.35377/saucis...1207742
https://search.trdizin.gov.tr/yayin/detay/1167854
https://hdl.handle.net/20.500.12450/3314