Performance analysis of rule-based automatic SNN algorithm on big data sets [Kural tabanli otomatik SNN algoritmasinin büyük veri setleri üzerindeki performans incelemesi]
Abstract
Clustering is the unsupervised classification of patterns into groups (clusters). Grouping data by similarity is a complex process that cannot be done manually. The literature offers various clustering algorithms based on different principles. The SNN (Shared Nearest Neighbor) algorithm is a density-based clustering algorithm that measures the similarity between two data points by the number of nearest neighbors they share. The SNN algorithm relies on two user-supplied parameters: the radius (Eps) that bounds the neighborhood of a point, and the minimum number of points (MinPts) that must lie within an Eps-neighborhood. As a result, clustering performance depends on the user's experience. A rule-based automatic SNN algorithm has been proposed to remove this dependency on the user. This study examines and presents the performance of the rule-based automatic SNN algorithm on data sets with 2000 or more samples. © 2018 IEEE.
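To make the roles of the shared-neighbor similarity, Eps, and MinPts concrete, the following is a minimal sketch of the core-point step of a standard SNN clustering, not the rule-based automatic variant studied in the paper. It assumes Euclidean distance, a neighbor-list size `k`, and the common SNN formulation in which Eps acts as a threshold on the shared-neighbor count rather than as a geometric radius. The function names `k_nearest`, `snn_similarity`, and `snn_core_points` are hypothetical and chosen only for illustration.

```python
from math import dist


def k_nearest(points, k):
    """For each point index, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort all other points by Euclidean distance to point i.
        order = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors.append(set(order[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Count the neighbors shared by i and j.

    Assumption: the similarity is zero unless i and j appear in each
    other's k-nearest-neighbor lists.
    """
    if i in neighbors[j] and j in neighbors[i]:
        return len(neighbors[i] & neighbors[j])
    return 0


def snn_core_points(points, k, eps, min_pts):
    """Return indices of points whose SNN density is at least min_pts.

    Assumption: the SNN density of a point is the number of other points
    whose shared-neighbor similarity to it is at least eps.
    """
    neighbors = k_nearest(points, k)
    n = len(points)
    core = []
    for i in range(n):
        density = sum(
            1
            for j in range(n)
            if j != i and snn_similarity(neighbors, i, j) >= eps
        )
        if density >= min_pts:
            core.append(i)
    return core


# Two well-separated groups of four points each (illustrative data only).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
core = snn_core_points(pts, k=3, eps=2, min_pts=3)
```

In this sketch the values of `k`, `eps`, and `min_pts` are picked by hand, which is exactly the user dependency the abstract describes. An automatic variant would derive them from the data instead.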