A Novel Scheme for Improving Accuracy of KNN Classification Algorithm Based on the New Weighting Technique and Stepwise Feature Selection

Document Type : Research Paper

Authors

1 MSc, Department of Computer, Gorgan Branch, Islamic Azad University, Gorgan, Iran.

2 Assistant Prof., Department of Computer, Gorgan Branch, Islamic Azad University, Gorgan, Iran.

Abstract

K nearest neighbor algorithm is one of the most frequently used techniques in data mining for its integrity and performance. Though the KNN algorithm is highly effective in many cases, it has some essential deficiencies, which affects the classification accuracy of the algorithm. First, the effectiveness of the algorithm is affected by redundant and irrelevant features. Furthermore, this algorithm does not consider the differences between samples, which led the algorithm to have inaccurate predictions. In this paper, we proposed a novel scheme for improving the accuracy of the KNN classification algorithm based on the new weighting technique and stepwise feature selection. First, we used a stepwise feature selection method to eliminate irrelevant features and select highly correlated features with the class category. Then a new weighting method was proposed to give authority value to each sample in train dataset based on neighbor categories and Euclidean distances. This weighting approach gives a higher preference to samples that have neighbors with close Euclidean distance while they are in the same category, which can effectively increase the classification accuracy of the algorithm. We evaluated the accuracy rate of the proposed method and analyzed it with the traditional KNN algorithm and some similar works with the use of five real-world UCI datasets. The experiment results determined that the proposed scheme (denoted by WAD-KNN) performed better than the traditional KNN algorithm and considered approaches with the improvement of approximately 10% accuracy.

Keywords


Abbasi, F., Khadivar, A., &Yazdinejad, M. (2019). A Grouping Hotel Recommender System Based on Deep Learning and Sentiment Analysis. Journal of Information Technology Management, 11(2), 59-78.
Alpaydin, E. (1997). Voting over multiple condensed nearest neighbors. In Lazy learning (pp. 115-132). Springer, Dordrecht.
Angiulli, F. (2005, August). Fast condensed nearest neighbor rule. In Proceedings of the 22nd international conference on Machine learning (pp. 25-32). ACM.
Bagui, S. C., Bagui, S., Pal, K., & Pal, N. R. (2003). Breast cancer detection using rank nearest neighbor classification rules. Pattern recognition, 36(1), 25-34.
Bailey, T. (1978). A note on distance-weighted k-nearest neighbor rules. Trans. on Systems, Man, Cybernetics, 8, 311-313.
Biswas, N., Chakraborty, S., Mullick, S. S., & Das, S. (2018). A parameter independent fuzzy weighted k-nearest neighbor classifier. Pattern Recognition Letters, 101, 80-87.
Cheng, Y., Chen, K., Sun, H., Zhang, Y., & Tao, F. (2018). Data and knowledge mining with big data towards smart production. Journal of Industrial Information Integration, 9, 1-13.
Dua, D., & Graff, C. (2017). UCI machine learning repository (2017). URL http://archive. ics. uci. edu/ml.
Gates, G. (1972). The reduced nearest neighbor rule (Corresp.). IEEE transactions on information theory, 18(3), 431-433.
Gowda, K., & Krishna, G. (1979). The condensed nearest neighbor rule using the concept of mutual nearest neighborhood (Corresp.). IEEE Transactions on Information Theory, 25(4), 488-490.
Guo, G., Wang, H., Bell, D., Bi, Y., & Greer, K. (2003, November). KNN model-based approach in classification. In OTM Confederated International Conferences" On the Move to Meaningful Internet Systems" (pp. 986-996). Springer, Berlin, Heidelberg.
Kafaf, D. A., Kim, D. K., & Lu, L. (2017). B-knn to improve the efficiency of kNN. In Proceedings of the 6th international conference on data science, technology and applications. Science and Technology Publications (pp. 126-132).
Kotenko, I., Saenko, I., &Branitskiy, A. (2018). Framework for Mobile Internet of Things Security Monitoring Based on Big Data Processing and Machine Learning. IEEE Access, 6, 72714-72723.
Kumar, M., Rath, N. K., &Rath, S. K. (2016). Analysis of microarray leukemia data using an efficient MapReduce-based K-nearest-neighbor classifier. Journal of biomedical informatics, 60, 395-409.
Lin, W. C., Ke, S. W., & Tsai, C. F. (2015). CANN: An intrusion detection system based on combining cluster centers and nearest neighbors. Knowledge-based systems, 78, 13-21.
Pan, Z., Wang, Y., & Ku, W. (2017). A new general nearest neighbor classification based on the mutual neighborhood information. Knowledge-Based Systems, 121, 142-152.
Parvin, H., Alizadeh, H., &Minati, B. (2010). A modification on k-nearest neighbor classifier. Global Journal of Computer Science and Technology.
Reshi, J. A., & Singh, S. (2019). Investigating the Role of Code Smells in Preventive Maintenance. Journal of Information Technology Management, 10(4), 41-63.
Serpen, G., &Aghaei, E. (2018). Host-based misuse intrusion detection using PCA feature extraction and kNN classification algorithms. Intelligent Data Analysis, 22(5), 1101-1114.
Sun, C., Yao, C., Shen, L., & Yu, X. (2016). Improving classification accuracy using missing data filling algorithms for the criminal dataset. International Journal of Hybrid Information Technology, 9(4), 367-374.
Wu, X., Yang, J., & Wang, S. (2018). Tea category identification based on optimal wavelet entropy and weighted k-Nearest Neighbors algorithm. Multimedia Tools and Applications, 77(3), 3745-3759
Xueli, W., Zhiyong, J., &Dahai, Y. (2015, September). An Improved KNN Algorithm Based on Kernel Methods and Attribute Reduction. In 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC) (pp. 567-570). IEEE.
Zeng, Y., Yang, Y., & Zhao, L. (2009). Pseudo nearest neighbor rule for pattern classification. Expert Systems with Applications, 36(2), 3587-3595.
Zhao, M., & Chen, J. (2016). Improvement and comparison of weighted k nearest neighbors classifiers for model selection. Journal of Software Engineering, 10(1), 109-118.