The Application of Machine Learning Algorithms for Text Mining based on Sentiment Analysis Approach

Document Type: Research Paper


1 Assistant Prof. of Industrial Engineering, Alzahra University, Tehran, Iran

2 MSc. Student of Industrial Engineering, Alzahra University, Tehran, Iran


Classifying online texts and comments from social media users into positive and negative sentiment categories is of high importance in the research area of text mining. In this research, we applied supervised classification methods to classify Persian texts in cyberspace by sentiment. The result is a system that can decide whether a comment published in cyberspace, for example on a social network, is positive or negative. Comments published on Persian movie and movie review websites from 1392 to 1395 (Solar Hijri calendar; roughly 2013 to 2017) form the data set. One part of the data was used for training and the rest for testing. Before the algorithms were applied, the texts were pre-processed by tokenizing, removing stop words, and extracting n-grams. Naïve Bayes, neural networks (NN), and support vector machines (SVM) were used for text classification. Out-of-sample tests showed no evidence that the accuracy of SVM is statistically higher than that of Naïve Bayes, or that the accuracy of Naïve Bayes is statistically higher than that of NN. However, the accuracy of classification using SVM is statistically higher than that of NN at the 5% significance level.
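
The pipeline summarized in the abstract (tokenizing, stop-word removal, n-gram extraction, then a supervised classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes whitespace tokenization, a hypothetical English stop-word list, toy English comments in place of the Persian data set, and a hand-written multinomial Naïve Bayes with add-one smoothing. The names `STOP_WORDS`, `preprocess`, and `NaiveBayes` are introduced here for illustration only.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop-word list; the study's Persian list is not given in the abstract.
STOP_WORDS = {"the", "a", "is", "was", "this", "and"}

def preprocess(text, n=2):
    """Tokenize on whitespace, drop stop words, return unigrams plus n-grams up to n."""
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    features = list(tokens)
    for size in range(2, n + 1):
        features += [" ".join(tokens[i:i + size])
                     for i in range(len(tokens) - size + 1)]
    return features

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # feature counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = preprocess(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feat in preprocess(text):
                if feat in self.vocab:  # ignore features unseen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data (an assumption; the study used Persian movie comments).
train_texts = [
    "this film was great and moving",
    "a great story and great acting",
    "the plot was boring and weak",
    "this was a weak and boring film",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great acting"))
print(model.predict("boring plot"))
```

Persian text would likely need a language-specific tokenizer and stop-word list, and the NN and SVM classifiers named in the abstract would take the same n-gram features as input.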

