Sentiment Analysis of Tweets Using Supervised Machine Learning Techniques Based on Term Frequency

Document Type: Special Issue on Pragmatic Approaches of Software Engineering for Big Data Analytics, Applications and Development

Authors

1 Assistant Professor, JSS Academy of Technical Education, Noida.

2 Professor, JSS Academy of Technical Education, Noida.

3 JSS Academy of Technical Education, Noida.

4 JSS Academy of Technical Education, Noida.

Abstract

Technology gives everyone an outlet for sharing opinions through social media platforms such as Twitter. This paper applies machine learning methods for text analysis to determine the sentiment of what people post on Twitter. Sentiment analysis draws on natural language processing and machine learning to determine the orientation of the opinion expressed in a piece of text. The system extracts two attributes from the text: (a) its polarity, that is, whether the speaker is criticizing or appreciating, and (b) its topic, the subject under discussion. The paper compares prior work on sentiment analysis of tweets and discusses feature extraction and feature representation in detail. Six classifiers (Naïve Bayes, Decision Tree, Logistic Regression, Support Vector Machine, XGBoost and Random Forest) are compared on accuracy for each type of feature. The paper also analyses the sentiment of political views and public opinion on the lockdown in India: tweets tagged ‘#lockdown’ are analysed for sentiment by category, and a schematic analysis is presented.
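As a rough illustration of classifying tweets from term-frequency features, the sketch below counts terms per tweet and feeds the counts to a small multinomial Naïve Bayes classifier with add-one smoothing. It is not the authors' pipeline. It covers only one of the six classifiers named above, the helper names (`tokenize`, `term_frequencies`, `NaiveBayes`) are hypothetical, and the four training tweets are invented for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; hashtags such as '#lockdown' stay whole.
    return text.lower().split()


def term_frequencies(text):
    # Term-frequency feature: raw count of each token in the text.
    return Counter(tokenize(text))


class NaiveBayes:
    """Multinomial Naive Bayes over term counts, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.term_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.term_counts[label].update(term_frequencies(text))
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        features = term_frequencies(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for term, tf in features.items():
                if term in self.vocab:  # ignore terms never seen in training
                    score += tf * math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy tweets, for illustration only.
train_texts = [
    "lockdown helped keep my family safe",
    "great decision to extend the lockdown",
    "lockdown ruined my small business",
    "terrible planning and awful lockdown rules",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("great safe decision"))
print(model.predict("awful rules ruined business"))
```

Swapping raw counts for another feature representation, or Naïve Bayes for another classifier, would change only `term_frequencies` or the model class, which is the kind of per-feature comparison the abstract describes.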
