Analyzing Hybrid C4.5 Algorithm for Sentiment Extraction over Lexical and Semantic Interpretation

Document Type : Research Paper


1 Research Scholar, Department of Computer Science and Engineering, Hemvati Nandan Bahuguna Garhwal University (A Central University), Srinagar Garhwal, Uttarakhand, India.

2 Professor, Head, Department of Computer Science and Engineering, Hemvati Nandan Bahuguna Garhwal University (A Central University), Srinagar Garhwal, Uttarakhand, India.

3 School of Computer and Systems Sciences, JNU New Delhi, India.

4 Department of Computer Science and Engineering, Gaya College of Engineering, Gaya.



Internet-based social channels have become an important information repository for many people who want to follow current trends and events around the world. Because of the abundance of raw information on these social media platforms, they have become crucial for businesses and individuals who make decisions based on social media analytics. The ever-expanding volume of online data necessitates specialized techniques and methods to analyse and utilize this information effectively. The objective of this study is to comprehend textual information at the lexical and semantic levels and to extract sentiments from it as accurately as possible. To achieve this, the paper proposes clustering semantically related words by evaluating their lexical similarity with respect to feature and sequence vectors. The proposed method uses Natural Language Processing, semantic and lexical clustering, and a hybrid C4.5 algorithm to extract six subcategories of emotions over three classes of sentiments based on word-based analysis of text. Compared with seven existing approaches, the proposed approach yields superior results, with an accuracy of 0.96, precision of 0.92, sensitivity of 0.94, and an F1-score of 0.92.
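For readers unfamiliar with C4.5, the split criterion that distinguishes it from earlier decision-tree learners is the gain ratio: information gain normalized by the split information of the candidate feature. The abstract does not describe how the hybrid variant modifies C4.5, so the following is only a minimal sketch of the standard criterion, not the authors' method. The function names and the toy word-level features are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(feature_values, labels):
    """Standard C4.5 gain ratio of a categorical feature with respect to class labels."""
    total = len(labels)
    # Group the class labels by the feature value of each example.
    partitions = {}
    for value, label in zip(feature_values, labels):
        partitions.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus the weighted entropy after it.
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features that take many distinct values.
    split_info = entropy(feature_values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: one word-level feature per example and a sentiment label.
labels = ["positive", "positive", "negative", "negative"]
has_negation = ["no", "no", "yes", "yes"]   # separates the two classes perfectly
is_capitalized = ["no", "yes", "no", "yes"]  # carries no information about the class

print(gain_ratio(has_negation, labels))
print(gain_ratio(is_capitalized, labels))
```

At each node, a C4.5-style learner would evaluate this quantity for every candidate feature and split on the one with the highest value; in the toy data, the negation feature would be chosen over the capitalization feature.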

