Mushakkal: Detecting Arabic Clickbait Using CNN with Various Optimizers

Document Type: Research Paper

Authors

1 Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia.

2 Department of Information Technology and Communication, Security Forces Hospital, Dammam, Saudi Arabia.

3 Department of Information Technology, College of Computer, Qassim University, Saudi Arabia. Department of Computers and Control Engineering, Faculty of Engineering, Tanta University, Egypt.

10.22059/jitm.2024.99051

Abstract

The term "clickbait" refers to content designed to capture readers' attention, often through misleading headlines, which frequently leaves social media users frustrated. In this study, titled "Mushakkal" (Arabic for "variety"), we used a Convolutional Neural Network (CNN), a deep learning approach, to detect clickbait in an Arabic dataset. We compared three optimizers (RMSprop, Adam, and Adadelta) under various parameter settings to determine the most effective combination for detecting clickbait in Arabic content. The CNN model performed best when both pre-processing and Word2Vec embeddings were applied. Adam was the best-performing optimizer, achieving a Macro-F1 score of 77%, followed closely by RMSprop at 76%. Adadelta was the least effective for classifying Arabic text.
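For readers unfamiliar with the reported metric, Macro-F1 is the unweighted mean of the per-class F1 scores, so each class counts equally regardless of how many examples it has. The following is a minimal sketch of that computation in plain Python; the function name `macro_f1` and the toy labels are hypothetical illustrations and are not taken from the study's data or code.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: 1 = clickbait, 0 = not clickbait
y_true = [1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
print(f"Macro-F1: {score:.4f}")
```

In this toy case the per-class F1 scores are 0.8 for class 0 and about 0.667 for class 1, and Macro-F1 is their simple average.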
