Data-Efficient Transformer Architectures for Image-Level Facial Forgery Detection: A Comparative Evaluation of ViT and DeiT

Document Type: Research Paper

Authors

1 Assistant Professor, Department of Computer Science & Engineering, B.M.S College of Engineering, Affiliated to Visvesvaraya Technological University, Belagavi, India.

2 Associate Professor, Department of Artificial Intelligence and Machine Learning, Bangalore Institute of Technology, Visvesvaraya Technological University, Bangalore, India.

3 Cloud Architect, Lead of AI Initiative Program, Ernst & Young LLP, New York, USA.

4 Department of ECE, CMR Technical Campus, Hyderabad, Telangana, India.

5 Professor, ECE Department, Sreenidhi Institute of Science and Technology, India.

6 Assistant Professor, Department of Biotechnology, Vinayaka Mission's Kirupananda Variyar Engineering College, Salem (Vinayaka Mission's Research Foundation), India.

10.22059/jitm.2026.106252

Abstract

The rapid development of deepfake technologies has increased the demand for credible and interpretable facial forgery detection systems. This study compares two transformer-based architectures, the Vision Transformer (ViT) and the distilled Data-efficient Image Transformer (DeiT), for classifying facial images as real or manipulated. The study aims to assess both detection performance and interpretability, and to address the weaknesses of traditional convolutional models. Data augmentation was applied, a balanced dataset of 8,000 real and fake images was constructed, and both models were fine-tuned under identical training settings. Model decisions were explained using LIME (Local Interpretable Model-agnostic Explanations). Experimental results indicate that both models perform well: DeiT is slightly more accurate than ViT (94.62% vs. 93.6%), converges faster, and overfits less. LIME visualizations show that both models focus on important facial regions, indicating that they reliably pick up synthesis artifacts. Despite these promising results, generalization across datasets and real-time performance remain open challenges. Overall, the results support transformer architectures, especially DeiT, as accurate and explainable deepfake detection models for safe and transparent digital media forensics.
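The abstract states that LIME was used to explain the detectors' decisions but does not give the segmentation, sample count, or kernel settings. The following is a minimal, self-contained sketch of the core LIME idea for images, not the authors' code: switch image segments (superpixels) on and off at random, score each perturbed copy with the detector, weight the perturbations by their closeness to the original image, and fit a weighted linear model whose coefficients serve as per-segment importances. The names `lime_segment_weights`, `predict_fn`, and `toy_predict` are hypothetical, and `toy_predict` stands in for a fine-tuned ViT or DeiT that scores masked face images. In practice, the `lime` package's `lime_image.LimeImageExplainer` handles segmentation and masking itself.

```python
import numpy as np


def lime_segment_weights(predict_fn, n_segments, n_samples=1000,
                         kernel_width=0.25, ridge=1.0, seed=0):
    """LIME-style importance score for each image segment (superpixel).

    predict_fn (hypothetical) maps an (n_samples, n_segments) 0/1 mask
    matrix to the detector's "fake" probability for each masked image.
    """
    rng = np.random.default_rng(seed)
    # Randomly switch segments on (1) or off (0); row 0 is the original image.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    masks[0, :] = 1
    preds = np.asarray(predict_fn(masks), dtype=float)

    # Cosine distance between each perturbation and the all-on original.
    ones = np.ones(n_segments)
    norms = np.linalg.norm(masks, axis=1) * np.linalg.norm(ones)
    cos_sim = np.divide(masks @ ones, norms,
                        out=np.zeros(n_samples), where=norms > 0)
    distances = 1.0 - cos_sim

    # Exponential kernel: perturbations close to the original weigh more.
    weights = np.sqrt(np.exp(-(distances ** 2) / kernel_width ** 2))

    # Weighted ridge regression with an unpenalized intercept column.
    X = np.hstack([np.ones((n_samples, 1)), masks.astype(float)])
    XtW = X.T * weights               # scale each sample's column by its weight
    penalty = ridge * np.eye(n_segments + 1)
    penalty[0, 0] = 0.0               # do not shrink the intercept
    coef = np.linalg.solve(XtW @ X + penalty, XtW @ preds)
    return coef[1:]                   # drop the intercept


def toy_predict(masks):
    # Hypothetical detector: the "fake" score depends only on segments 2 and 5.
    return 0.1 + 0.5 * masks[:, 2] + 0.3 * masks[:, 5]


importances = lime_segment_weights(toy_predict, n_segments=8)
print(sorted(np.argsort(-importances)[:2].tolist()))  # [2, 5]
```

Segments with large positive weights are those whose presence pushes the detector toward "fake"; in a LIME visualization these are the highlighted facial regions. Because the toy detector is exactly linear in the segments, the recovered weights are close to 0.5 and 0.3, with small shrinkage from the ridge penalty.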

Keywords

