Real Time Object Detection using CNN based Single Shot Detector Model

Document Type : Special Issue on Pragmatic Approaches of Software Engineering for Big Data Analytics, Applications and Development


1 Professor, Department of IT, KIET Group of Institutions, Delhi-NCR Ghaziabad, Uttar Pradesh, India.

2 Professor, Department of CSE, IITM Group of Institutions, Sonipat, Haryana, India.

3 Student, Department of CSE, BMIET, Sonipat, Haryana, India.


Object detection has been an area of interest to the research community for many years and has advanced significantly. Many applications stand to benefit from further innovation in this domain. Rapid growth in machine learning has complemented these efforts, and in recent times the research community has contributed a great deal to real-time object detection. In the current work, the authors implement real-time object detection and seek to improve the accuracy of the detection mechanism. We use the ssd_v2_inception_coco model, as Single Shot Detection models deliver significantly better results. A dataset of more than 100 raw images is used for training, and XML files are generated using LabelImg. The generated TensorFlow records are passed through the training pipeline using the proposed model. OpenCV captures real-time images, and the CNN performs convolution operations on them. The real-time object detection delivers an accuracy of 92.7%, a significant improvement over some previously proposed models and over existing methodologies in practice. The model detects hundreds of objects simultaneously. A substantial dataset is used to evaluate the accuracy of the proposed model. The model may be readily useful for object detection applications including parking lots, human identification, and inventory management.
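
The annotation step described in the abstract can be illustrated with a minimal Python sketch. It assumes the XML files follow the Pascal VOC layout that LabelImg writes by default, and that box coordinates are normalized by image size before being stored in TensorFlow records, as is conventional for the TensorFlow Object Detection API. The sample file name, labels, coordinates, and the helper name parse_annotation are hypothetical and are not taken from the authors' dataset or code.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout (assumed, not from the paper).
SAMPLE_XML = """<annotation>
  <filename>car_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>48</xmin><ymin>120</ymin><xmax>320</xmax><ymax>360</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>400</xmin><ymin>96</ymin><xmax>480</xmax><ymax>384</ymax></bndbox>
  </object>
</annotation>"""


def parse_annotation(xml_text):
    """Return one dict per labelled object, with box corners scaled to [0, 1]."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))

    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        rows.append({
            "filename": filename,
            "label": obj.findtext("name"),
            # Normalize pixel coordinates by the image dimensions.
            "xmin": xmin / width,
            "ymin": ymin / height,
            "xmax": xmax / width,
            "ymax": ymax / height,
        })
    return rows


if __name__ == "__main__":
    for row in parse_annotation(SAMPLE_XML):
        print(row)
```

Rows of this shape are what a record-writing script would typically serialize, one training example per image; the serialization itself is omitted here because the abstract does not describe it.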

