Real Time Object Detection using CNN based Single Shot Detector Model

Document Type: Special Issue on Pragmatic Approaches of Software Engineering for Big Data Analytics, Applications and Development

Authors

1 Professor, Department of IT, KIET Group of Institutions, Delhi-NCR Ghaziabad, Uttar Pradesh, India.

2 Professor, Department of CSE, IITM Group of Institutions, Sonipat, Haryana, India.

3 Student, Department of CSE, BMIET, Sonipat, Haryana, India.

Abstract

Object detection has been an area of active research for many years and has advanced significantly. Many applications stand to benefit from further innovation in this domain. Rapid growth in machine learning has complemented these efforts, and in recent times the research community has contributed substantially to real-time object detection. In this work, we implement real-time object detection and aim to improve the accuracy of the detection mechanism. We use the ssd_v2_inception_coco model, as Single Shot Detection (SSD) models deliver significantly better results. A dataset of more than 100 raw images is used for training, and XML annotation files are generated for them using LabelImg. The TensorFlow records generated from these annotations are passed through the training pipeline using the proposed model. OpenCV captures real-time images, and a convolutional neural network (CNN) performs convolution operations on them. The real-time object detection delivers an accuracy of 92.7%, an improvement over some previously proposed models. The model detects hundreds of objects simultaneously. The accuracy of object detection with the proposed model improves significantly over existing methodologies in practice. A substantial dataset is used to evaluate the accuracy of the proposed model. The model may be readily useful for object detection applications including parking lots, human identification, and inventory management.
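The annotation step described above can be illustrated with a minimal sketch. It assumes the XML files follow the Pascal VOC layout that LabelImg writes by default, and it converts each bounding box to coordinates normalized by image size, the form usually stored in TensorFlow records for detection. The sample file name, the labels, and the function name parse_voc_annotation are hypothetical and are not taken from the paper.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in Pascal VOC layout (assumed LabelImg output).
SAMPLE_XML = """<annotation>
  <filename>car_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>car</name>
    <bndbox><xmin>64</xmin><ymin>120</ymin><xmax>320</xmax><ymax>360</ymax></bndbox>
  </object>
  <object>
    <name>person</name>
    <bndbox><xmin>400</xmin><ymin>96</ymin><xmax>480</xmax><ymax>432</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc_annotation(xml_text):
    """Return one dict per labeled object, with box corners scaled to [0, 1]."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))

    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "filename": filename,
            "label": obj.findtext("name"),
            # Divide pixel coordinates by the image size to normalize them.
            "xmin": int(box.findtext("xmin")) / width,
            "ymin": int(box.findtext("ymin")) / height,
            "xmax": int(box.findtext("xmax")) / width,
            "ymax": int(box.findtext("ymax")) / height,
        })
    return rows


rows = parse_voc_annotation(SAMPLE_XML)
for row in rows:
    print(row)
```

Each returned row corresponds to one labeled object. A later step, not shown here, would serialize such rows together with the encoded image bytes into TensorFlow records for the training pipeline.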
